About full-body generation and side-view face generation

CapHuman can generate full-body images with the help of OpenPose ControlNet, but at the 512x512 resolution limit the face occupies only a small portion of the image, so facial details may be unsatisfactory. To address this, it is recommended to train at higher resolutions (e.g., 512x768 or 768x1024) or to use a two-stage process: generate at low resolution first, then refine at high resolution.
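To make the resolution argument concrete, here is a minimal sketch in plain Python. It is not part of CapHuman: the function names, the 12% face fraction, the margin, and the example box are all assumptions chosen for illustration. It estimates how many pixels a face gets at each resolution, and maps a face box from a low-resolution result onto an upscaled canvas, which is the region a second refinement pass would focus on.

```python
def face_height_px(image_height, face_fraction):
    """Approximate face height in pixels when the face spans
    `face_fraction` of the image height (hypothetical value)."""
    return round(image_height * face_fraction)


def expand_and_scale_box(box, image_size, margin=0.25, scale=2):
    """Pad a face box (left, top, right, bottom) by `margin` of its size,
    clamp it to the image, and map it onto a canvas upscaled by `scale`."""
    left, top, right, bottom = box
    width, height = image_size
    pad_x = (right - left) * margin
    pad_y = (bottom - top) * margin
    left = max(0, left - pad_x)
    top = max(0, top - pad_y)
    right = min(width, right + pad_x)
    bottom = min(height, bottom + pad_y)
    return tuple(round(v * scale) for v in (left, top, right, bottom))


# Assumption: in a full-body shot the face spans about 12% of the height.
for h in (512, 768, 1024):
    print(h, face_height_px(h, 0.12))  # 512 -> 61, 768 -> 92, 1024 -> 123

# Hypothetical face box found in a 512x512 first-stage image.
print(expand_and_scale_box((216, 40, 296, 120), (512, 512)))
# -> (392, 40, 632, 280)
```

Under these assumptions the face is only about 61 pixels tall at 512x512, which illustrates why either a taller training resolution or a second pass over the face region helps.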
For side-view faces, CapHuman's current bias toward frontal views is likely due to the CelebA dataset's distribution and to the coupling of ID extraction with generation targets during training. To generate side faces, one can expand the dataset with multi-angle images and decouple ID extraction from the generation target. Alternatively, fine-tuning CapHuman on such enhanced data could be explored.
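One way to act on the data-distribution point is to rebalance sampling by head pose. The sketch below is a hypothetical illustration, not CapHuman's training code: it assumes a yaw angle in degrees is already available for each training image (for example from a head-pose estimator), and the bucket names and the 15/45 degree thresholds are arbitrary choices.

```python
from collections import Counter


def yaw_bucket(yaw_deg):
    """Assign a coarse pose bucket from a yaw angle in degrees.
    The 15 and 45 degree thresholds are illustrative assumptions."""
    a = abs(yaw_deg)
    if a < 15:
        return "frontal"
    if a < 45:
        return "three_quarter"
    return "profile"


def balanced_weights(yaws):
    """Per-image sampling weights that give every non-empty pose bucket
    the same total probability. The weights sum to 1."""
    buckets = [yaw_bucket(y) for y in yaws]
    counts = Counter(buckets)
    n_buckets = len(counts)
    return [1.0 / (n_buckets * counts[b]) for b in buckets]


# Hypothetical yaw estimates for six images, mostly frontal.
yaws = [0, 5, -10, 3, 30, -60]
weights = balanced_weights(yaws)
print(weights)  # frontal images get a smaller weight than the side views
```

Weights like these could be passed to `random.choices(images, weights=weights, k=batch_size)` so that side views are drawn as often as frontal ones. This only rebalances existing data and does not replace collecting more multi-angle images.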

VamosC / CapHuman