SUDO-AI-3D / zero123plus

Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.
Apache License 2.0

latent scaling #86

Open guochengqian opened 2 weeks ago

guochengqian commented 2 weeks ago

Dear authors, thanks for releasing Zero123++. May I ask why you unscale the latents and the image, and how you chose the scaling ratios? https://huggingface.co/sudo-ai/zero123plus-pipeline/blob/main/pipeline.py#L396

latents = unscale_latents(latents)
image = unscale_image(...
def unscale_latents(latents):
    latents = latents / 0.75 + 0.22
    return latents

def unscale_image(image):
    image = image / 0.5 * 0.8
    return image

Thank you very much!

eliphatfs commented 2 weeks ago

We collected a set of natural images and data renderings and compared their latents, then normalized the renderings so that their latents look more like those of natural images, which is mostly what SD2 was trained on. For the image, we found empirically that this scaling lets the model converge faster. This also helps reduce the contrast of the rendering latents to the normal level of natural images, so the model can learn better 'global timesteps' (the same reason we swap the noise schedule and choose a v-prediction base model).
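
To make the scaling concrete, here is a minimal sketch of the forward transforms that the two quoted unscale functions would undo. The names scale_latents and scale_image, and their bodies, are assumptions derived purely by inverting the arithmetic quoted in the question; they are not confirmed by this thread. Under that assumption, latents are shifted by 0.22 and multiplied by 0.75 before diffusion, and image values are multiplied by 0.5 / 0.8.

```python
def unscale_latents(latents):
    # Quoted in the question: maps normalized latents back to the original range.
    return latents / 0.75 + 0.22


def unscale_image(image):
    # Quoted in the question: maps the scaled image back to the original range.
    return image / 0.5 * 0.8


def scale_latents(latents):
    # Assumed inverse of unscale_latents (hypothetical name):
    # subtract the 0.22 offset, then shrink by a factor of 0.75.
    return (latents - 0.22) * 0.75


def scale_image(image):
    # Assumed inverse of unscale_image (hypothetical name):
    # multiply by 0.5 / 0.8, i.e. 0.625.
    return image * 0.5 / 0.8


if __name__ == "__main__":
    # Round trip on plain floats; the same arithmetic applies elementwise
    # to array or tensor types that support these operators.
    for value in (-1.0, 0.0, 0.22, 1.0):
        restored = unscale_latents(scale_latents(value))
        print(value, scale_latents(value), restored)
```

Because both transforms are affine with fixed constants, applying scale and then unscale returns the input up to floating-point error, so the normalization changes only the range the model sees during diffusion, not the decoded result.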