SUDO-AI-3D / zero123plus

Code repository for Zero123++: a Single Image to Consistent Multi-view Diffusion Base Model.
Apache License 2.0

Some questions regarding the input and output latent scaling #60

Closed: cwchenwang closed this issue 5 months ago

cwchenwang commented 5 months ago

I noticed in the code that when generating from Gaussian noise, we first call unscale_latents, then divide by vae.config.scaling_factor, run the VAE decoder, and finally unscale the images to get the final image. However, when I try to directly denoise an input image, how should I apply the scaling operations during encoding and decoding? I tried the following code:

# Encode: image -> scaled image -> VAE latent -> scaled latent
renderings = scale_images(images)
particles = vae.encode(renderings).latent_dist.sample() * vae.config.scaling_factor
particles = scale_latents(particles)

# Add noise at timestep t = 1 and predict it back
t = torch.tensor([1], device=device).long()
n = torch.randn_like(particles)
y_noisy = scheduler.add_noise(particles, n, t)
n_pred = predict_noise0_diffuser(
    Diff_pre, y_noisy, text_embeddings, t,
    guidance_scale=args.cfg,
    cross_attention_kwargs=cross_attention_kwargs,
    scheduler=scheduler,
)

# Recover the clean latent from the predicted noise (epsilon-prediction formula)
alpha_bar = scheduler.alphas_cumprod.to(device)[t].reshape(-1, 1, 1, 1)
predict_x = (y_noisy - (1 - alpha_bar) ** 0.5 * n_pred) / alpha_bar ** 0.5

# Decode: scaled latent -> VAE latent -> scaled image -> image
predict_x = unscale_latents(predict_x)
image = pipe.vae.decode(predict_x / vae.config.scaling_factor, return_dict=False)[0]
image = unscale_images(image)
result = pipe.image_processor.postprocess(image, output_type='pil')

However, the denoised output (left) has a different color from the input image (right): [image]

If I delete all the scale and unscale calls, the results seem to be correct. So how are these scale and unscale functions supposed to be used?

cwchenwang commented 5 months ago

I think I have figured out how to do it: scale image -> scale latents -> / vae.scaling_factor -> * vae.scaling_factor -> unscale latents -> unscale images.

mochou-wujiu commented 1 month ago

Can you explain why it should be in this order? I ran into the same confusion. I think the right order is image -> (self.image_processor.preprocess) -> scale_image -> (self.vae.encode) -> * self.vae.config.scaling_factor -> scale_latent -> latent, but the colors do not match.

cwchenwang commented 1 month ago

The scale and unscale operations should be symmetric.
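
To illustrate what "symmetric" means here, below is a minimal sketch in plain Python. It is not the pipeline's actual code: the numeric constants are my assumption of what the repository's scale/unscale helpers use, VAE_SCALING_FACTOR stands in for vae.config.scaling_factor, and fake_vae_encode / fake_vae_decode are hypothetical identity functions replacing the real VAE. The decode side follows the order described in the first post (unscale latents, divide by the scaling factor, decode, unscale images), and the encode side is simply its mirror image. The only point is that each decode step must undo the matching encode step, in reverse order.

```python
import math

# Assumed constants; check them against your own copy of the pipeline.
VAE_SCALING_FACTOR = 0.18215  # stand-in for vae.config.scaling_factor


def scale_image(x):
    return x * 0.5 / 0.8


def unscale_image(x):
    return x / 0.5 * 0.8


def scale_latents(z):
    return (z - 0.22) * 0.75


def unscale_latents(z):
    return z / 0.75 + 0.22


def fake_vae_encode(x):
    # Hypothetical identity stand-in for vae.encode(...).latent_dist.sample()
    return x


def fake_vae_decode(z):
    # Hypothetical identity stand-in for vae.decode(...)
    return z


def encode(pixel):
    # scale image -> VAE encode -> multiply by scaling factor -> scale latents
    z = fake_vae_encode(scale_image(pixel)) * VAE_SCALING_FACTOR
    return scale_latents(z)


def decode(latent):
    # Exact mirror: unscale latents -> divide by scaling factor -> VAE decode -> unscale image
    z = unscale_latents(latent) / VAE_SCALING_FACTOR
    return unscale_image(fake_vae_decode(z))


def decode_mismatched(latent):
    # Same inverse operations, but the first two are swapped, so the round trip breaks
    z = unscale_latents(latent / VAE_SCALING_FACTOR)
    return unscale_image(fake_vae_decode(z))


pixel = 0.3
round_trip = decode(encode(pixel))
mismatched = decode_mismatched(encode(pixel))
print(round_trip, mismatched)
```

With the mirrored decode the value comes back unchanged up to floating-point error, while the mismatched decode returns a clearly different value. A constant offset like that in every pixel would plausibly show up as a color shift, which is the kind of symptom to look for when encode and decode are not exact inverses.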