Open mujc21 opened 1 month ago
To my understanding, the pipeline still uses latents encoded with the VAE; the difference is that they are concatenated with a low-resolution image. In that case, the noisy latents should be composited in the same way as in the current SyncMVD, while the low-resolution branch should use noiseless latents encoded from the same background color.
```python
color_images = ((0.5 * color_images) + 0.5)
```

(Note: the `*=` in the original line was a typo; it should be `=`, as shown above.)
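The compositing described above could be sketched as follows. This is a minimal numpy illustration, not the actual SyncMVD code: the function name `composite_latents` and its arguments are hypothetical, and it assumes a DDPM-style scheduler where the noisy latent is `x_t = alpha_t * x_0 + sigma_t * eps` (in `diffusers` this is what `scheduler.add_noise` computes).

```python
import numpy as np

def composite_latents(fg_noisy, bg_clean, mask, noise, alpha_t, sigma_t):
    """Blend noisy foreground latents with background latents at timestep t.

    fg_noisy : noisy foreground latents at timestep t, shape (4, h, w)
    bg_clean : noiseless background latents, e.g. from vae.encode on a
               solid background-color image
    mask     : 1.0 where the rendered views cover the latent, 0.0 elsewhere
    noise    : the same epsilon used to noise the foreground latents
    alpha_t, sigma_t : scheduler coefficients with x_t = alpha_t*x_0 + sigma_t*eps
    """
    # Noise the background up to the same noise level as the foreground,
    # then paste the foreground over it using the mask.
    bg_noisy = alpha_t * bg_clean + sigma_t * noise
    return mask * fg_noisy + (1.0 - mask) * bg_noisy
```

With `alpha_t = 1, sigma_t = 0` (i.e. t = 0) this degenerates to an ordinary alpha composite of the clean latents, which is a quick sanity check for the masking logic.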
Thank you very much for your excellent work. I am currently trying to apply the SyncMVD method using the `StableDiffusionUpscalePipeline`. However, the latent space in this pipeline is not obtained through `vae.encode`, so the method of compositing with a background in SyncMVD needs to be modified. After reviewing the code for `StableDiffusionUpscalePipeline`, I tried several ways to initialize the `background_latents`, but none were successful. Is it possible to avoid compositing the latent space with the background? If it is necessary, how should the `background_latents` be generated in the `StableDiffusionUpscalePipeline`?
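One way to build the two background inputs (noiseless background latents plus the matching low-resolution conditioning image) could look like the sketch below. It is a hypothetical helper, not pipeline code: `make_background_inputs` and `encode_fn` are made-up names, `encode_fn` stands in for `pipe.vae.encode(...).latent_dist.mode()`, and `scaling_factor` should come from `pipe.vae.config.scaling_factor` of the actual upscaler checkpoint.

```python
import numpy as np

def make_background_inputs(color, h, w, encode_fn, scaling_factor):
    """Build noiseless background latents and the matching low-res
    conditioning image for a solid background color.

    color      : RGB triple in [0, 1]
    encode_fn  : stand-in for pipe.vae.encode(...).latent_dist.mode();
                 any callable mapping a (3, h, w) image in [-1, 1]
                 to (4, h', w') latents
    scaling_factor : use pipe.vae.config.scaling_factor for the checkpoint
    """
    # Solid-color image in [0, 1], then mapped to the [-1, 1] range
    # the VAE expects (the inverse of 0.5*x + 0.5).
    img01 = np.ones((3, h, w)) * np.asarray(color, dtype=np.float64).reshape(3, 1, 1)
    img = img01 * 2.0 - 1.0
    # Noiseless background latents, scaled like the pipeline's latents.
    bg_latents = scaling_factor * encode_fn(img)
    # The low-res conditioning concatenated to the latents should show the
    # same background color (the pipeline may additionally noise it per
    # its noise_level argument).
    low_res_cond = img
    return bg_latents, low_res_cond
```

Whether this composites cleanly also depends on the mask being rendered at the latent resolution of the upscaler, which differs from the standard SD pipelines.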