LIU-Yuxin / SyncMVD

Official PyTorch & Diffusers implementation of "Text-Guided Texturing by Synchronized Multi-View Diffusion"
MIT License

How to generate background_latents? #19


mujc21 commented 1 month ago

Thank you very much for your excellent work. I am currently trying to apply the SyncMVD method with the StableDiffusionUpscalePipeline. However, the latents in that pipeline are not obtained through vae.encode, so the way SyncMVD combines them with the background has to be modified. After reviewing the code of StableDiffusionUpscalePipeline, I tried several ways to initialize the background_latents, but none were successful.

# Broadcast each background color to a full-resolution image
color_images = torch.ones(
    (1, 1, latent_size * 8, latent_size * 8),
    device=self._execution_device,
    dtype=self.text_encoder.dtype
) * color_images
color_images *= ((0.5 * color_images) + 0.5)
# Encode the solid-color backgrounds with the VAE
color_latents = encode_latents(self.vae, color_images)

# At each denoising step, composite the per-view latents with the
# background latents of the assigned background color at timestep t
background_latents = [self.color_latents[color] for color in background_colors]
composited_tensor = composite_rendered_view(self.scheduler, background_latents, latents, masks, t)
latents = composited_tensor.type(latents.dtype)

Is it possible to avoid compositing the latents with the background? If it is necessary, how should the background_latents be generated for the StableDiffusionUpscalePipeline?

LIU-Yuxin commented 1 month ago

To my understanding, the pipeline still uses latents encoded with the VAE; the difference is that they are concatenated with a low-resolution image. In that case, the noisy latents should be composited in the same way as in the current SyncMVD, while the low-resolution input should use noiseless latents encoded from the same background color. Also, in the line color_images *= ((0.5 * color_images) + 0.5), please change the *= to =, which was a typo.
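
For reference, below is a minimal sketch of one way to read that reply, not a tested implementation. It reuses composite_rendered_view, latents, masks, t and the background_latents from the snippet quoted above; low_res_image (the pixel-space image that StableDiffusionUpscalePipeline concatenates to the latents) and background_color_value (the same background color as a scalar in that image's value range) are hypothetical names, and compositing the low-resolution input in pixel space is an assumption rather than part of SyncMVD.

import torch
import torch.nn.functional as F

def composite_upscale_inputs(scheduler, latents, low_res_image, masks, t,
                             background_latents, background_color_value):
    # Hypothetical helper, not part of SyncMVD or diffusers.
    # Noisy branch: composite the denoising latents with the background
    # latents exactly as the quoted SyncMVD code does
    # (composite_rendered_view is the SyncMVD helper used above).
    latents = composite_rendered_view(
        scheduler, background_latents, latents, masks, t
    ).type(latents.dtype)

    # Noiseless branch: give the low-resolution conditioning image the same
    # background color, composited in pixel space (assumed value range [-1, 1]).
    low_res_masks = F.interpolate(
        masks.float(), size=low_res_image.shape[-2:], mode="nearest"
    )
    background = torch.full_like(low_res_image, background_color_value)
    low_res_image = low_res_image * low_res_masks + background * (1.0 - low_res_masks)

    # The upscale UNet then takes the concatenation of the two as input.
    return torch.cat([latents, low_res_image], dim=1)

The split mirrors the point above: only the latent branch needs the per-timestep noisy compositing, while the low-resolution conditioning only needs a consistent, noise-free background of the same color.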