LIU-Yuxin / SyncMVD

Official PyTorch & Diffusers implementation of "Text-Guided Texturing by Synchronized Multi-View Diffusion"
MIT License

How to generate background_latents? #19


mujc21 commented 1 month ago

Thank you very much for your excellent work. I am currently trying to apply the SyncMVD method with the StableDiffusionUpscalePipeline. However, the latents in that pipeline are not obtained through vae.encode, so the way SyncMVD combines them with the background has to be modified. After reviewing the code of StableDiffusionUpscalePipeline, I tried several ways to initialize the background_latents, but none were successful.

# Broadcast each background color to a full-resolution image
color_images = torch.ones(
    (1, 1, latent_size * 8, latent_size * 8),
    device=self._execution_device,
    dtype=self.text_encoder.dtype
) * color_images
color_images *= ((0.5 * color_images) + 0.5)
# Encode the solid-color backgrounds with the VAE
color_latents = encode_latents(self.vae, color_images)

# At each denoising step, composite the per-view latents with the
# background latents of the assigned background color at timestep t
background_latents = [self.color_latents[color] for color in background_colors]
composited_tensor = composite_rendered_view(self.scheduler, background_latents, latents, masks, t)
latents = composited_tensor.type(latents.dtype)

Is it possible to avoid compositing the latents with the background? If it is necessary, how should the background_latents be generated for the StableDiffusionUpscalePipeline?

LIU-Yuxin commented 1 month ago

To my understanding, the pipeline still uses latents encoded with the VAE; the difference is that they are concatenated with a low-resolution image. In that case, the noisy latents should be composited in the same way as in the current SyncMVD, while the low-resolution input should use noiseless latents encoded from the same background color. Also, in the line color_images *= ((0.5 * color_images) + 0.5), please change the *= to =, which was a typo.
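
For reference, below is a minimal sketch of one way to read that reply, not a tested implementation. It reuses composite_rendered_view, latents, masks, t and the background_latents from the snippet quoted above; low_res_image (the pixel-space image that StableDiffusionUpscalePipeline concatenates to the latents) and background_color_value (the same background color as a scalar in that image's value range) are hypothetical names, and compositing the low-resolution input in pixel space is an assumption rather than part of SyncMVD.

import torch
import torch.nn.functional as F

def composite_upscale_inputs(scheduler, latents, low_res_image, masks, t,
                             background_latents, background_color_value):
    # Hypothetical helper, not part of SyncMVD or diffusers.
    # Noisy branch: composite the denoising latents with the background
    # latents exactly as the quoted SyncMVD code does
    # (composite_rendered_view is the SyncMVD helper used above).
    latents = composite_rendered_view(
        scheduler, background_latents, latents, masks, t
    ).type(latents.dtype)

    # Noiseless branch: give the low-resolution conditioning image the same
    # background color, composited in pixel space (assumed value range [-1, 1]).
    low_res_masks = F.interpolate(
        masks.float(), size=low_res_image.shape[-2:], mode="nearest"
    )
    background = torch.full_like(low_res_image, background_color_value)
    low_res_image = low_res_image * low_res_masks + background * (1.0 - low_res_masks)

    # The upscale UNet then takes the concatenation of the two as input.
    return torch.cat([latents, low_res_image], dim=1)

The split mirrors the point above: only the latent branch needs the per-timestep noisy compositing, while the low-resolution conditioning only needs a consistent, noise-free background of the same color.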