exx8 / differential-diffusion


Question about mask latents #30

Closed berkanz closed 3 months ago

berkanz commented 3 months ago

The proposed technique works well, but one part of the pipeline confuses me. In SD2's diff_pipe.py, the map appears to be downsampled by vae_scale_factor:

map = torchvision.transforms.Resize(tuple(s // self.vae_scale_factor for s in image.shape[2:]),antialias=None)(map)

and then multiplied directly with the image latents:

masks = map > thresholds
latents = original_with_noise[i] * mask + latents * (1 - mask)
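
To make the two quoted steps concrete, here is a minimal sketch of downsampling a map and then blending with a thresholded mask. It uses plain Python lists instead of tensors so it runs without torch, and it is not the code from diff_pipe.py: the helper names `downsample_map` and `blend_step`, the block-averaging resize, the factor of 2, and all the numbers are assumptions made for illustration.

```python
def downsample_map(change_map, factor):
    """Shrink a 2D map by averaging non-overlapping factor x factor blocks.

    Hypothetical stand-in for the Resize call quoted above, which
    interpolates rather than block-averages.
    """
    height = len(change_map) // factor
    width = len(change_map[0]) // factor
    small = []
    for y in range(height):
        row = []
        for x in range(width):
            block = [
                change_map[y * factor + dy][x * factor + dx]
                for dy in range(factor)
                for dx in range(factor)
            ]
            row.append(sum(block) / len(block))
        small.append(row)
    return small


def blend_step(original_with_noise, latents, small_map, threshold):
    """Keep the noised original where map > threshold, else the current latents."""
    blended = []
    for y, map_row in enumerate(small_map):
        out_row = []
        for x, value in enumerate(map_row):
            mask = 1.0 if value > threshold else 0.0
            out_row.append(
                original_with_noise[y][x] * mask + latents[y][x] * (1.0 - mask)
            )
        blended.append(out_row)
    return blended


# Toy 4x4 map at "image" resolution, reduced to a 2x2 "latent" resolution.
change_map = [
    [1.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [0.5, 0.5, 0.25, 0.25],
    [0.5, 0.5, 0.25, 0.25],
]
small_map = downsample_map(change_map, 2)  # [[1.0, 0.0], [0.5, 0.25]]

original_with_noise = [[10.0, 20.0], [30.0, 40.0]]  # made-up values
latents = [[1.0, 2.0], [3.0, 4.0]]                  # made-up values

print(blend_step(original_with_noise, latents, small_map, 0.4))
# [[10.0, 2.0], [30.0, 4.0]]
print(blend_step(original_with_noise, latents, small_map, 0.6))
# [[10.0, 2.0], [3.0, 4.0]]
```

Raising the threshold from 0.4 to 0.6 flips the position with map value 0.5 from the noised original to the current latents, so a different threshold at each step changes which regions are still being overwritten.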

Why aren't mask latents computed? For example, the diffusers inpainting pipeline has a section that calculates latents for the mask and the masked_image. Isn't such a step necessary?

berkanz commented 3 months ago

Closing because I now understand that the original inpainting pipeline also doesn't compute a separate latent for the mask. Instead, it multiplies the latent of the masked image by the downsampled version of the mask itself.
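
One way to see why a downsampled mask is enough: the mask only needs to match the latent's spatial size, and the same 2D mask can then weight every latent channel. The sketch below shows that with plain Python lists. The helper name `apply_spatial_mask`, the two-channel latent, and the values are hypothetical; this is an illustration of the idea, not code from diffusers or from this repository.

```python
def apply_spatial_mask(latent_channels, mask):
    """Multiply every channel of a C x H x W latent by the same H x W mask."""
    return [
        [
            [value * mask[y][x] for x, value in enumerate(row)]
            for y, row in enumerate(channel)
        ]
        for channel in latent_channels
    ]


# Made-up latent with 2 channels and a 2 x 2 spatial grid.
latent = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
]
# Mask already at latent resolution: one value per spatial position.
mask = [[1.0, 0.0], [0.0, 1.0]]

masked = apply_spatial_mask(latent, mask)
print(masked)
# [[[1.0, 0.0], [0.0, 4.0]], [[5.0, 0.0], [0.0, 8.0]]]
```

With tensors, broadcasting a single-channel mask over the channel dimension has the same effect, so the mask never has to pass through the VAE.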