CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

stochastic_encode #465

Open cloor opened 1 year ago

cloor commented 1 year ago

In ddim.py

def stochastic_encode(self, x0, t, use_original_steps=False, noise=None):
        # fast, but does not allow for exact reconstruction
        # t serves as an index to gather the correct alphas
        if use_original_steps:
            sqrt_alphas_cumprod = self.sqrt_alphas_cumprod
            sqrt_one_minus_alphas_cumprod = self.sqrt_one_minus_alphas_cumprod
        else:
            sqrt_alphas_cumprod = torch.sqrt(self.ddim_alphas)
            sqrt_one_minus_alphas_cumprod = self.ddim_sqrt_one_minus_alphas

        if noise is None:
            noise = torch.randn_like(x0)
        return (extract_into_tensor(sqrt_alphas_cumprod, t, x0.shape) * x0 +
                    extract_into_tensor(sqrt_one_minus_alphas_cumprod, t, x0.shape) * noise)

The comment in stochastic_encode() says "fast, but does not allow for exact reconstruction". Is there encoding code that does allow exact reconstruction? What does "exact reconstruction" refer to here?
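
For context, the tensor this code returns is a single-shot sample from the forward-diffusion marginal q(x_t | x_0), commonly written as

    x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,    eps ~ N(0, I)

rather than the result of applying the per-step transition q(x_t | x_{t-1}) t times.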

HydrogenC commented 5 months ago

I guess that "exact reconstruction" means adding freshly sampled noise step by step, which is time-consuming. The fast closed-form expression used here was derived in the DDPM paper and is what gets used instead, in both training and inference.
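
A rough sketch of that equivalence (standalone code, not from the repository; the beta schedule and tensor shapes below are made up for illustration, whereas the real sampler uses its registered alphas_cumprod buffers):

import torch

torch.manual_seed(0)

T = 1000
betas = torch.linspace(1e-4, 2e-2, T)          # hypothetical linear beta schedule
alphas = 1.0 - betas
alphas_cumprod = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t

x0 = torch.randn(4, 3, 8, 8)                   # stand-in for a latent batch
t = 500

# (a) slow route: apply the per-step transition q(x_t | x_{t-1}) t times,
#     drawing fresh noise at every step
x = x0
for i in range(t):
    x = torch.sqrt(alphas[i]) * x + torch.sqrt(1.0 - alphas[i]) * torch.randn_like(x)

# (b) fast route: the closed-form marginal q(x_t | x_0) from the DDPM paper,
#     i.e. what stochastic_encode() computes in one shot
x_direct = torch.sqrt(alphas_cumprod[t - 1]) * x0 + \
           torch.sqrt(1.0 - alphas_cumprod[t - 1]) * torch.randn_like(x0)

# The two samples differ (different noise draws), but they follow the same
# distribution: mean sqrt(alpha_bar_t) * x0 and variance 1 - alpha_bar_t
print(x.mean().item(), x_direct.mean().item())
print(x.std().item(), x_direct.std().item())

So the "fast" path trades the ability to reproduce one particular noising trajectory for a single sample from the same distribution.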