CompVis / stable-diffusion

A latent text-to-image diffusion model
https://ommer-lab.com/research/latent-diffusion-models/

About the classifier-free guidance sampling code? #796

Open JaosonMa opened 9 months ago

JaosonMa commented 9 months ago

In the paper, the classifier-free guidance equation is

    \tilde{\epsilon}_\theta(z, c) = (1 + w)\,\epsilon_\theta(z, c) - w\,\epsilon_\theta(z)

So, looking at the code:

def get_model_output(x, t):
    if unconditional_conditioning is None or unconditional_guidance_scale == 1.:
        # No guidance: a single conditional forward pass.
        e_t = self.model.apply_model(x, t, c)
    else:
        # Batch the unconditional and conditional passes into one forward call.
        x_in = torch.cat([x] * 2)
        t_in = torch.cat([t] * 2)
        c_in = torch.cat([unconditional_conditioning, c])
        e_t_uncond, e_t = self.model.apply_model(x_in, t_in, c_in).chunk(2)
        # Classifier-free guidance: start from the unconditional prediction and
        # move towards the conditional one, scaled by the guidance weight.
        e_t = e_t_uncond + unconditional_guidance_scale * (e_t - e_t_uncond)
    if score_corrector is not None:
        assert self.model.parameterization == "eps"
        e_t = score_corrector.modify_score(self.model, e_t, x, t, c, **corrector_kwargs)

    return e_t

From plms.py, line 179, the guidance line is:

    e_t = e_t_uncond + unconditional_guidance_scale * (e_t - e_t_uncond)

From my point of view, it should be:

    e_t = e_t + unconditional_guidance_scale * (e_t - e_t_uncond)
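Expanding my version (writing \epsilon_c for e_t and \epsilon_u for e_t_uncond) gives exactly the paper's form:

    \hat{\epsilon} = \epsilon_c + w\,(\epsilon_c - \epsilon_u) = (1 + w)\,\epsilon_c - w\,\epsilon_u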

Can you tell me why? @apolinario @asanakoy @pesser @patrickvonplaten @rromb
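For anyone who wants to poke at this, here is a minimal standalone sketch of the batched guidance computation from the excerpt above. The toy apply_model, the tensor shapes, and the zero "unconditional" embedding are stand-ins I made up for illustration, not the real LDM model:

import torch

def toy_apply_model(x_in, t_in, c_in):
    # Stand-in for self.model.apply_model: any function that maps
    # (latents, timesteps, conditioning) to a noise prediction of x_in's shape.
    return 0.1 * x_in + c_in

x = torch.randn(2, 4, 8, 8)      # toy latents (batch of 2)
t = torch.tensor([10, 10])       # toy timesteps
c = torch.randn(2, 4, 8, 8)      # toy conditional embedding
uc = torch.zeros(2, 4, 8, 8)     # toy unconditional embedding
unconditional_guidance_scale = 7.5

# Run the unconditional and conditional passes as one batched forward call,
# exactly as the excerpt does, then split the two predictions apart.
x_in = torch.cat([x] * 2)
t_in = torch.cat([t] * 2)
c_in = torch.cat([uc, c])
e_t_uncond, e_t = toy_apply_model(x_in, t_in, c_in).chunk(2)

# The guided prediction: start from the unconditional output and move
# towards the conditional one, scaled by the guidance weight.
e_t = e_t_uncond + unconditional_guidance_scale * (e_t - e_t_uncond)
print(e_t.shape)  # torch.Size([2, 4, 8, 8])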

MoayedHajiAli commented 9 months ago

GLIDE has the same formulation as LDM. I wonder why.
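For reference, GLIDE (Nichol et al., 2021) defines the guided prediction as

    \hat{\epsilon}_\theta(x_t \mid c) = \epsilon_\theta(x_t \mid \emptyset) + s \cdot \big(\epsilon_\theta(x_t \mid c) - \epsilon_\theta(x_t \mid \emptyset)\big)

which is the form the LDM code uses, while the classifier-free guidance paper (Ho & Salimans, 2022) writes

    \tilde{\epsilon}_\theta(z, c) = (1 + w)\,\epsilon_\theta(z, c) - w\,\epsilon_\theta(z)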

MoayedHajiAli commented 9 months ago

If you figure out the reason, I would appreciate it if you could share it with me.

JaosonMa commented 9 months ago

I also hope to find out why. I will keep searching for an answer; let's work on it together!

MoayedHajiAli commented 9 months ago

@JaosonMa Any updates on this? I still haven't been able to figure out the reason behind the formulation.

nshidqi commented 6 months ago

Have you found out the reason? I'm also wondering why.

JaosonMa commented 6 months ago

Sorry, I have not figured this out yet.

ChrisWang13 commented 1 month ago

You should look at the classifier-free guidance paper again. e_t = e_t + unconditional_guidance_scale * (e_t - e_t_uncond) makes the coefficient on e_t one unit bigger: it rearranges to e_t_uncond + (1 + unconditional_guidance_scale) * (e_t - e_t_uncond), i.e. the repo's update with the guidance scale shifted by one. The two formulations are the same family of updates under that reparameterization, so the code is not wrong; it just defines the scale differently.
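A quick numeric check of that equivalence, with random tensors standing in for the two model outputs (shapes and values here are arbitrary):

import torch

torch.manual_seed(0)
e_c = torch.randn(2, 4, 8, 8)   # stand-in for the conditional prediction e_t
e_u = torch.randn(2, 4, 8, 8)   # stand-in for e_t_uncond

w = 6.5        # guidance weight in the paper-style update
s = w + 1.0    # the repo-style scale, shifted by one

paper_form = e_c + w * (e_c - e_u)   # e_t + w * (e_t - e_t_uncond)
repo_form = e_u + s * (e_c - e_u)    # e_t_uncond + s * (e_t - e_t_uncond)

print(torch.allclose(paper_form, repo_form, atol=1e-5))  # True

So a repo-style scale of 7.5 corresponds to w = 6.5 in the paper's notation.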