Classifier-free guidance

ashawkey / stable-dreamfusion

Text-to-3D & Image-to-3D & Mesh Exportation with NeRF + Diffusion.

Apache License 2.0

Classifier-free guidance #32

Closed ThibaultGROUEIX closed 1 year ago

ThibaultGROUEIX commented 1 year ago

Hi,

I'm going over the code line by line and checking things ;) I think this line

noise_pred = noise_pred_uncond + guidance_scale * (noise_pred_text - noise_pred_uncond)

should be

noise_pred = noise_pred_text + guidance_scale * (noise_pred_text - noise_pred_uncond)

to match the equation in the paper.

Happy to open a PR if you confirm it needs to be changed.

Thanks,

[Image: Screenshot 2022-10-10 at 18 52 22]

ashawkey commented 1 year ago

@ThibaultGROUEIX Hi, thanks for checking! I simply followed the diffusers tutorial here; you can find the formula in its last part.

ThibaultGROUEIX commented 1 year ago

Indeed! It seems that either the paper or the tutorial has a typo; I am not sure which. It may not matter in this case, since the guidance scale is so high.

roibaron commented 1 year ago

@ThibaultGROUEIX Did you make this change work?

chenguolin commented 1 year ago

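There are two ways to write classifier-free guidance (epsilon_cond is the text-conditioned prediction, epsilon_uncond the unconditional one):

1. epsilon_hat = epsilon_uncond + s * (epsilon_cond - epsilon_uncond), which is used in GLIDE (Nichol et al., 2022), LDM and Stable Diffusion (Rombach et al., 2022)
2. epsilon_hat = (1 + w) * epsilon_cond - w * epsilon_uncond, which is used in the original classifier-free guidance paper (Ho and Salimans, 2021) and DreamFusion (Poole et al., 2022)

Both are correct, and they are the same formula with s = 1 + w. The second form can also be written as epsilon_cond + w * (epsilon_cond - epsilon_uncond), which is the line proposed at the top of this issue. The only difference is the meaning of the scale: in the first form guidance is enabled by s > 1 (s = 1 gives the plain conditional prediction), while in the second it is enabled by w > 0 (w = 0 gives the plain conditional prediction).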
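
For anyone who wants to convince themselves, here is a minimal numerical sketch of that relationship. It assumes the second form reads (1 + w) * eps_cond - w * eps_uncond, and it uses random NumPy arrays in place of real UNet outputs; the function names `cfg_scale_form` and `cfg_weight_form` are made up for illustration and are not part of this repository or of diffusers.

```python
import numpy as np


def cfg_scale_form(eps_cond, eps_uncond, s):
    """Form 1 (GLIDE / Stable Diffusion style): guidance is active for s > 1."""
    return eps_uncond + s * (eps_cond - eps_uncond)


def cfg_weight_form(eps_cond, eps_uncond, w):
    """Form 2 (Ho & Salimans / DreamFusion style): guidance is active for w > 0."""
    return (1.0 + w) * eps_cond - w * eps_uncond


# Random stand-ins for the two noise predictions (hypothetical shapes).
rng = np.random.default_rng(0)
eps_cond = rng.standard_normal((4, 64, 64))
eps_uncond = rng.standard_normal((4, 64, 64))

w = 7.5  # arbitrary example value

# Form 1 with s = 1 + w and form 2 with w give the same result.
a = cfg_scale_form(eps_cond, eps_uncond, s=1.0 + w)
b = cfg_weight_form(eps_cond, eps_uncond, w=w)
print("form 1 (s = 1 + w) == form 2 (w):", np.allclose(a, b))

# The line proposed at the top of the issue is form 2 written differently.
proposed = eps_cond + w * (eps_cond - eps_uncond)
print("proposed line == form 2 (w):", np.allclose(proposed, b))

# With guidance switched off, both forms return the conditional prediction.
print("s = 1 is unguided:", np.allclose(cfg_scale_form(eps_cond, eps_uncond, 1.0), eps_cond))
print("w = 0 is unguided:", np.allclose(cfg_weight_form(eps_cond, eps_uncond, 0.0), eps_cond))
```

So switching the code from one line to the other with the same number only shifts the effective scale by one, which is a small relative change when the scale is large.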