Yujun-Shi / DragDiffusion

[CVPR2024, Highlight] Official code for DragDiffusion
https://yujun-shi.github.io/projects/dragdiffusion.html
Apache License 2.0
1.13k stars, 82 forks

Inconsistency between timestep and noise level in DragPipeline.inv_step() #31

Closed ZJUSaltFish closed 11 months ago

ZJUSaltFish commented 1 year ago

Thanks for your excellent work!

While digging into the code, I found something confusing. This issue is the same as https://github.com/google/prompt-to-prompt/issues/64, about the implementation of DDIM Inversion:

First, according to the formula in the DDIM paper, DDIM inversion is written like this: (image)
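The image itself did not survive in this copy of the thread. For reference, DDIM inversion is commonly written in the form below. That the image showed exactly this form is an assumption, not a recovery of the original. Here $\alpha_t$ denotes the cumulative noise-schedule product, matching the notation used in the rest of this comment:

$$z_{t+1}=\sqrt{\frac{\alpha_{t+1}}{\alpha_t}}z_t+\sqrt{\alpha_{t+1}}\cdot\Bigg(\sqrt{\frac{1}{\alpha_{t+1}}-1} - \sqrt{\frac{1}{\alpha_t} - 1}\Bigg)\cdot\epsilon_\theta(z_t,t)$$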

However, `DragPipeline.inv_step()` contains `next_step = timestep` followed by `timestep = min(timestep - self.scheduler.config.num_train_timesteps // self.scheduler.num_inference_steps, 999)`, which renames t as t_next and t_prev as t.

Therefore, I think the code actually gives:

$$z_{t+1}=\sqrt{\frac{\alpha_t}{\alpha_{t-1}}}z_t+\sqrt{\alpha_t}\cdot\Bigg(\sqrt{\frac{1}{\alpha_t}-1} - \sqrt{\frac{1}{\alpha_{t-1}} - 1}\Bigg)\cdot\epsilon_\theta(z_t,t)$$
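As a quick sanity check on this formula, here is a minimal sketch. It assumes the inversion step follows the usual "predict x0, then re-noise" pattern, since the body of `inv_step()` is not quoted in this thread. The function names and numeric values are hypothetical. It compares that two-stage update with the closed form above, where `alpha_prev` stands for $\alpha_{t-1}$ and `alpha_next` for $\alpha_t$:

```python
import math

def inv_step_two_stage(x, eps, alpha_prev, alpha_next):
    # Hypothetical helper: treat x as lying at noise level alpha_prev,
    # predict the clean sample, then re-noise it to level alpha_next.
    pred_x0 = (x - math.sqrt(1.0 - alpha_prev) * eps) / math.sqrt(alpha_prev)
    pred_dir = math.sqrt(1.0 - alpha_next) * eps
    return math.sqrt(alpha_next) * pred_x0 + pred_dir

def inv_step_closed_form(x, eps, alpha_prev, alpha_next):
    # The single-line formula from the comment, with the same index shift.
    coef = math.sqrt(1.0 / alpha_next - 1.0) - math.sqrt(1.0 / alpha_prev - 1.0)
    return math.sqrt(alpha_next / alpha_prev) * x + math.sqrt(alpha_next) * coef * eps

# Arbitrary illustrative values, not taken from any real schedule.
x, eps = 0.7, -1.3
alpha_prev, alpha_next = 0.98, 0.95

two_stage = inv_step_two_stage(x, eps, alpha_prev, alpha_next)
closed = inv_step_closed_form(x, eps, alpha_prev, alpha_next)
diff = abs(two_stage - closed)
print(diff)  # agreement up to floating-point rounding
```

The two forms are algebraically identical. The open question is therefore not the arithmetic but which noise level the input latent is assumed to be at when the network is evaluated at timestep t.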

This is really confusing to me, please help me out!

Yujun-Shi commented 11 months ago

Thanks for your interest in our work! I think this is due to the approximation made when numerically solving the ODE in the DDIM forward and inversion processes:

image

(image taken from: https://openaccess.thecvf.com/content/CVPR2023/papers/Wallace_EDICT_Exact_Diffusion_Inversion_via_Coupled_Transformations_CVPR_2023_paper.pdf)
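To illustrate the approximation numerically, here is a toy sketch under stated assumptions. It uses scalar data drawn from a zero-mean Gaussian with standard deviation `S`, for which the ideal noise predictor has a closed form, and a linear beta schedule. The helper names, the schedule, and the toy model are all hypothetical stand-ins, not the repository's code. During inversion the predictor is evaluated at timestep `t` on a latent that is still at the previous noise level, which is the mismatch discussed in this issue:

```python
import math

T = 1000  # assumed number of training timesteps
betas = [1e-4 + (0.02 - 1e-4) * i / (T - 1) for i in range(T)]
alphas_cumprod = []
prod = 1.0
for b in betas:
    prod *= 1.0 - b
    alphas_cumprod.append(prod)

def alpha_bar(t):
    # Negative timesteps map to a fully clean sample (cumulative alpha of 1).
    return alphas_cumprod[t] if t >= 0 else 1.0

S = 0.5  # standard deviation of the toy Gaussian data (assumption)

def eps_model(x, t):
    # Ideal noise prediction for Gaussian data; stands in for the UNet.
    a = alpha_bar(t)
    return math.sqrt(1.0 - a) * x / (a * S * S + 1.0 - a)

def ddim_move(x, eps, a_from, a_to):
    # Deterministic DDIM update from noise level a_from to a_to.
    pred_x0 = (x - math.sqrt(1.0 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * pred_x0 + math.sqrt(1.0 - a_to) * eps

def round_trip_error(x0, num_steps):
    stride = T // num_steps
    timesteps = [i * stride for i in range(num_steps)]
    x = x0
    for t in timesteps:
        # Inversion: x is at level t - stride, but the model is queried at t.
        eps = eps_model(x, t)
        x = ddim_move(x, eps, alpha_bar(t - stride), alpha_bar(t))
    for t in reversed(timesteps):
        # Sampling: x and the queried timestep agree here.
        eps = eps_model(x, t)
        x = ddim_move(x, eps, alpha_bar(t), alpha_bar(t - stride))
    return abs(x - x0)

err_coarse = round_trip_error(0.3, 10)
err_fine = round_trip_error(0.3, 500)
print(err_coarse, err_fine)
```

In this toy setting the round trip is not exact, and the reconstruction error shrinks as the number of steps grows. That is the behaviour one would expect if the mismatch is a discretization approximation rather than an indexing bug.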

ZJUSaltFish commented 11 months ago

Thanks for your reply! I think it does make sense (although it is a bit counterintuitive).