Yujun-Shi / DragDiffusion

[CVPR2024, Highlight] Official code for DragDiffusion
https://yujun-shi.github.io/projects/dragdiffusion.html
Apache License 2.0

Question about Motion Supervision and Point Tracking #70


tnarek commented 2 months ago

Hello,

Thanks for sharing your great work!

I have two questions regarding motion supervision:

  1. Why do you normalize the direction vector $d_i$ in the loss (eq. 3)? Couldn't we directly optimize for matching the target position $g_i$ itself? Is the normalization there to make the latent optimization more gradual, and if so, how important is it? (See the sketch after these questions for how I read eq. 3.)
  2. Maybe related to the previous question: why is the point tracking step necessary? I see that in eq. 3 you take the target feature $sg(F_q(\hat{z}^k_t))$ from the optimized latent, which of course requires $q$ to be updated in the next steps by point tracking. But why can't the target feature be taken from the original latent instead, so that $q$ never needs updating?
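
To make sure I'm reading eq. 3 correctly, here is a minimal PyTorch sketch of my understanding (the function names, tensor layout, and skip threshold are my own assumptions, and I've dropped the mask-regularization term of eq. 3):

```python
import torch
import torch.nn.functional as Fn

def sample_feat(feat, pts):
    """Bilinearly sample feat (1, C, H, W) at float pixel coords pts (N, 2), given as (x, y)."""
    _, _, H, W = feat.shape
    grid = pts.clone()
    grid[:, 0] = 2.0 * pts[:, 0] / (W - 1) - 1.0   # x -> [-1, 1]
    grid[:, 1] = 2.0 * pts[:, 1] / (H - 1) - 1.0   # y -> [-1, 1]
    out = Fn.grid_sample(feat, grid.view(1, 1, -1, 2), align_corners=True)
    return out.squeeze(2).squeeze(0).t()           # (N, C)

def motion_supervision_loss(feat, handle_points, target_points, r1=1):
    """Sketch of the first term of eq. 3 (mask-regularization term omitted)."""
    # offsets covering the square patch Omega(h_i, r1) around each handle point
    offs = torch.stack(torch.meshgrid(
        torch.arange(-r1, r1 + 1), torch.arange(-r1, r1 + 1), indexing="ij"
    ), dim=-1).reshape(-1, 2).float()
    loss = feat.new_zeros(())
    for hp, tp in zip(handle_points, target_points):
        d = tp - hp
        if d.norm() < 2.0:                 # handle already near target: skip (threshold illustrative)
            continue
        d = d / d.norm()                   # normalized direction d_i: one unit step per iteration
        q = hp.unsqueeze(0) + offs         # points q in the patch
        f_cur = sample_feat(feat, q).detach()   # sg(F_q): stop-gradient supervision target
        f_mov = sample_feat(feat, q + d)        # F_{q + d_i}: stays differentiable
        loss = loss + (f_mov - f_cur).abs().sum()
    return loss
```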
ivanpuhachov commented 1 month ago

I was wondering about the same thing. It seems that eq. 3 uses handle positions in pixel coordinates (not in normalized image coordinates $(0,1)^2$, as I expected), so each update moves the point only slightly. Also note that if the handle position is already close to the target, the corresponding loss term is skipped; see https://github.com/Yujun-Shi/DragDiffusion/blob/ebe659a9c5b722f25d9690e74d813fca96531f97/utils/drag_utils.py#L133
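
In other words (my own paraphrase with an illustrative name and tolerance, not the repo's exact code), the per-point check behaves roughly like this:

```python
import torch

def reached_target(handle_point: torch.Tensor, target_point: torch.Tensor,
                   tol_px: float = 2.0) -> bool:
    # When the handle is already within tol_px PIXELS of its target, the
    # corresponding motion-supervision term is dropped. Note the threshold
    # lives in pixel space, not in normalized (0,1)^2 coordinates.
    return (target_point - handle_point).norm().item() < tol_px
```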

As for question 2, my guess is that we want to update the features gradually. Some points in the $\Omega$ region may change completely. Think of the statue example: if the handle point is on the nose, then under extreme rotations we want background pixels to appear there.
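
To make the tracking step concrete, here is a minimal sketch of the nearest-neighbor search as I understand eq. 4 from the paper: the handle is re-localized to the pixel in its $r_2$-neighborhood whose current feature best matches the original handle feature, which is exactly why $q$ must be updated even though the matching target is fixed. The function name and tensor layout are my own assumptions:

```python
import torch

def point_tracking(feat, feat0, handle_point, handle_point0, r2=12):
    """Relocate the handle (x, y) to the argmin of the L1 feature distance
    to the ORIGINAL handle feature, searched in an r2 window (cf. eq. 4).

    feat:  (1, C, H, W) features of the optimized latent
    feat0: (1, C, H, W) features of the original latent
    """
    _, C, H, W = feat.shape
    x0, y0 = handle_point0.long().tolist()
    f0 = feat0[0, :, y0, x0]                        # original handle feature f_i
    x, y = handle_point.long().tolist()
    x_lo, x_hi = max(x - r2, 0), min(x + r2 + 1, W) # clamp window to map bounds
    y_lo, y_hi = max(y - r2, 0), min(y + r2 + 1, H)
    patch = feat[0, :, y_lo:y_hi, x_lo:x_hi]        # (C, h, w) candidate features
    dist = (patch - f0.view(C, 1, 1)).abs().sum(0)  # L1 distance per pixel
    row, col = divmod(dist.argmin().item(), dist.shape[1])
    return torch.tensor([x_lo + col, y_lo + row], dtype=torch.float)
```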

I would love to hear confirmation from the authors.