LPengYang / MotionClone

Official implementation of MotionClone: Training-Free Motion Cloning for Controllable Video Generation
402 stars · 31 forks

Paper Error #12

Closed wenhuchen closed 1 month ago

wenhuchen commented 4 months ago
[Screenshot of the formula in question from the paper]

I'm pretty sure your formula is wrong here. The guidance term should be the gradient $\frac{\partial g_m}{\partial z_t}$ rather than $g_m$ itself. Your code implementation also shows that the gradient is the correct one.
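For reference, with the correction applied, the guidance term would presumably read as follows (a sketch using the thread's symbols, with $\lambda$ denoting the guidance strength; this is not necessarily the paper's exact notation):

```latex
\tilde{\epsilon}_\theta(z_t) \;=\; \epsilon_\theta(z_t) \;+\; \lambda \, \frac{\partial g_m}{\partial z_t}
```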

Bujiazi commented 4 months ago

Thank you for your attention to MotionClone 🌹. We sincerely apologize for the ambiguity and confusion caused 😂. Your point is correct, and we will correct the formula in our paper soon.

LeonXu127 commented 2 months ago

Could you please explain why the additional energy-function term makes sense here? I did not find any trace of this term in the original CFG paper. I would appreciate it if you could point me to any document that explains this energy-function term.

luo0207 commented 2 months ago

@Bujiazi I have the same question as @LeonXu127. Could you please explain this energy-function term?

luo0207 commented 2 months ago

@LeonXu127 I understand it now. As @wenhuchen said, the energy term is wrong in the paper but correct in the code. MotionClone actually follows the paper "Diffusion Models Beat GANs on Image Synthesis", which uses classifier gradients on labels to guide the diffusion process; analogously, MotionClone uses the semantic and motion information from the example video to guide the diffusion process. The corresponding code is linked here: code
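As a rough illustration of that classifier-guidance mechanism: the sampler's predicted noise is shifted along the gradient of an energy function evaluated at the current latent. The toy quadratic energy and all names below are illustrative assumptions, not MotionClone's actual implementation.

```python
import numpy as np

def energy(z_t, ref):
    # Toy "motion" energy: squared distance between the current latent
    # and reference features extracted from the example video (assumption).
    return float(np.sum((z_t - ref) ** 2))

def energy_grad(z_t, ref):
    # Analytic gradient dg/dz_t of the quadratic energy above.
    # It is this gradient -- not g itself -- that enters the update.
    return 2.0 * (z_t - ref)

def guided_eps(eps_pred, z_t, ref, scale=1.0):
    # Classifier-guidance-style correction of the predicted noise:
    # eps' = eps + scale * dg/dz_t
    return eps_pred + scale * energy_grad(z_t, ref)
```

Here `scale` plays the role of the guidance strength; a larger value pulls sampling more strongly toward latents with low energy (i.e., closer to the reference motion).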

LeonXu127 commented 2 months ago

Thanks a lot! I will look through the classifier guidance paper and code.

LPengYang commented 1 month ago

Sorry for the late reply. We have released the updated paper and corresponding code. MotionClone can now perform motion customization without a cumbersome inversion process, and memory consumption is significantly reduced (~14 GB for 16×512×512 text-to-video generation). Hope this helps.

tyrink commented 1 month ago

Hi, could you explain why the DDIM inversion process can be replaced by the addition of random noise to provide temporal guidance?

LPengYang commented 1 month ago

Hello! Thanks for your attention to our work. In our experiments, we observed that the motion guidance provided by DDIM inversion and by direct noise addition exhibits similar strength. We speculate that this is because DDIM inversion does not yield perfect noisy latents (its inversion trajectory does not match the real sampling trajectory, as revealed in "Null-text Inversion"), which narrows the quality gap between it and direct noise addition. Additionally, direct noise addition for distribution shifting matches the operation performed during the training phase.
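The direct noise-adding step described above is just the forward diffusion process used at training time. A minimal NumPy sketch, with illustrative names (here `alpha_bar_t` stands for the cumulative noise-schedule coefficient $\bar{\alpha}_t$):

```python
import numpy as np

def add_noise(z0, alpha_bar_t, rng):
    # Forward diffusion q(z_t | z_0):
    #   z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps
    # Because this is exactly the distribution shift seen during training,
    # it can stand in for a DDIM inversion trajectory when preparing the
    # noisy latent that provides temporal guidance.
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bar_t) * z0 + np.sqrt(1.0 - alpha_bar_t) * eps
```

Unlike DDIM inversion, this requires no iterative backward passes through the denoiser, which is consistent with the reduced memory footprint mentioned above.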