MyNiuuu / MOFA-Video

[ECCV 2024] MOFA-Video: Controllable Image Animation via Generative Motion Field Adaptions in Frozen Image-to-Video Diffusion Model.
https://myniuuu.github.io/MOFA_Video

landmarks condition is stronger than flow condition #39

Open YunjieYu opened 3 weeks ago

YunjieYu commented 3 weeks ago

Hi, I noticed that your FlowControlNet injects landmarks in addition to flow. In my experiments, the landmark condition is actually stronger than the flow condition. My results are below:

https://github.com/MyNiuuu/MOFA-Video/assets/44226851/d5dc13d1-af30-4a4d-adbf-7e60655d9552

This is the result using only the flow condition.

https://github.com/MyNiuuu/MOFA-Video/assets/44226851/283a5be0-13cc-456b-8981-26a763cefdba

This is the result using only the landmark condition.

https://github.com/MyNiuuu/MOFA-Video/assets/44226851/a1e7079e-e999-4c57-9b98-042d7c7ffa71

This is the result using the landmark and flow conditions simultaneously.

Can you explain this result? How can we get good results using only flow?

MyNiuuu commented 3 weeks ago

Hi! Thanks for your interest!

To make the model work with only motion flow inputs, you'll need to retrain it. The current model relies on both motion flow and landmarks, so it won't function correctly if either input is missing.

We have trained two versions of our landmark-based model: one that uses landmarks as conditional inputs and one that doesn't. We found that motion flow alone can already achieve control, but incorporating landmarks further stabilizes and enhances controllability, and this does not conflict with the hybrid control mechanism of our overall model.
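To illustrate why dropping one input from a jointly trained model is an out-of-distribution test, here is a minimal toy sketch. It assumes, purely for illustration, that the two conditions are stacked along the channel axis; the actual fusion inside FlowControlNet may differ, and `build_condition`, the tensor shapes, and the flags are hypothetical names, not part of the MOFA-Video code.

```python
import numpy as np

def build_condition(flow, landmarks, use_flow=True, use_landmarks=True):
    """Stack flow (T, 2, H, W) and landmark maps (T, C, H, W) along channels.

    Hypothetical helper: a disabled condition is replaced by zeros so the
    channel count expected by the (assumed) control network stays unchanged.
    """
    if flow.shape[0] != landmarks.shape[0] or flow.shape[2:] != landmarks.shape[2:]:
        raise ValueError("flow and landmarks must share T, H, W")
    f = flow if use_flow else np.zeros_like(flow)
    lm = landmarks if use_landmarks else np.zeros_like(landmarks)
    return np.concatenate([f, lm], axis=1)

# Toy inputs (assumed shapes): 4 frames, 2-channel flow, 3-channel landmark image.
rng = np.random.default_rng(0)
flow = rng.normal(size=(4, 2, 8, 8)).astype(np.float32)
landmarks = rng.uniform(size=(4, 3, 8, 8)).astype(np.float32)

both = build_condition(flow, landmarks)                            # what a jointly trained model sees
flow_only = build_condition(flow, landmarks, use_landmarks=False)  # landmark channels zeroed
landmark_only = build_condition(flow, landmarks, use_flow=False)   # flow channels zeroed
```

In this sketch the single-condition tensors keep the same shape as the joint one, but a block of channels is all zeros. A model that only ever saw both inputs during training has never encountered that pattern, which is consistent with the advice above to retrain for flow-only control.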

Also, we will release the training code soon after we finish submitting the ECCV camera-ready version.

MyNiuuu commented 2 weeks ago

We have uploaded the training code.