fudan-generative-vision / hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
https://fudan-generative-vision.github.io/hallo/
MIT License

Misalignment between motion module and AnimateDiff #190

Open Nyquist0 opened 3 weeks ago

Nyquist0 commented 3 weeks ago

Dear Sir or Madam,

Great work, and thanks for sharing. I would like to ask about a misalignment I found between your motion module and AnimateDiff.

In AnimateDiff, the feature should be reshaped to (b x h x w) x f x c, as the following figure shows.

image

But in your code here, I found that the feature is reshaped to (b x f) x (h x w) x c.
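
To make the two layouts concrete, here is a minimal NumPy sketch. The tensor sizes and variable names are hypothetical and chosen only for illustration, and I am assuming the input feature has shape (b, c, f, h, w). The sketch only shows that a (b x f) x (h x w) x c tensor can be rearranged into (b x h x w) x f x c without losing information. It does not claim that the linked code performs this step.

```python
import numpy as np

# Hypothetical sizes: batch, channels, frames, height, width.
b, c, f, h, w = 2, 4, 3, 5, 6
x = np.arange(b * c * f * h * w).reshape(b, c, f, h, w)

# Temporal layout: each spatial location becomes a batch item,
# and the sequence axis runs over frames -> (b*h*w, f, c).
temporal = x.transpose(0, 3, 4, 2, 1).reshape(b * h * w, f, c)

# Spatial layout: each frame becomes a batch item,
# and the sequence axis runs over pixels -> (b*f, h*w, c).
spatial = x.transpose(0, 2, 3, 4, 1).reshape(b * f, h * w, c)

# Regroup the spatial layout into the temporal one, i.e. the einops-style
# pattern "(b f) d c -> (b d) f c" with d = h*w.
converted = (
    spatial.reshape(b, f, h * w, c)   # split (b f) into b and f
    .transpose(0, 2, 1, 3)            # swap the f and d axes
    .reshape(b * h * w, f, c)         # merge b and d
)

print(temporal.shape, spatial.shape, np.array_equal(converted, temporal))
```

So the (b x f) x (h x w) x c shape by itself does not rule out temporal attention. What matters is whether a regrouping like the one above happens before the attention call.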

Is there anything I missed? Looking forward to your reply. Thanks.

zypsjtu commented 1 week ago

https://github.com/fudan-generative-vision/hallo/blob/8fd7c572a3d43c2a9c1a5473219ce4fc1b6e3ed2/hallo/models/motion_module.py#L581C1-L583C14