fudan-generative-vision / hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
https://fudan-generative-vision.github.io/hallo/
MIT License

Inquiry about frame dim, 16 for inference and 14 for training? #192

Open Nyquist0 opened 3 weeks ago

Nyquist0 commented 3 weeks ago

Hi,

I find the frame dimension you use confusing.

In inference.py you use 16, but during training you use 14.

May I ask which one is better? I assume a larger value gives better results, is that right?