fudan-generative-vision / hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
https://fudan-generative-vision.github.io/hallo/
MIT License
9.49k stars, 1.3k forks

about the data cleaning process #174

Open thbupt opened 3 months ago

thbupt commented 3 months ago

The paper says: "To ensure high-quality training data, we underwent a data cleaning process that focused on retaining single-person speaking videos exhibiting strong lip and audio consistency". How did you select videos with strong lip and audio consistency? Could you share some ideas?