fudan-generative-vision / hallo

Hallo: Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation
https://fudan-generative-vision.github.io/hallo/
MIT License

Frame Count Mismatch Issue During VFHQ Data Preparation #149

Open minmini2 opened 1 month ago

minmini2 commented 1 month ago

Hello,

Thank you for sharing such an excellent project!

I am currently preparing the VFHQ dataset to train a model. However, I encountered an issue during the process. When running the script "python scripts/extract_meta_info_stage2.py -r path/to/dataset -n dataset_name", I received numerous "Frame count mismatch for video:" messages. As a result, approximately half of the clips were deleted. Specifically, the number of clips decreased from 13,993 to 7,568.

Is this a normal occurrence? Could there be an issue at some specific point in the process? I would appreciate any tips or guidance you could provide.

Thank you!

xumingw commented 1 month ago

Does VFHQ have audio? I remember it having only frames.

minmini2 commented 1 month ago

The VFHQ dataset has audio in my case, because I downloaded the videos directly using the URLs provided in the 'meta_info'. Could a mismatch between the audio embedding length and the frame count occur for many of the clips? Is this abnormal?

xumingw commented 1 month ago

It's abnormal. Please check the fps; it should be 25 fps.
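The re-encode to 25 fps can be sketched as a small shell helper. This is a dry run with a hypothetical file name (`clip_a.mp4`); it only prints the ffmpeg command, so remove the `echo` to actually run it (ffmpeg must be installed):

```shell
# Dry-run sketch: build the ffmpeg command that re-encodes a clip to 25 fps.
# -r 25 forces the output frame rate; -c:a copy keeps the audio stream as-is.
cmd_for() {
  f="$1"
  echo ffmpeg -y -i "$f" -r 25 -c:a copy "${f%.mp4}_25fps.mp4"
}

cmd_for clip_a.mp4
```

Looping this over every clip in the dataset before running `extract_meta_info_stage2.py` should remove the frame-count mismatches for clips whose only problem is a non-25 fps source.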

Nyquist0 commented 3 weeks ago

Hi minmini, it should be because some of the videos in the dataset are not 25 fps. Besides, may I ask whether the VFHQ dataset is clean enough to generate videos without artifacts?

minmini2 commented 3 weeks ago

Hi @Nyquist0, thank you! The issue was resolved when I converted the videos to 25 fps as you suggested. As you suspected, the data itself isn't very clean, so I'm trying to refine it. Do you have any recommendations for good, clean training datasets other than VFHQ? If you're researching this area, it would be great to exchange information!