Rudrabha / Lip2Wav

This is the repository containing the code for our CVPR 2020 paper "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis"

Questions about data preprocessing #24

Closed · Kuray107 closed this issue 3 years ago

Kuray107 commented 3 years ago

Hello, thanks for the Lip2Wav dataset you kindly provided. I noticed that there are several scenes in the dataset where no face appears on screen, and I wondered how you handled this. Did you filter out this data when training the model, or did you simply ignore it and still get good results?

prajwalkr commented 3 years ago

The preprocessing script automatically filters out non-face segments.
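A minimal sketch of what such filtering can look like (not the repository's actual `preprocess.py`): frames with no detected face are simply skipped, and each saved crop keeps its original frame index in the filename. Here `detect_face` is a hypothetical wrapper around whatever detector is used.

```python
import os
import cv2

def extract_face_crops(video_path, out_dir, detect_face):
    """Save face crops for every frame where a face is detected.

    detect_face(frame) is assumed to return a bounding box (x1, y1, x2, y2)
    or None when no face is visible in the frame.
    """
    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    frame_idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        bbox = detect_face(frame)
        if bbox is not None:  # non-face frames are simply skipped
            x1, y1, x2, y2 = bbox
            crop = frame[y1:y2, x1:x2]
            # the frame number in the filename preserves alignment with the audio
            cv2.imwrite(os.path.join(out_dir, f"{frame_idx}.jpg"), crop)
        frame_idx += 1
    cap.release()
```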

Kuray107 commented 3 years ago

I found that when the face detector cannot find a face at a given timestamp, it indeed skips to the next image. But I don't understand how the preprocessing script aligns the audio and video when there is such a mismatch. Is the audio segment also cut when its video segment skips some non-face frames?

prajwalkr commented 3 years ago

The face crops are saved with frame numbers in their names. During training, the data loader skips a segment if any of its faces are missing.
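A hypothetical sketch of that data-loader check, assuming face crops are named `<frame_number>.jpg` as described above: a training window is used only if every frame in it was detected, so audio never needs to be re-cut.

```python
import os

def get_window(frames_dir, start_frame, T=25):
    """Return the T consecutive face-crop paths starting at start_frame,
    or None if any frame in the window is missing (segment is skipped)."""
    paths = [os.path.join(frames_dir, f"{start_frame + i}.jpg") for i in range(T)]
    if all(os.path.isfile(p) for p in paths):
        return paths
    return None  # a face crop is missing -> skip this segment
```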

Kuray107 commented 3 years ago

Yes, I got it. Thanks for your kind reply.