Kuray107 closed this issue 3 years ago
The preprocessing script automatically filters out non-face segments.
I found that when the face detector cannot find a face at a given timestamp, it does indeed skip to the next image. But I don't understand how the preprocessing script aligns the audio and video files when there is a mismatch. Is the audio segment also cut when its video segment skips some non-face frames?
The face crops are saved with frame numbers in their names. While training, the data loader skips segments if certain faces are missing.
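To make the idea concrete, here is a rough sketch of how a loader could use the frame numbers in the crop filenames to skip incomplete segments. This is not the repository's actual code: the `<frame_number>.jpg` naming, the function names, and the `fps=25` / `sample_rate=16000` defaults are all assumptions for illustration. The point is that the audio never needs to be cut; a window's audio span can be derived from its frame number, and windows with a missing crop are simply not sampled.

```python
import re
from pathlib import Path


def available_frames(crop_dir):
    """Collect frame numbers from crop filenames such as '17.jpg' (assumed naming)."""
    frames = set()
    for path in Path(crop_dir).glob("*.jpg"):
        if re.fullmatch(r"\d+", path.stem):
            frames.add(int(path.stem))
    return frames


def valid_window_starts(frames, window):
    """Return start frames for which every frame in [start, start + window) has a crop."""
    return [s for s in sorted(frames)
            if all(s + k in frames for k in range(window))]


def audio_span(start_frame, window, fps=25, sample_rate=16000):
    """Map a video window to audio sample indices via its frame numbers.

    fps and sample_rate are assumed values; the audio file itself stays uncut.
    """
    start = round(start_frame / fps * sample_rate)
    end = round((start_frame + window) / fps * sample_rate)
    return start, end
```

For example, if frame 3 had no detected face, a 3-frame window could only start at frames whose next two neighbours also exist, and each surviving window would still index the original audio at the correct offset.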
Yes, I got it. Thanks for your kind reply.
Hello, thanks for the lip2wav dataset you kindly provided. I noticed that there are several scenes in the dataset where there is no face on the screen, and wondered how you solved this problem. Did you filter out these data when training the model? Or did you just ignore them and still get a good result?