Rudrabha / Lip2Wav

This repository contains the code for our CVPR 2020 paper, "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis".
MIT License

during preprocess how to save frames without faces? #37

Open dongdongdashen opened 2 years ago

dongdongdashen commented 2 years ago

Hi, this is great work. I am trying to use my own dataset to reconstruct speech. The dataset consists of videos of medical images of the vocal organs, without human faces. Can you tell me how to save these frames without faces? Thanks a lot!

prajwalkr commented 2 years ago

Hi, you would need a different preprocessing script, where you specify in each frame which part of the image to save as a "crop". In our evaluation script, for example, we save the face region given by the face detector as the crop for that frame.
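As a rough illustration of specifying a crop per frame without a face detector, the sketch below clamps a hand-chosen region to the frame bounds. The function name `clamp_box`, the `(x1, y1, x2, y2)` box layout, and the example sizes are assumptions made for illustration; they are not taken from the repository's preprocessing script.

```python
def clamp_box(box, frame_height, frame_width):
    """Clamp a crop box (x1, y1, x2, y2) to the frame bounds.

    Hypothetical helper: the box layout and the name are illustrative
    assumptions, not part of the Lip2Wav code base.
    """
    x1, y1, x2, y2 = box
    # Keep every coordinate inside the frame.
    x1 = max(0, min(x1, frame_width))
    x2 = max(0, min(x2, frame_width))
    y1 = max(0, min(y1, frame_height))
    y2 = max(0, min(y2, frame_height))
    if x2 <= x1 or y2 <= y1:
        raise ValueError("crop box is empty after clamping")
    return x1, y1, x2, y2


# Example: a region of interest chosen by hand for a 580 x 360 frame
# (width x height is assumed here).
x1, y1, x2, y2 = clamp_box((10, 20, 700, 400), frame_height=360, frame_width=580)
```

With a frame loaded as a NumPy array of shape (height, width, channels), the crop to save for that frame would then be `frame[y1:y2, x1:x2]`.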

dongdongdashen commented 2 years ago

Thanks for your reply! I can now get the list of medical images, but training does not seem to run, and I am looking for the cause. I notice that the chem images are about 120 x 180, while mine are 580 x 360. Do I need to resize my images before training? Also, my videos are 60 fps rather than 30 fps (3~5 seconds each). Do I need to modify the related parameters to match my data?
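Whether the training code requires a particular image size or frame rate is not confirmed in this thread. If it does, one possible workaround is to bring the data in line during preprocessing. The sketch below only computes which frames to keep when going from 60 fps to 30 fps and what size a 580 x 360 frame would have when scaled to fit a smaller bound; the function names and the target values are hypothetical.

```python
def subsample_indices(num_frames, src_fps, dst_fps):
    """Indices of the frames to keep when reducing src_fps to dst_fps.

    Hypothetical helper using integer arithmetic; assumes integer frame rates.
    """
    if dst_fps <= 0 or dst_fps > src_fps:
        raise ValueError("dst_fps must be positive and at most src_fps")
    kept = num_frames * dst_fps // src_fps
    return [i * src_fps // dst_fps for i in range(kept)]


def fit_within(width, height, max_width, max_height):
    """Scaled (width, height) that fits the bound and keeps the aspect ratio."""
    scale = min(max_width / width, max_height / height)
    return round(width * scale), round(height * scale)


# A 3 second clip at 60 fps has 180 frames; keeping every second frame
# gives 90 frames at 30 fps.
indices = subsample_indices(180, src_fps=60, dst_fps=30)

# Target bound chosen only for illustration.
new_size = fit_within(580, 360, max_width=180, max_height=120)
```

The actual resize would be done by an image library on each kept frame; any frame-rate or image-size settings in the training configuration would still need to match the preprocessed data.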