Rudrabha / Lip2Wav

This is the repository containing the code for our CVPR 2020 paper titled "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis".
MIT License

Consistency of input frame length and output waveform length #34

Open · rita-zeng opened this issue 2 years ago

rita-zeng commented 2 years ago

Great work! How do I make sure the length of the output waveform matches the length of the input frames? I trained and tested on the GRID dataset with the following hyperparameters: T = 40, overlap = 10, mel_step_size = 160, mel_overlap = 40, img_size = 96, fps = 25. In the test results, the ground truth is 3 seconds long, while the generated waveforms are 7 seconds long. How can I solve this problem? Looking forward to your reply!
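One quick sanity check is pure arithmetic. Each video window covers T / fps seconds, and each mel window covers mel_step_size * hop_size / sample_rate seconds, so the two should be equal (and likewise overlap / fps should equal mel_overlap * hop_size / sample_rate). The sketch below is not the repository's code. It assumes sample_rate = 16000 and hop_size = 200, which are not stated in this issue, so substitute the values from your own hparams. The helper names `durations` and `matching_mel_params` are hypothetical.

```python
from fractions import Fraction

# ASSUMPTION: illustrative audio settings; replace them with the
# sample_rate and hop_size from your own hparams.
SAMPLE_RATE = 16000  # audio samples per second (assumed)
HOP_SIZE = 200       # audio samples per mel frame (assumed)


def mel_fps(sample_rate=SAMPLE_RATE, hop_size=HOP_SIZE):
    """Mel-spectrogram frames per second, as an exact fraction."""
    return Fraction(sample_rate, hop_size)


def durations(T, overlap, mel_step_size, mel_overlap, fps,
              sample_rate=SAMPLE_RATE, hop_size=HOP_SIZE):
    """Seconds spanned by the video window/overlap and the mel window/overlap."""
    m = mel_fps(sample_rate, hop_size)
    return {
        "video_window": Fraction(T, fps),
        "mel_window": mel_step_size / m,
        "video_overlap": Fraction(overlap, fps),
        "mel_overlap": mel_overlap / m,
    }


def matching_mel_params(T, overlap, fps,
                        sample_rate=SAMPLE_RATE, hop_size=HOP_SIZE):
    """Hypothetical helper: mel_step_size and mel_overlap that span
    exactly the same time as T and overlap video frames."""
    m = mel_fps(sample_rate, hop_size)
    step = Fraction(T, fps) * m
    ovl = Fraction(overlap, fps) * m
    if step.denominator != 1 or ovl.denominator != 1:
        raise ValueError("no integer mel length matches these video settings")
    return int(step), int(ovl)


if __name__ == "__main__":
    # The hyperparameters reported in this issue.
    d = durations(T=40, overlap=10, mel_step_size=160, mel_overlap=40, fps=25)
    for name, value in d.items():
        print(f"{name}: {float(value):.2f} s")
    print("matching (mel_step_size, mel_overlap):",
          matching_mel_params(T=40, overlap=10, fps=25))
```

Under those assumed audio settings, T = 40 at 25 fps spans 1.6 s while mel_step_size = 160 spans 2.0 s, and the overlaps are 0.4 s versus 0.5 s; mel_step_size = 128 and mel_overlap = 32 would line up with the video windows. A per-window mismatch of about 1.25x likely would not account for the whole 3 s versus 7 s gap on its own, so the way the test script stitches overlapping windows together may also be worth inspecting.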