Rudrabha / Lip2Wav

This is the repository containing code for our CVPR 2020 paper titled "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis".
MIT License
692 stars, 152 forks

Unable to reproduce the score claimed in paper #35

Open psui3905 opened 2 years ago

psui3905 commented 2 years ago

Hi, thanks for the great work. I'm able to reproduce the score claimed in the paper using your pre-trained model weights. However, when I trained Lip2Wav on the chem speaker from scratch, without those weights, the scores were noticeably worse. Here are the results I got:

# Speaker: chem
# Lip2Wav - using pre-trained weights
Mean PESQ: 1.2984
Mean STOI: 0.4285
Mean ESTOI: 0.3204

# Lip2Wav - checkpoint at step 230k
Mean PESQ: 1.1618
Mean STOI: 0.3245
Mean ESTOI: 0.1539

TensorBoard: [screenshot, 2021-09-06, 8:39:30 pm]

Both sets of scores were obtained in the same environment (ffmpeg version 2.8.17) with the same codebase. What could be causing this gap?
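
To put the gap in numbers, here is a minimal sketch that computes how far each metric of the 230k-step checkpoint falls below the pre-trained one. The scores are copied from the results above. The dictionary names and the `relative_drop` helper are my own, purely for illustration, and are not part of the Lip2Wav codebase. PESQ is not a zero-based scale, so its percentage is only a rough indication.

```python
# Mean scores copied from the results above (speaker: chem).
pretrained = {"PESQ": 1.2984, "STOI": 0.4285, "ESTOI": 0.3204}
retrained_230k = {"PESQ": 1.1618, "STOI": 0.3245, "ESTOI": 0.1539}


def relative_drop(reference, candidate):
    """Hypothetical helper: fractional drop of candidate vs. reference, per metric."""
    return {
        name: (reference[name] - candidate[name]) / reference[name]
        for name in reference
    }


drops = relative_drop(pretrained, retrained_230k)
for name, drop in drops.items():
    print(f"{name}: {drop:.1%} lower than with the pre-trained weights")
```

By this measure ESTOI suffers the most, losing roughly half of its value, while PESQ changes the least.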