Rudrabha / Lip2Wav

This is the repository containing the code for our CVPR 2020 paper, "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis".
MIT License

WER #38

Open · Domhnall-Liopa opened 2 years ago

Domhnall-Liopa commented 2 years ago

Hi, thanks for the great work.

When I test the pre-trained multi-speaker model on the LRW test set, I get STOI and ESTOI values similar to those quoted in the paper, but the best WER I can achieve is 79.6%, compared to the 34.2% in the paper.

Could you specify the steps you used to achieve 34.2% WER with Google ASR? Do you crop the synthesised word and use a specific Google ASR model or configuration? Do you evaluate on the entire LRW test set or only a subset?
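For reference, here is the WER computation I am using: word-level Levenshtein distance between the ASR transcript and the ground-truth word, aggregated over the corpus. This is a minimal sketch of the standard metric, not the paper's evaluation script, so any discrepancy may also come from how the metric itself is computed. Since each LRW clip contains a single target word, per-clip WER is effectively 0 or 1.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level edit distance / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / max(len(ref), 1)
```

Corpus-level WER is then total edit distance divided by total reference words, rather than the mean of per-clip values, though for single-word LRW clips the two coincide.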

It would be great to know, for a fair comparison in future research.

Thanks