Rudrabha / Lip2Wav

This is the repository containing the code for our CVPR 2020 paper "Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis"
MIT License

About SV2TTS concat #5

Closed MontaEllis closed 4 years ago

MontaEllis commented 4 years ago

Hi, I am interested in your project, but I'm confused about SV2TTS. SV2TTS embeds one person's voice into a vector, yet your project trains a separate model for every single person, and the code you released concatenates a zero matrix with the encoder output. Can I ask why you use SV2TTS? Thanks a lot!

prajwalkr commented 4 years ago

Hello, we have a multi-speaker version of the model as well, trained on the LRW dataset. We will be releasing the code and models for that in a separate branch soon. While training for the single-speaker case, we set the vector to a vector of zeros for simplicity.

SV2TTS is used for our multi-speaker model. Thank you for your interest in our work! :)
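For readers unfamiliar with this conditioning scheme, the idea discussed above can be sketched as follows: an SV2TTS-style speaker embedding is broadcast across time and concatenated with the encoder output along the channel axis; in the single-speaker setting that embedding is simply all zeros. This is a minimal NumPy sketch, not the repository's actual code, and all shapes and variable names here are hypothetical.

```python
import numpy as np

# Hypothetical dimensions: T encoder timesteps, D encoder channels,
# E speaker-embedding size.
T, D, E = 90, 512, 256

# Stand-in for the encoder output of shape (T, D).
encoder_outputs = np.random.randn(T, D).astype(np.float32)

# Single-speaker case: the speaker embedding is a vector of zeros.
speaker_embedding = np.zeros(E, dtype=np.float32)

# Repeat the embedding at every timestep, then concatenate it with
# the encoder output along the channel axis (SV2TTS-style conditioning).
tiled = np.tile(speaker_embedding, (T, 1))                       # (T, E)
conditioned = np.concatenate([encoder_outputs, tiled], axis=1)   # (T, D + E)

print(conditioned.shape)  # (90, 768)
```

In the multi-speaker setting, `speaker_embedding` would instead come from a pretrained speaker encoder, while the concatenation step stays the same.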