Closed MontaEllis closed 4 years ago
Hello, we have a multi-speaker version of the model as well, trained on the LRW dataset. We will be releasing the code and models for that in a separate branch soon. When training for the single-speaker case, we set the speaker embedding to a vector of zeros for simplicity.
SV2TTS is used for our multi-speaker model. Thank you for your interest in our work! :)
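To illustrate the zero-embedding trick mentioned above: in the single-speaker setting, the slot normally occupied by the SV2TTS speaker embedding is just filled with zeros and concatenated onto each encoder timestep. This is only a minimal sketch with made-up shapes (`T`, `D_enc`, `D_spk` are illustrative, not taken from the released code):

```python
import numpy as np

# Illustrative shapes: T timesteps, D_enc encoder dim, D_spk speaker-embedding dim.
T, D_enc, D_spk = 4, 8, 3

encoder_out = np.random.randn(T, D_enc)            # encoder outputs per timestep
spk_embed = np.zeros(D_spk)                        # zero "speaker" vector (single-speaker case)
spk_tiled = np.tile(spk_embed, (T, 1))             # repeat the embedding across time
decoder_in = np.concatenate([encoder_out, spk_tiled], axis=1)

print(decoder_in.shape)  # (4, 11): encoder features with zero embedding appended
```

In the multi-speaker case, `spk_embed` would instead hold the SV2TTS embedding of the target speaker, so the rest of the architecture stays unchanged between the two settings.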
Hi, I am interested in your project, but I'm confused about SV2TTS. SV2TTS embeds one person's voice into a vector; however, your project trains a model for every single speaker, and the code you released concatenates a zero matrix with the encoder output. Can I ask why you use SV2TTS? Thanks a lot!