Closed: ymzlygw closed this issue 2 years ago
@ymzlygw For voice cloning, you just need a good speaker embedding model: extract a speaker embedding from each sample and feed it into tacotron2 or fastspeech2 as the speaker embedding. You can use a pretrained speaker embedding model from here: https://github.com/clovaai/voxceleb_trainer.
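To make the idea concrete, here is a minimal, framework-free sketch of one common way to condition on such an embedding: normalize the utterance-level speaker vector and concatenate it to every encoder time step before the decoder. This is the scheme used in SV2TTS-style synthesizers such as Real-Time-Voice-Cloning; it is not necessarily how tacotron2 or fastspeech2 are wired in this repo. The names `condition_on_speaker`, `encoder_outputs`, and `speaker_embedding` are hypothetical, and the random numbers stand in for real encoder states and a real embedding from the pretrained model.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def condition_on_speaker(encoder_outputs, speaker_embedding):
    """Concatenate one utterance-level speaker embedding to every encoder frame.

    encoder_outputs:   list of T frames, each a list of D floats
    speaker_embedding: list of S floats (assumed to come from an external,
                       pretrained speaker encoder)
    returns:           list of T frames, each a list of D + S floats
    """
    spk = l2_normalize(speaker_embedding)
    # The same speaker vector is repeated at every time step.
    return [frame + spk for frame in encoder_outputs]


# Toy data: 5 encoder time steps of width 8, and a 4-dimensional embedding.
random.seed(0)
T, D, S = 5, 8, 4
encoder_outputs = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
speaker_embedding = [random.gauss(0, 1) for _ in range(S)]

conditioned = condition_on_speaker(encoder_outputs, speaker_embedding)
print(len(conditioned), len(conditioned[0]))  # prints: 5 12
```

In a real model the same operation would be a tile-and-concat on tensors, and the decoder's input projection would need to accept the wider `D + S` features. An alternative is to project the embedding to width `D` and add it to the encoder outputs, which keeps the downstream layer shapes unchanged.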
Thanks. So this means I need to change the structure of tacotron2 by adding a speaker embedding layer? Is there any documentation on how to change it?
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.
Hi, thanks for your work. Could you please tell me whether you have an implementation of voice cloning using a multi-speaker dataset? Or could you give some advice on how to merge your work with the synthesizer in Real-Time-Voice-Cloning? I aim to convert a voice-cloning model to tflite and test its performance. Many thanks, and I look forward to your reply!