Support Voice-Clone for multi-speaker dataset?

TensorSpeech / TensorFlowTTS

:stuck_out_tongue_closed_eyes: TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for Tensorflow 2 (supported including English, French, Korean, Chinese, German and Easy to adapt for other languages)

https://tensorspeech.github.io/TensorFlowTTS/

Apache License 2.0

Support Voice-Clone for multi-speaker dataset? #705

Closed ymzlygw closed 2 years ago

ymzlygw commented 2 years ago

Hi, thanks for your work. Could you please tell me whether you have an implementation of voice cloning using a multi-speaker dataset? Or do you have any advice on how to merge your work with the synthesizer in Real-Time-Voice-Cloning? I aim to convert a voice-cloning model to TFLite and test its performance. Many thanks, and I look forward to your reply!

dathudeptrai commented 2 years ago

@ymzlygw For voice cloning, you just need a good speaker embedding model: use it to extract a speaker embedding from each sample, then feed that embedding into Tacotron2 or FastSpeech2 as the speaker embedding. You can use a pretrained speaker embedding model from here (https://github.com/clovaai/voxceleb_trainer).
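To make the suggestion above concrete, here is a minimal NumPy sketch of one common way to condition a TTS encoder on an externally extracted speaker embedding. It is not TensorFlowTTS code: the function `condition_on_speaker`, the projection weights, the softplus activation, and all shapes are assumptions chosen for illustration, and the random arrays stand in for a real encoder output and a real embedding from a pretrained speaker model.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each embedding to unit length along the last axis."""
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def condition_on_speaker(encoder_out, speaker_emb, proj_w, proj_b):
    """Add a projected speaker embedding to every encoder time step.

    encoder_out: [batch, time, hidden]  text encoder output (hypothetical)
    speaker_emb: [batch, emb_dim]       embedding from an external speaker model
    proj_w:      [emb_dim, hidden]      projection weights (learned in practice)
    proj_b:      [hidden]               projection bias
    """
    # Project the utterance-level embedding to the encoder's hidden size.
    spk = l2_normalize(speaker_emb) @ proj_w + proj_b  # [batch, hidden]
    # Softplus, log(1 + exp(x)), computed in a numerically stable way.
    spk = np.logaddexp(0.0, spk)
    # Broadcast over the time axis so each step gets the same speaker offset.
    return encoder_out + spk[:, np.newaxis, :]


# Toy shapes and random stand-in data; all values here are assumptions.
rng = np.random.default_rng(0)
batch, time, hidden, emb_dim = 2, 5, 8, 16
encoder_out = rng.normal(size=(batch, time, hidden))
speaker_emb = rng.normal(size=(batch, emb_dim))
proj_w = rng.normal(size=(emb_dim, hidden)) * 0.1
proj_b = np.zeros(hidden)

conditioned = condition_on_speaker(encoder_out, speaker_emb, proj_w, proj_b)
print(conditioned.shape)  # (2, 5, 8)
```

In a real model the projection would be a trainable dense layer, and the embedding would be computed once per reference utterance by the pretrained speaker model, so an unseen voice can be used at inference time without retraining a speaker lookup table.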

ymzlygw commented 2 years ago

https://github.com/clovaai/voxceleb_trainer

Thanks. So does that mean I need to change the structure of Tacotron2 by adding a speaker embedding layer? Is there any documentation on how to change it?

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.