Edresson / YourTTS

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

Fine Tune for Voice Conversion? #21

Closed jsl303 closed 1 year ago

jsl303 commented 2 years ago

I've tried voice conversion by providing driving and target samples, but the result doesn't sound like the target at all; it's somewhat closer to the driving sample. Are there instructions on how to fine-tune the model to make the output sound better?
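
For reference, here is roughly how I'm invoking it. This is a minimal sketch assuming the Coqui TTS Python API and the published `your_tts` model name; on some TTS versions voice conversion may need to go through the `tts` CLI with `--reference_wav` instead.

```python
from TTS.api import TTS

# Load the released multilingual YourTTS checkpoint
# (model name assumed from the Coqui TTS model zoo).
tts = TTS(model_name="tts_models/multilingual/multi-dataset/your_tts")

# Keep the content of source_wav (the driving sample) while
# transferring the voice identity of target_wav (the target speaker).
tts.voice_conversion_to_file(
    source_wav="driving_sample.wav",   # hypothetical file paths
    target_wav="target_speaker.wav",
    file_path="converted.wav",
)
```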

kunyao2015 commented 2 years ago

Same problem: the generated voice sounds almost the same as the driving sample. I also found that the code only fine-tunes the vocoder (HiFi-GAN).

Edresson commented 1 year ago

The training procedure for voice conversion and for TTS is the same. If you like, you can follow the recipe that replicates the first experiment proposed in the YourTTS paper. The recipe replicates the single-language training on the VCTK dataset (it downloads, resamples, and extracts the speaker embeddings automatically :)). If you are interested in multilingual training instead, there are commented-out parameters on the VitsArgs class instance that should be enabled: https://github.com/coqui-ai/TTS/blob/dev/recipes/vctk/yourtts/train_yourtts.py
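
For anyone finding this later, here is a minimal sketch of what enabling those multilingual parameters looks like. It assumes the `VitsArgs` fields named in the recipe (`use_language_embedding`, `embedded_language_dim`); the value 4 is the one used there, adjust as needed.

```python
from TTS.tts.configs.vits_config import VitsConfig
from TTS.tts.models.vits import VitsArgs

# The multilingual switches that the single-language VCTK recipe
# leaves commented out on its VitsArgs instance.
model_args = VitsArgs(
    use_language_embedding=True,  # learn a per-language embedding
    embedded_language_dim=4,      # embedding size used in the recipe
)

# The recipe passes model_args into its VitsConfig; a multilingual run
# also needs one dataset config per language in the same VitsConfig.
config = VitsConfig(model_args=model_args)
```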