Edresson / YourTTS

YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone

Train YourTTS on another language #12

Closed: annaklyueva closed this issue 1 year ago

annaklyueva commented 2 years ago

Good day!

I have several questions. Could you please help?

Do I understand correctly that, if I want to train the model on another language, it is better to fine-tune this model (YourTTS-EN(VCTK+LibriTTS)-PT-FR SCL): https://drive.google.com/drive/folders/15G-QS5tYQPkqiXfAdialJjmuqZV0azQV Or is it better to use one of the other checkpoints?

How many hours of audio are needed to get acceptable quality?

I plan to use the Common Voice Corpus to fine-tune the model on a new language; however, its audio is in mp3 format, not wav. Do I need to convert all the audio files, or can I use the mp3 files directly? If I need to convert them, how?

Thank you for your time in advance!

Edresson commented 2 years ago

Hi,

Yes, it is better to fine-tune the model you mentioned.

How many hours of audio are needed to get acceptable quality?

In the YourTTS paper we did not analyze how many hours of audio are needed to learn a new language.

I plan to use the Common Voice Corpus to fine-tune the model on a new language; however, its audio is in mp3 format, not wav. Do I need to convert all the audio files, or can I use the mp3 files directly? If I need to convert them, how?

Yes, you need to convert the files to wav and resample them to the right sampling rate (the released model was trained on 16000 Hz audio). If you like, you can use the resampling script available in the Coqui TTS repository.
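For anyone who prefers not to use the Coqui script, here is a minimal sketch of the same preparation step. It is not the Coqui TTS resampling script; it assumes `ffmpeg` is installed and on `PATH`, that the mp3 clips sit in one directory, and that mono output is wanted. The helper names (`ffmpeg_cmd`, `convert_dir`, `wrong_rate_files`) are hypothetical. It converts each clip to a 16000 Hz wav with the same file stem, then uses the standard-library `wave` module to flag any file that did not end up at the expected rate.

```python
"""Sketch: convert mp3 clips to 16 kHz mono wav with ffmpeg, then verify.

Assumptions (not from the thread): ffmpeg is on PATH, all mp3 files are in
a single input directory, and mono output is desired.
"""
import subprocess
import wave
from pathlib import Path

TARGET_SR = 16000  # the released YourTTS model was trained on 16000 Hz audio


def ffmpeg_cmd(src: Path, dst: Path, sr: int = TARGET_SR) -> list[str]:
    """Build an ffmpeg command that decodes src and writes a mono wav at sr Hz."""
    # -y: overwrite existing output; -ar: output sample rate; -ac 1: mono
    return ["ffmpeg", "-y", "-i", str(src), "-ar", str(sr), "-ac", "1", str(dst)]


def wav_path_for(mp3: Path, out_dir: Path) -> Path:
    """Keep the file stem, swap the extension, and place the file in out_dir."""
    return out_dir / (mp3.stem + ".wav")


def convert_dir(in_dir: Path, out_dir: Path, sr: int = TARGET_SR) -> None:
    """Convert every *.mp3 in in_dir to a wav file in out_dir."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for mp3 in sorted(in_dir.glob("*.mp3")):
        # check=True raises CalledProcessError if ffmpeg fails on a file
        subprocess.run(
            ffmpeg_cmd(mp3, wav_path_for(mp3, out_dir), sr),
            check=True,
            capture_output=True,
        )


def wrong_rate_files(wav_dir: Path, sr: int = TARGET_SR) -> list[Path]:
    """Return wav files whose sample rate or channel count is not as expected."""
    bad = []
    for wav_file in sorted(wav_dir.glob("*.wav")):
        with wave.open(str(wav_file), "rb") as w:
            if w.getframerate() != sr or w.getnchannels() != 1:
                bad.append(wav_file)
    return bad
```

A typical run would call `convert_dir(Path("cv-corpus/xx/clips"), Path("cv-corpus/xx/wavs"))` (placeholder paths) and then check that `wrong_rate_files(Path("cv-corpus/xx/wavs"))` returns an empty list before training.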

annaklyueva commented 2 years ago

@Edresson Thank you very much for your help!

As far as I understand, I also need to add the "characters" of my language to config.json, am I right? Do I need to add anything to the "phonemes" part of the .json? Also, do I need to change "phoneme_language" to the language I use?

And should I change the number of epochs or the batch size?

Could you give me any other tips on how to fine-tune the model on a new language?

Edresson commented 2 years ago


You are welcome!

Use the latest version of Coqui TTS. Use this script to find all the phonemes in your dataset and this one to find the unique characters. After updating the vocabulary and adding the new datasets, follow the steps described here to train.
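As a rough illustration of the "find the unique characters, then update the vocabulary" step, here is a small sketch. It is not the Coqui TTS script referenced above. It assumes the transcripts come from a Common Voice TSV file with a `sentence` column (as `validated.tsv` has), and that config.json stores the alphabet as a string at `config["characters"]["characters"]` with punctuation at `config["characters"]["punctuations"]`; check both assumptions against your own files. All function names are hypothetical.

```python
"""Sketch: collect the characters used in Common Voice transcripts and
append any missing ones to a Coqui-style config.json.

Assumptions: the TSV has a "sentence" column, and the config keeps the
alphabet as a string at config["characters"]["characters"].
"""
import csv
import json
from typing import Iterable, TextIO


def sentences_from_tsv(f: TextIO) -> list[str]:
    """Read the "sentence" column; QUOTE_NONE keeps literal quote characters."""
    reader = csv.DictReader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
    return [row["sentence"] for row in reader]


def unique_chars(sentences: Iterable[str]) -> set[str]:
    """Return the set of all characters appearing in the sentences."""
    chars: set[str] = set()
    for s in sentences:
        chars.update(s)
    return chars


def merge_into_config(config: dict, new_chars: set[str]) -> list[str]:
    """Append characters missing from the alphabet and return what was added.

    Characters already listed as punctuation, and whitespace, are skipped.
    """
    chars_cfg = config["characters"]
    known = set(chars_cfg["characters"]) | set(chars_cfg.get("punctuations", ""))
    added = sorted(c for c in new_chars if c not in known and not c.isspace())
    chars_cfg["characters"] += "".join(added)
    return added


def update_config_file(config_path: str, tsv_path: str, out_path: str) -> list[str]:
    """Load config_path, merge characters from tsv_path, write to out_path."""
    with open(tsv_path, encoding="utf-8", newline="") as f:
        new_chars = unique_chars(sentences_from_tsv(f))
    with open(config_path, encoding="utf-8") as f:
        config = json.load(f)
    added = merge_into_config(config, new_chars)
    with open(out_path, "w", encoding="utf-8") as f:
        # ensure_ascii=False keeps non-Latin characters readable in the file
        json.dump(config, f, ensure_ascii=False, indent=4)
    return added
```

Writing to a separate `out_path` leaves the original config untouched so the added characters can be reviewed first. If your config applies a text cleaner that normalizes text (for example, lowercasing), it may make sense to apply the same normalization to the sentences before collecting characters.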

annaklyueva commented 2 years ago


Ok! Thank you!