SWivid / F5-TTS

Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
https://arxiv.org/abs/2410.06885
MIT License

Is it possible to train TTS for a new language? #5

Closed AigizK closed 15 hours ago

AigizK commented 18 hours ago

Thank you for your work. I would like to inquire about the possibility of training for a new language. If this is feasible, could you please provide more details on the following:

Your insights on this matter would be greatly appreciated. Thank you in advance for your assistance.

ScottishFold007 commented 17 hours ago

1. Required Data Volume

2. Data Format

3. Required Resources

4. Results Comparable to English

5. Training Details

SWivid commented 17 hours ago

All training details are given in our paper.

And you could simply train your own model for a new language:

  1. Leverage the Emilia dataset (DE, EN, FR, JA, KO, ZH); we have included a script for it. (Note: download the version of Emilia mentioned in the script, because the dataset has since been updated to a WebDataset version.)
  2. Or, if your language is not covered, prepare your own data pairs and tailor a Dataset class in model/dataset.py to your needs.
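For option 2, a minimal sketch of what a custom dataset over your own data pairs could look like. Everything here is an assumption for illustration: the class name `PairedSpeechDataset`, the `path|text` metadata layout, and the returned dict keys are hypothetical and are not taken from model/dataset.py. It only follows the generic map-style protocol (`__len__` and `__getitem__`), so the output format will need adapting to whatever the repo's training loop expects.

```python
import csv
import tempfile
from pathlib import Path


class PairedSpeechDataset:
    """Map-style dataset over (audio path, transcript) pairs.

    Hypothetical layout: a metadata file with one "relative/path.wav|text"
    row per line. Not the repository's actual interface.
    """

    def __init__(self, metadata_file, audio_root):
        self.audio_root = Path(audio_root)
        self.items = []
        with open(metadata_file, newline="", encoding="utf-8") as f:
            # QUOTE_NONE keeps quotation marks inside transcripts untouched.
            for row in csv.reader(f, delimiter="|", quoting=csv.QUOTE_NONE):
                if len(row) != 2:
                    continue  # skip malformed rows
                rel_path, text = row[0].strip(), row[1].strip()
                if rel_path and text:  # skip rows with an empty field
                    self.items.append((rel_path, text))

    def __len__(self):
        return len(self.items)

    def __getitem__(self, index):
        rel_path, text = self.items[index]
        # Audio decoding is left out on purpose; only the path is returned.
        return {"audio_path": str(self.audio_root / rel_path), "text": text}


def demo():
    """Build a tiny metadata file and return (dataset size, first transcript)."""
    with tempfile.TemporaryDirectory() as tmp:
        meta = Path(tmp) / "metadata.csv"
        meta.write_text(
            "wavs/0001.wav|hello world\nwavs/0002.wav|\nbroken row\n",
            encoding="utf-8",
        )
        ds = PairedSpeechDataset(meta, tmp)
        return len(ds), ds[0]["text"]
```

In the demo, only the first row survives: the second has an empty transcript and the third has no `|` separator.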

For the Base model (multilingual, ~300M), we use <50K hours for each language (EN, ZH). For the Small model (e.g. Chinese-only, ~150M), we have made it work with just 1K hours of data; that config is also mentioned in our paper.

Just one thing: training will take a long time, especially for E2 TTS (if you choose it). Be patient: on 8 x RTX3090, the small model takes one week (200~400K updates to hear something reasonable); the base model takes similarly long on 8 x A100.
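To put those numbers in perspective, here is a rough back-of-the-envelope calculation. It assumes the quoted week on 8 GPUs covers the full 200~400K updates and that the GPUs run continuously; both are readings of the comment above, not measured figures. The helper names are made up for this sketch.

```python
def gpu_hours(num_gpus, days):
    """Total GPU-hours for a run that keeps all GPUs busy the whole time."""
    return num_gpus * days * 24


def seconds_per_update(total_updates, days=7):
    """Average wall-clock seconds per update if the run lasts `days` days."""
    return days * 24 * 3600 / total_updates


print(gpu_hours(8, 7))              # 1344 GPU-hours
print(seconds_per_update(200_000))  # about 3.0 s per update
print(seconds_per_update(400_000))  # about 1.5 s per update
```

Under these assumptions, one week on 8 GPUs is 1344 GPU-hours, or roughly 1.5 to 3 seconds of wall-clock time per update.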