NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0

Number of training epochs and dataset size for tacotron-gst? #482

[Open] lorinczb opened this issue 5 years ago

lorinczb commented 5 years ago

I have been training tacotron-gst for about 200 epochs on a different dataset, neither LJ nor MAILABS, but a Romanian-language corpus. The model does not seem to have converged after 200 epochs, and the inference output is not yet intelligible. Do you have a recommendation for the number of epochs? Does it need to run for many more epochs?

Based on your experiments, what is the smallest dataset you have trained the model on? Roughly what minimum number of hours of speech is recommended?
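
For reference, here is the part of my config I am asking about. This is a minimal sketch following the usual OpenSeq2Seq `base_params` layout (as in the shipped `example_configs/text2speech` configs); the values and the log directory are just my current settings, not a recommendation:

```python
# Abridged OpenSeq2Seq-style text2speech config (sketch, not the full file).
base_params = {
    "num_epochs": 200,                    # the setting in question
    "num_gpus": 1,
    "batch_size_per_gpu": 32,
    "save_checkpoint_steps": 2500,
    "logdir": "result/tacotron-gst-ro",   # hypothetical log directory
    # ... model, optimizer, and data layer params omitted ...
}
```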

hao-olli-ai commented 5 years ago

Hi, when we trained with a 12-hour corpus, the model never converged. With 22 hours the result was better, but still not good. Our current results suggest that you should train on a dataset of 30 hours or more to get a model usable in a real application. It takes about 20 epochs to get listenable audio and about 250 epochs for clear, clean output.
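
If you want to check where your corpus sits relative to that 30-hour mark, here is a quick stdlib sketch (the directory path is hypothetical, and it assumes a flat folder of .wav files):

```python
# Sum the duration of every .wav in a directory and report total hours.
import wave
from pathlib import Path

def total_hours(wav_dir):
    seconds = 0.0
    for path in Path(wav_dir).glob("*.wav"):
        with wave.open(str(path), "rb") as w:
            seconds += w.getnframes() / w.getframerate()
    return seconds / 3600.0

print(f"corpus size: {total_hours('data/wavs'):.1f} hours")
```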