lorinczb opened 5 years ago
Hi, when we try with a 12-hour corpus, the model never converges. With a 22-hour corpus, the result is better but still not good. Our current results show that you should train on a data set of 30 hours or more to get a model for real applications. It takes about 20 epochs to get listenable sound and about 250 epochs for clear and clean sound.
I have been training tacotron-gst for about 200 epochs on a different dataset, neither LJ nor MAILABS, but a Romanian-language corpus. The model does not seem to have converged after 200 epochs, and the inference output is not yet intelligible. Do you have a recommendation on the number of epochs? Does it need to run for many more epochs?
Based on your experiments, what is the smallest dataset that you have trained the model on? Roughly what minimum number of hours of speech would you recommend?