MycroftAI / mimic2

Text to Speech engine based on the Tacotron architecture, initially implemented by Keith Ito.

How many recording hours is mimic trained on? #52

Closed · NoamDev closed this issue 4 years ago

NoamDev commented 4 years ago

Is it possible the main problem is a lack of data? In Google's examples the model was trained on 22 hours of recordings, but the mimic2 samples here come from a model trained on only 16 hours.

el-tocino commented 4 years ago

What problem are you experiencing?

NoamDev commented 4 years ago

It's just that it doesn't sound as good as Google's samples. Google trained their model on 22 hours of recordings, but the README here says the linked examples were generated from a model trained on only 16 hours. Could more training data make the model better? Has anyone tried such a thing?

el-tocino commented 4 years ago

24.6 hours for Tacotron 1/2.
The voices you hear from Google Assistant all run through a vocoder (Parallel WaveNet or something) as well. They've updated their TTS engine with other adjustments; unfortunately, they do not release code, and the papers they publish have to be reverse engineered.

To answer your question, there are some small improvements that could possibly be made with more data. I suspect a better option would be to move to a Tacotron 2-based TTS engine with a vocoder instead.
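For illustration only (not code that mimic2 or Google ships): a minimal sketch of the two-stage pipeline described above, where a Tacotron 2-style acoustic model predicts a mel spectrogram from text and a separate neural vocoder turns that spectrogram into audio. The modules below are toy placeholders standing in for the real networks, just to show how the stages connect.

```python
# Illustrative sketch only: stand-in modules showing the two-stage
# Tacotron 2 + vocoder pipeline (text -> mel spectrogram -> waveform).
# These toy networks are placeholders, not the real Tacotron 2 / WaveNet code.
import torch
import torch.nn as nn


class ToyMelPredictor(nn.Module):
    """Stands in for a Tacotron 2-style acoustic model: characters -> mel frames."""

    def __init__(self, vocab_size: int = 256, mel_channels: int = 80):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, 128)
        self.rnn = nn.GRU(128, 256, batch_first=True)
        self.to_mel = nn.Linear(256, mel_channels)

    def forward(self, char_ids: torch.Tensor) -> torch.Tensor:
        hidden, _ = self.rnn(self.embed(char_ids))
        return self.to_mel(hidden)            # (batch, frames, mel_channels)


class ToyVocoder(nn.Module):
    """Stands in for a neural vocoder (e.g. WaveNet): mel frames -> audio samples."""

    def __init__(self, mel_channels: int = 80, hop_length: int = 256):
        super().__init__()
        self.upsample = nn.Linear(mel_channels, hop_length)

    def forward(self, mel: torch.Tensor) -> torch.Tensor:
        return self.upsample(mel).flatten(1)  # (batch, frames * hop_length)


if __name__ == "__main__":
    text = "hello world"
    char_ids = torch.tensor([[ord(c) for c in text]])  # naive character encoding
    mel = ToyMelPredictor()(char_ids)                  # stage 1: text -> mel
    audio = ToyVocoder()(mel)                          # stage 2: mel -> waveform
    print(mel.shape, audio.shape)
```

In a real Tacotron 2 setup the acoustic model is an attention-based encoder-decoder and the vocoder is an autoregressive or flow-based network, but the split between the two stages is the same.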

NoamDev commented 4 years ago

I understand, thanks!

el-tocino commented 4 years ago

If you're super interested, they have an archive of their papers: https://google.github.io/tacotron/