MycroftAI / mimic2

Text to Speech engine based on the Tacotron architecture, initially implemented by Keith Ito.
Apache License 2.0

Alignment graph is fine during training; seems to get stuck when synthesized? #32

Closed by Scrollkeeper 5 years ago

Scrollkeeper commented 5 years ago

Hi there. I am training the model on 1,705 wav files, which total around 3 hours of data. The results have been fantastic so far, and it already sounds clear at 3,000 steps. I realize more training is necessary, but I've run into a problem when testing the model that seems more like a scripting problem than a Tacotron-based problem.

The alignment graph so far, exported from a checkpoint during training: [image: step-3000-align] (Yes, I am aware it needs to train further; however, the beginnings of alignment are definitely there.)

The alignment graph when exported from eval.py: [image: eval-3000-2]

The sound that is exported with eval.py sounds like the training data's voice, but stuck in a stuttering loop. I've tried it with a fresh clone of the repo as well, and I'm wondering if this is just a "needs more training" problem or something more.

Thank you for your patience; I'm still relatively new to machine learning, and I appreciate your help. :)

Scrollkeeper commented 5 years ago

Thinking about it more, it probably is a "needs more training" problem. The people at https://github.com/keithito/tacotron say that without proper alignment you will still get good results from train.py's export while the actual synthesis is not quite there yet. Also, the stuttering that sounds like the training data is probably its alignment starting to work. I will let it train for a day or so more and report back. Any input is welcome in the meantime! :)
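For anyone reading along, one likely reason the training export looks better than eval.py's output is teacher forcing: during training the decoder is typically fed the ground-truth previous frame, while at synthesis time it has to consume its own predictions, so small errors compound. The toy sketch below only illustrates that effect; it is not mimic2 or Tacotron code, and `rollout`, `imperfect_step`, and the decay constants are hypothetical names and values chosen for the example.

```python
import numpy as np

def rollout(target, step_fn, teacher_forced):
    """Predict a sequence one step at a time.

    teacher_forced=True  -> each step sees the true previous value (training-style).
    teacher_forced=False -> each step sees its own previous prediction (synthesis-style).
    """
    prev = target[0]
    out = [prev]
    for t in range(1, len(target)):
        pred = step_fn(prev)
        out.append(pred)
        prev = target[t] if teacher_forced else pred
    return np.array(out)

# Hypothetical "true" signal: decays by a factor of 0.9 per step.
target = 0.9 ** np.arange(20)

# Hypothetical under-trained model: it has learned a decay of 0.95 instead of 0.9.
def imperfect_step(prev):
    return 0.95 * prev

forced = rollout(target, imperfect_step, teacher_forced=True)
free = rollout(target, imperfect_step, teacher_forced=False)

forced_err = np.abs(forced - target).mean()
free_err = np.abs(free - target).mean()
print(f"teacher-forced mean error: {forced_err:.4f}")
print(f"free-running mean error:   {free_err:.4f}")
```

The same imperfect model looks much closer to the target when it is fed the true history than when it runs on its own output, which matches the pattern of a decent training-time export alongside a stuck or stuttering synthesis early in training.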

el-tocino commented 5 years ago

You should see a diagonal alignment line by 25-50k steps. After that it's a matter of fine-tuning things to fit (or just training until you're happy with the results).
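If you want a rough number to track instead of eyeballing the plot, something like the sketch below can score how diagonal an alignment matrix is. It assumes you can get the attention weights as a 2-D array with encoder steps on the rows and decoder steps on the columns; that layout and the `diagonality` helper are assumptions for illustration, not part of mimic2.

```python
import numpy as np

def diagonality(alignment):
    """Score in [0, 1]; 1.0 means attention moves evenly along the diagonal.

    alignment: array of shape (encoder_steps, decoder_steps) (assumed layout),
    where each column holds the attention weights for one decoder step.
    """
    a = np.asarray(alignment, dtype=float)
    n_enc, n_dec = a.shape
    enc_pos = np.linspace(0.0, 1.0, n_enc)   # normalised encoder positions
    col_sums = a.sum(axis=0)
    col_sums[col_sums == 0] = 1.0            # avoid division by zero on empty columns
    # Attention-weighted mean encoder position for every decoder step.
    expected = (a * enc_pos[:, None]).sum(axis=0) / col_sums
    ideal = np.linspace(0.0, 1.0, n_dec)     # a perfectly even diagonal
    return 1.0 - np.abs(expected - ideal).mean()

# A clean diagonal versus attention stuck on the first encoder step.
clean = np.eye(5)
stuck = np.zeros((5, 5))
stuck[0, :] = 1.0

print("clean:", diagonality(clean))
print("stuck:", diagonality(stuck))
```

A score that climbs toward 1.0 over checkpoints suggests the diagonal is forming; a score that stays flat usually means the attention is still stuck. Real speech does not advance at a perfectly even rate, so treat this as a trend indicator rather than a pass/fail threshold.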

Scrollkeeper commented 5 years ago

Yep, definitely a "needs more training" scenario! Thank you for your time. :)