Closed by Scrollkeeper 5 years ago
Thinking about it more, it probably is a "needs more training" problem. The people at https://github.com/keithito/tacotron say that without proper alignment you will still get good results from train.py's export while the actual synthesis is not quite there yet. Also, the stuttering that sounds like the training data is probably its alignment starting to work. I will let it train for a day or so more and report back. Any input is welcome in the meantime! :)
You should see a diagonal alignment line by 25-50k steps. After that, it's a matter of fine-tuning things to fit (or just training until you're happy with the results).
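If you'd rather check for that diagonal numerically than eyeball the plot, here is a rough sketch. It assumes you can get the alignment out as a list of rows, one row of attention weights per decoder step; how to pull that out of the repo isn't covered here. The names `attended_positions` and `diagonal_score` are made up for illustration and are not part of Tacotron.

```python
def attended_positions(alignment):
    """For each decoder step, return the index of the most-attended encoder step."""
    return [max(range(len(row)), key=row.__getitem__) for row in alignment]


def diagonal_score(alignment):
    """Pearson correlation between decoder step and attended encoder position.

    Close to 1.0 suggests a clean diagonal; a low value suggests attention
    that is stuck or looping. This is a hypothetical heuristic, not a metric
    defined by the Tacotron repo.
    """
    positions = attended_positions(alignment)
    steps = list(range(len(positions)))
    mean_s = sum(steps) / len(steps)
    mean_p = sum(positions) / len(positions)
    cov = sum((s - mean_s) * (p - mean_p) for s, p in zip(steps, positions))
    var_s = sum((s - mean_s) ** 2 for s in steps)
    var_p = sum((p - mean_p) ** 2 for p in positions)
    if var_s == 0 or var_p == 0:
        return 0.0  # attention never moves, so there is no usable alignment
    return cov / (var_s * var_p) ** 0.5


# Toy inputs (made up): a perfect diagonal and an attention that loops.
diagonal = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
looping = [[1.0, 0.0] if i % 2 == 0 else [0.0, 1.0] for i in range(6)]

print(diagonal_score(diagonal))
print(diagonal_score(looping))
```

A looping pattern like the second toy input scores much lower than the diagonal one, which is roughly what a stuttering synthesis looks like in the attention plot. Any cutoff you pick for "good enough" would be your own judgment call.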
Yep, definitely a "needs more training" scenario! Thank you for your time. :)
Hi there, I am training the model on 1,705 wav files, which total around 3 hours of data. The results have been fantastic so far, and it already sounds clear at 3,000 steps. I realize more training is necessary, but I've run into a problem when testing the model that seems more like a scripting problem than a Tacotron-based problem.
The alignment graph so far when exported from a checkpoint during training:
(Yes, I am aware it needs to train further; however, the beginnings of alignment are definitely there.)
The alignment graph when exported from eval.py:

![eval-3000-2](https://user-images.githubusercontent.com/2720807/60378985-ef76f300-99f8-11e9-9862-36c1720ac8b1.png)
The sound that is exported with eval.py sounds like the training data's voice, but stuck in a stuttering loop. I've tried it with a fresh clone of the repo as well, and I'm wondering if this is just a "needs more training" problem or something more.
Thank you for your patience; I'm still relatively new to machine learning, and I appreciate your help. :)