Open jiqizaisikao opened 6 years ago
The performance of this implementation is nowhere near as good as what Google demonstrated, even after 420k iterations. If you are looking for a proper implementation, check https://github.com/syang1993/gst-tacotron, which gives better results after just 200k iterations. If someone can continue training it up to 500k-600k iterations, I am sure it will be a lot closer to Google's.
I have checked your sample results from the 420k training run and tried to align the reference audio with the target audio; they seem quite different. So I am really confused about how the author of the paper managed that. Maybe he used a very large dataset of 100+ hours, and that is why he got good results.