Closed begeekmyfriend closed 5 years ago
https://github.com/begeekmyfriend/Tacotron-2/tree/griffin-lim
Sorry, but the synthesizer is still using the Griffin-Lim algorithm in this branch, right? I was hoping for your WORLD vocoder implementation for LJ Speech. What changes are needed to the hparams and preprocessing?
Sorry, I forgot it. You might diff the `griffin-lim` and `mandarin-griffin-lim` branches and merge the result into the `mandarin-world-vocoder` branch. By the way, in my test, the evaluation from WORLD plus Tacotron did not work as well as that from Griffin-Lim.
Thanks! I actually wanted to try the WORLD vocoder because Griffin-Lim (GL) is really slow. I have managed to optimize the Tacotron part of the model, and it now runs faster than before on CPU, but GL is still slow, and reducing the number of iterations just kills the quality. I will try your implementation and see if I can get a good result.
Give up this solution and turn to WaveRNN. Feel free to reopen this issue.
@begeekmyfriend Hi, thank you for your great work. Have you tried Tacotron2 + WORLD vocoder on long sentences? When I test the WORLD vocoder on a very long sentence (about 200 characters), the output contains some unpleasant artifacts (something like a breaking sound). Have you experienced this problem, and what do you think the solution is? long.wav.zip
You might split that long sentence into two sentences according to the pause point.
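The splitting suggested above can be sketched roughly as follows. This is a minimal illustration, not code from the repository: the helper name `split_at_pauses`, the `max_chars` limit, and the choice of ASCII punctuation followed by whitespace as "pause points" are all assumptions. Mandarin text would need its own punctuation set.

```python
import re

# Assumption: a pause point is ASCII punctuation followed by whitespace.
PAUSE_PATTERN = re.compile(r"(?<=[,;:.!?])\s+")


def split_at_pauses(text, max_chars=100):
    """Split text at pause points, greedily packing pieces up to max_chars.

    A single piece longer than max_chars is kept whole, since there is
    no pause point inside it to split on.
    """
    pieces = [p for p in PAUSE_PATTERN.split(text.strip()) if p]
    chunks = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if current and len(candidate) > max_chars:
            # Adding this piece would exceed the limit: close the chunk.
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk could then be synthesized separately and the resulting waveforms concatenated, possibly with a short silence in between.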
Hello, the Python-Wrapper-for-World-Vocoder link no longer works. Could you post it again? Also, has a pretrained model been provided?
@JJ-Guo1996 I am not working on TTS any more, and I do not remember how the repo you mentioned came to be deleted. There may be substitutes for the WORLD vocoder listed in the README file; you can refer to it.
Hey, I am glad to inform you that I have succeeded in merging the Tacotron model with the WORLD vocoder and have generated some evaluation results, as follows. The results sound not bad, but still not perfect. However, this shows another way to train Tacotron on different feature parameters. The WORLD vocoder is an open source project, so everyone can use it. Moreover, the quality of the resynthesized results from that vocoder is better than that from Griffin-Lim, since the three features (lf0[1], mgc[60] and ap[5]) contain not only magnitude spectrograms but also phase information. Furthermore, the depth of the features is low enough that we do not need the postnet for the Tacotron model. Training time can be reduced to 0.7 seconds per step. Inference is also quick enough even when it runs only on CPU. So it is really worth trying.
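To make the feature layout concrete, here is a minimal sketch of how the three streams could be packed into one per-frame target vector and split apart again before vocoding. Only the dimensions (1, 60 and 5) come from the post above. The `lf0 | mgc | ap` ordering and the helper names `pack_frame` and `unpack_frame` are assumptions for illustration, and the actual branch may lay the features out differently.

```python
# Per-frame dimensions quoted in the post: lf0[1], mgc[60], ap[5].
LF0_DIM, MGC_DIM, AP_DIM = 1, 60, 5
FRAME_DIM = LF0_DIM + MGC_DIM + AP_DIM  # 66 values per frame


def pack_frame(lf0, mgc, ap):
    """Concatenate one frame of features into a single target vector."""
    if (len(lf0), len(mgc), len(ap)) != (LF0_DIM, MGC_DIM, AP_DIM):
        raise ValueError("unexpected feature dimensions")
    # Assumption: the lf0 | mgc | ap ordering is chosen for this sketch only.
    return list(lf0) + list(mgc) + list(ap)


def unpack_frame(frame):
    """Split a predicted frame back into (lf0, mgc, ap)."""
    if len(frame) != FRAME_DIM:
        raise ValueError(f"expected {FRAME_DIM} values, got {len(frame)}")
    lf0 = list(frame[:LF0_DIM])
    mgc = list(frame[LF0_DIM:LF0_DIM + MGC_DIM])
    ap = list(frame[LF0_DIM + MGC_DIM:])
    return lf0, mgc, ap
```

Under this layout the decoder would predict 66 values per frame, which is the low feature depth the post refers to.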
I would like to share my experimental source code with you as follows. Note that it currently only supports Mandarin Chinese; you may modify it for other languages: tacotron-world-vocoder branch, Python-Wrapper-for-World-Vocoder, pysptk, merlin-world-vocoder branch. By the way, you need to run `python setup.py install` and then copy the .so file manually into the system path for `pysptk` and the Python wrapper project. Besides, I would also like to provide two Python scripts for a WORLD vocoder resynthesis test: world_vocoder_resynth_scripts.zip
@Rayhane-mamah Let us rock with it! And @r9y9 thanks for your `pysptk` project. world_vocoder_demo.zip