Rayhane-mamah / Tacotron-2

DeepMind's Tacotron-2 Tensorflow implementation
MIT License
2.27k stars 905 forks

Tacotron-2 plus World vocoder #304

Closed begeekmyfriend closed 5 years ago

begeekmyfriend commented 5 years ago

Hey, I am glad to tell you that I have succeeded in merging the Tacotron model with the WORLD vocoder and have generated some evaluation results, attached below. The results sound decent but are still not perfect. However, they show another way to train Tacotron on a different set of feature parameters. The WORLD vocoder is an open source project, so anyone can use it. Moreover, the quality of its resynthesis results is better than that of Griffin-Lim, since the three features (lf0[1], mgc[60] and ap[5]) contain not only magnitude spectrogram information but also phase information. Furthermore, the feature dimensionality is low enough that the Tacotron model does not need a postnet. Training time can be reduced to 0.7 seconds per step. Inference is also fast enough even when running only on CPU. So it is really worth trying.
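
To make the feature layout above concrete, here is a minimal sketch of how a single acoustic frame could be packed and unpacked. The dimensions (lf0[1], mgc[60], ap[5]) come from the comment above; the concatenation order and the helper names `split_frame` and `join_frame` are assumptions for illustration and are not taken from the branch.

```python
# Hypothetical layout: one acoustic frame = [lf0 (1) | mgc (60) | ap (5)].
# The dimensions come from the comment above; the ordering is an assumption.
FEATURE_DIMS = (("lf0", 1), ("mgc", 60), ("ap", 5))
FRAME_DIM = sum(dim for _, dim in FEATURE_DIMS)  # 1 + 60 + 5 = 66


def split_frame(frame):
    """Split one flat 66-value frame into its named feature streams."""
    if len(frame) != FRAME_DIM:
        raise ValueError(f"expected {FRAME_DIM} values, got {len(frame)}")
    streams, start = {}, 0
    for name, dim in FEATURE_DIMS:
        streams[name] = list(frame[start:start + dim])
        start += dim
    return streams


def join_frame(streams):
    """Inverse of split_frame: concatenate the streams back into one frame."""
    frame = []
    for name, dim in FEATURE_DIMS:
        values = list(streams[name])
        if len(values) != dim:
            raise ValueError(f"{name} must have {dim} values, got {len(values)}")
        frame.extend(values)
    return frame
```

Under this assumed layout the decoder would predict 66 values per frame, which would then be split into the three streams before being handed to the vocoder for synthesis.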

I would like to share my experimental source code with you, listed below. Note that it currently only supports Mandarin Chinese. You may modify it for other languages: tacotron-world-vocoder branch, Python-Wrapper-for-World-Vocoder, pysptk, merlin-world-vocoder branch. By the way, you need to run python setup.py install and then copy the .so file manually into the system path for the pysptk and Python wrapper projects.

In addition, I would like to provide two Python scripts for testing WORLD vocoder resynthesis: world_vocoder_resynth_scripts.zip

@Rayhane-mamah Let us rock with it! And @r9y9 thanks for your pysptk project. world_vocoder_demo.zip

sujeendran commented 5 years ago

https://github.com/begeekmyfriend/Tacotron-2/tree/griffin-lim

Sorry, but the synthesizer in this branch is still using the Griffin-Lim algorithm, right? I was hoping for your WORLD vocoder implementation for LJ Speech. What changes are needed in Hparams and preprocessing?

begeekmyfriend commented 5 years ago

Sorry, I forgot about that. You might diff the griffin-lim and mandarin-griffin-lim branches and merge the changes into the mandarin-world-vocoder branch. By the way, in my tests the evaluation results from WORLD plus Tacotron were not as good as those from Griffin-Lim.

sujeendran commented 5 years ago

Thanks! I actually wanted to try the WORLD vocoder because GL is really slow. I have managed to optimize the Tacotron part of the model, and it now runs faster than before on CPU, but GL is still slow, and reducing the number of iterations just kills the quality. I will try your implementation and see if I can get a good result.

begeekmyfriend commented 5 years ago

I have given up on this solution and turned to WaveRNN. Feel free to reopen this issue.

byuns9334 commented 4 years ago

@begeekmyfriend Hi, thank you for your great work. Have you tried Tacotron2 + WORLD vocoder on long sentences? When I test the WORLD vocoder on a very long sentence (about 200 characters), the output contains some unpleasant noise (something like a breaking sound). Have you experienced this problem, and what solution would you suggest? long.wav.zip

begeekmyfriend commented 4 years ago

You might split that long sentence into two sentences at a pause point.
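
The splitting suggested above could be sketched as follows. This is a minimal illustration, not code from the repository: the function name `split_at_pauses`, the `max_chars` limit, and the set of punctuation marks treated as pause points are all assumptions, and the sketch generalizes from two pieces to as many chunks as the limit requires.

```python
import re

# Punctuation treated as pause points (ASCII and full-width Chinese marks).
# Which marks count as pauses is an assumption; adjust for your text front end.
PAUSE_MARKS = ",.;:!?，。；：！？、"
_MARKS = re.escape(PAUSE_MARKS)
# A piece is text followed by its pause mark(s), or trailing text with no mark.
_PIECE = re.compile(rf"[^{_MARKS}]*[{_MARKS}]+\s*|[^{_MARKS}]+")


def split_at_pauses(text, max_chars=100):
    """Greedily pack pause-delimited pieces into chunks of at most max_chars.

    A single piece longer than max_chars is kept whole rather than cut mid-phrase.
    """
    chunks, current = [], ""
    for piece in _PIECE.findall(text):
        # Start a new chunk when adding this piece would exceed the limit.
        if current and len(current) + len(piece) > max_chars:
            chunks.append(current.strip())
            current = ""
        current += piece
    if current.strip():
        chunks.append(current.strip())
    return chunks
```

Each chunk can then be synthesized separately and the resulting waveforms concatenated, optionally with a short silence between them.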

JJun-Guo commented 1 year ago

Hello, the Python-Wrapper-for-World-Vocoder link is no longer valid. Could you post it again? Also, is a pretrained model available?

begeekmyfriend commented 1 year ago

@JJ-Guo1996 I am no longer working on TTS, and I do not remember how the repo you mentioned got deleted. There may be substitutes for the WORLD vocoder listed in the README file. You can refer to it.