Literature Survey TTS - Githubissues

rmalav15 commented 6 years ago

Hi Kyubyong Park,

Many thanks for the wonderful GitHub profile. It is very helpful, especially for someone like me who is very interested in the TTS domain.

I have been exploring the recent work in this domain, namely Facebook's VoiceLoop and the models you have already implemented: DeepVoice3, DCTTS and Tacotron. The samples published with DeepVoice3, Tacotron and DCTTS are almost human-like. But with the current code that you and the open source community have implemented, it's a slightly different picture (the final speech sounds robotic). I was more interested in going ahead with Tacotron, but after comparing your results for Tacotron and DCTTS, I like DCTTS better. Can you please suggest which way I should go if I want to generate human-like speech for my future research? (Currently I am not looking at Facebook's VoiceLoop.)

In the DeepVoice3 paper it was mentioned (quote):

The WaveNet vocoder sounds more natural as the WORLD vocoder introduces various noticeable artifacts.

Can you please tell me which vocoder is best at generating human-like speech, based on your experience and domain knowledge?

It would be a huge help if you could provide a comparison of all these methods in terms of training difficulty, generation time (real time or not) and final results.

Apologies for choosing this inappropriate platform for such a discussion, and for my lengthy question; I am a newbie in this domain.

Thanks in Advance,

Ram

rmalav15 commented 6 years ago

Hi @Kyubyong

Hoping for a response. It would be a great help.

Kamsahamnida (thank you).

amilamad commented 6 years ago

@rmalav15 Have you tried the https://github.com/r9y9/deepvoice3_pytorch implementation? It supports multi-speaker TTS and Japanese TTS, and it is better than DCTTS. The author of that project is currently integrating a WaveNet vocoder; see the samples at https://github.com/r9y9/wavenet_vocoder

rmalav15 commented 6 years ago

Thank you @amilamad,

I have looked at r9y9's DeepVoice3 implementation, but wasn't aware of the WaveNet vocoder integration, so thank you. I recently watched the Two Minute Papers video on WaveNet's new release, so I think it will surely help me with my project.

mrgloom commented 5 years ago

The https://github.com/NVIDIA/tacotron2 repo gives good quality, but Tacotron 2 + WaveGlow is slow.

Kyubyong / deepvoice3

Literature Survey TTS #15