Should the vocoder be trained first before the TTS synthesizer?

HGU-DLLAB / Korean-FastSpeech2-Pytorch

Implementation of Korean FastSpeech2

MIT License


Should the vocoder be trained first before the TTS synthesizer? #11

Closed: jodumagpi closed this issue 3 years ago

jodumagpi commented 3 years ago

I just want to clarify whether we need to train the vocoder first if we want to train this synthesizer on a different dataset. For instance, you recommend downloading the pretrained VocGAN, but it was trained only on KSS. If I want to train on a different dataset, should I retrain VocGAN on my dataset first, or can I train it after training the TTS model?

Jackson-Kang commented 3 years ago

Hello, @jodumagpi

In this case, you must train the neural vocoder on your own dataset.

However, the mel-spectrogram synthesizer (such as FastSpeech2) and the neural vocoder (such as VocGAN) can be trained independently. In the synthesis phase, both models must be ready to generate high-quality speech.
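To make the point about independent training concrete, here is a minimal Python sketch of the two-stage pipeline. `ToySynthesizer`, `ToyVocoder`, `synthesize`, and the values of `N_MELS` and `HOP_LENGTH` are hypothetical placeholders, not this repository's code, and the stubs emit silence rather than speech. The two models meet only at synthesis time, and the only contract between them is the mel-spectrogram format, so each can be trained separately as long as both use the same audio preprocessing settings.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical stand-ins; NOT the repository's FastSpeech2 or VocGAN classes.
Mel = List[List[float]]  # frames x mel bins

N_MELS = 80       # assumed number of mel bins (must match in both models)
HOP_LENGTH = 256  # assumed audio samples per mel frame (must match in both models)


@dataclass
class ToySynthesizer:
    """Stage 1 (the role FastSpeech2 plays): text -> mel-spectrogram."""
    frames_per_char: int = 5

    def __call__(self, text: str) -> Mel:
        n_frames = len(text) * self.frames_per_char
        # A real synthesizer predicts mel values; this returns zero-filled frames.
        return [[0.0] * N_MELS for _ in range(n_frames)]


@dataclass
class ToyVocoder:
    """Stage 2 (the role VocGAN plays): mel-spectrogram -> waveform."""
    hop_length: int = HOP_LENGTH

    def __call__(self, mel: Mel) -> List[float]:
        # A real vocoder predicts samples; this emits silence of the right length.
        return [0.0] * (len(mel) * self.hop_length)


def synthesize(text: str, synthesizer: ToySynthesizer, vocoder: ToyVocoder) -> List[float]:
    # The two models only meet here: the vocoder consumes whatever mel the synthesizer emits.
    mel = synthesizer(text)
    return vocoder(mel)


audio = synthesize("hello", ToySynthesizer(), ToyVocoder())
print(len(audio))
```

Here `synthesize("hello", ...)` returns 6400 samples: 5 characters × 5 frames per character × 256 samples per frame.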

jodumagpi commented 3 years ago

I understand that both have to be trained. Which one should I train first?

Jackson-Kang commented 3 years ago

You can train them in parallel. However, I recommend training the vocoder first. In general, a neural vocoder needs a large number of training samples and a long training time before it can generate high-fidelity speech.

While the vocoder is training, you can train the mel-spectrogram generator.

jodumagpi commented 3 years ago

Then why do the instructions say to download a pretrained VocGAN to train FastSpeech2? If I can train them in parallel, does that mean the vocoder is not actually required to train FastSpeech2?

Jackson-Kang commented 3 years ago

A pretrained vocoder is needed in the "evaluation" phase, where it converts the generated mel-spectrograms into audio so you can check whether FastSpeech2 training is going wrong.
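As a rough illustration of that evaluation role, the following Python sketch passes a predicted mel-spectrogram through a frozen vocoder and encodes the result as a WAV file you could listen to. `pretrained_vocoder`, `evaluate_sample`, and the 22050 Hz / 256-sample settings are assumptions for illustration only; the stand-in vocoder emits a quiet sine tone instead of running a real VocGAN checkpoint.

```python
import io
import math
import struct
import wave
from typing import Callable, List

Mel = List[List[float]]  # frames x mel bins

SAMPLE_RATE = 22050  # assumed; must match the vocoder's training config
HOP_LENGTH = 256     # assumed audio samples per mel frame


def pretrained_vocoder(mel: Mel) -> List[float]:
    """Hypothetical frozen vocoder: emits a quiet 220 Hz tone of the right length.
    A real VocGAN checkpoint would predict the waveform from the mel frames."""
    n = len(mel) * HOP_LENGTH
    return [0.1 * math.sin(2 * math.pi * 220 * t / SAMPLE_RATE) for t in range(n)]


def to_wav_bytes(samples: List[float], sample_rate: int = SAMPLE_RATE) -> bytes:
    """Encode float samples in [-1, 1] as 16-bit mono PCM WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)  # 16-bit samples
        wf.setframerate(sample_rate)
        pcm = b"".join(
            struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
        )
        wf.writeframes(pcm)
    return buf.getvalue()


def evaluate_sample(predicted_mel: Mel, vocoder: Callable[[Mel], List[float]]) -> bytes:
    # The vocoder is only used here, to make the synthesizer's output audible;
    # synthesizer training never updates the vocoder's weights.
    audio = vocoder(predicted_mel)
    return to_wav_bytes(audio)


# Example: a 10-frame predicted mel becomes 10 * 256 = 2560 audio samples.
wav_bytes = evaluate_sample([[0.0] * 80 for _ in range(10)], pretrained_vocoder)
```

This is also why a vocoder trained on a different speaker can still be useful early on: the audio only needs to be good enough to tell whether the synthesizer is learning, not to be final quality.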

jodumagpi commented 3 years ago

So that means I need a pretrained vocoder before I can train FastSpeech2, right?

Jackson-Kang commented 3 years ago

In that sense, you are right.

In my earlier reply, I was thinking about the efficiency of training both models, to save you time.

jodumagpi commented 3 years ago

Oh. Thank you for being patient! It's all cleared now.

Jackson-Kang commented 3 years ago

You're welcome. If you have any questions, feel free to ask me.

Thank you.