Closed: jodumagpi closed this issue 3 years ago
Hello, @jodumagpi
In this case, you must train the neural vocoder on your own dataset.
However, the mel-spectrogram synthesizer (such as FastSpeech2) and the neural vocoder (such as VocGAN) can be trained independently. In the synthesis phase, both models must be ready in order to generate high-quality speech.
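To illustrate why both models are needed at synthesis time, here is a minimal sketch of the two-stage pipeline. The function names, the 80 mel bins, and the 256-sample hop are placeholders assumed for illustration; they are not taken from the FastSpeech2 or VocGAN code.

```python
from typing import Callable, List

Mel = List[List[float]]   # frames x mel bins
Wave = List[float]        # raw audio samples


def synthesize(text: str,
               acoustic_model: Callable[[str], Mel],
               vocoder: Callable[[Mel], Wave]) -> Wave:
    """Two-stage TTS: text -> mel-spectrogram -> waveform."""
    mel = acoustic_model(text)   # stage 1: e.g. a FastSpeech2-style model
    return vocoder(mel)          # stage 2: e.g. a VocGAN-style vocoder


# Hypothetical stand-ins so the sketch runs without any trained model.
def toy_acoustic_model(text: str) -> Mel:
    # Assumption: one 80-bin frame per input character.
    return [[0.0] * 80 for _ in text]


def toy_vocoder(mel: Mel) -> Wave:
    # Assumption: 256 audio samples per mel frame.
    return [0.0] * (256 * len(mel))


audio = synthesize("hello", toy_acoustic_model, toy_vocoder)
print(len(audio))
```

The two stages only meet through the mel-spectrogram, which is why each model can be trained on its own as long as both use the same mel settings.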
I understand that both have to be trained. Which one should I train first?
You can train them in parallel. However, I recommend that you train the vocoder first. Generally, a neural vocoder requires a large number of training samples and takes a long time to train before it generates high-fidelity speech samples.
While the vocoder is training, train the mel-spectrogram generator.
Then why do the instructions say to download a pre-trained VocGAN in order to train FastSpeech2? If I can train them in parallel, does that mean that the vocoder is not really required for training FastSpeech2?
A pretrained vocoder is needed in the "evaluation" phase, to check whether the FastSpeech2 training is going wrong.
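As a rough sketch of that idea, the loop below uses a vocoder only at evaluation steps, while the training steps themselves need only text and mel-spectrogram pairs. Everything here is hypothetical (the function name, the arguments, the toy shapes); it is not this repository's training script, and whether that script can run without a vocoder is not confirmed here.

```python
from typing import Callable, List, Optional

Mel = List[List[float]]
Wave = List[float]


def train_acoustic_model(num_steps: int,
                         eval_every: int,
                         vocoder: Optional[Callable[[Mel], Wave]] = None) -> List[str]:
    """Toy loop: the vocoder is touched only during evaluation."""
    log = []
    for step in range(1, num_steps + 1):
        # The training step compares predicted mels to ground-truth mels,
        # so no vocoder is involved here.
        log.append(f"step {step}: train on (text, mel) pairs")

        if step % eval_every == 0:
            # Placeholder for a mel predicted on a validation sentence.
            predicted_mel = [[0.0] * 80 for _ in range(4)]
            if vocoder is None:
                log.append(f"step {step}: eval, mel only (no audio)")
            else:
                # Render audio so a person can listen for problems.
                audio = vocoder(predicted_mel)
                log.append(f"step {step}: eval, rendered {len(audio)} audio samples")
    return log


for line in train_acoustic_model(num_steps=4, eval_every=2):
    print(line)
```

Passing a vocoder trained on another dataset would still produce audio at evaluation time, though its quality on a new speaker or language may be poor.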
So that means I need a pretrained vocoder before I can train FastSpeech2, right?
In that sense, you are right.
When I replied before, I was thinking about how to train both models efficiently, to save you time.
Oh. Thank you for being patient! It's all clear now.
You're welcome. If you have any questions, feel free to ask me.
Thank you.
I just want to clarify whether we need to train the vocoder first if we want to train this synthesizer on a different dataset. For instance, you recommend downloading the pre-trained VocGAN, but that is only trained on KSS. Now I want to train on a different dataset. Should I also retrain VocGAN on my dataset first, or can I train it after training the TTS?