Closed: forwiat closed this issue 9 months ago
Yes, the vocoder might be mismatched, since we only provided the HifiGAN trained on LJSpeech. If you choose to train on AIShell3, we recommend using a Chinese vocoder trained on that dataset, or a universal one.
By the way, could you provide some of the noisy audio samples, so that we can diagnose the problem?
I would like to add another note: the mel-spectrogram features used for training the TTS model must match the ones used for training the vocoder. So if you use another vocoder whose input mel-spectrograms have different parameters (e.g. a different frame shift, window length, etc.), the feature extraction scripts provided in this repo should be modified accordingly : )
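To make the note above concrete, here is a minimal sketch of how one might compare the two feature configurations before training. The key names and every value below are hypothetical placeholders, not the settings of this repo or of any particular vocoder; substitute the values from your own feature extraction scripts and vocoder config.

```python
# Sketch only: the key names and all values below are hypothetical placeholders,
# not the settings used by this repo or by any particular vocoder.
MEL_KEYS = ("sample_rate", "n_fft", "hop_length", "win_length", "n_mels", "fmin", "fmax")


def mel_config_mismatches(tts_cfg, vocoder_cfg, keys=MEL_KEYS):
    """Return {key: (tts_value, vocoder_value)} for every key whose values differ."""
    mismatches = {}
    for key in keys:
        tts_value = tts_cfg.get(key)
        vocoder_value = vocoder_cfg.get(key)
        if tts_value != vocoder_value:
            mismatches[key] = (tts_value, vocoder_value)
    return mismatches


# Illustrative values: features extracted for the TTS model.
tts_features = {
    "sample_rate": 16000, "n_fft": 1024, "hop_length": 200,
    "win_length": 800, "n_mels": 80, "fmin": 0, "fmax": 8000,
}
# Illustrative values: features the vocoder was trained on.
vocoder_features = {
    "sample_rate": 22050, "n_fft": 1024, "hop_length": 256,
    "win_length": 1024, "n_mels": 80, "fmin": 0, "fmax": 8000,
}

# Report every parameter that differs between the two configurations.
for key, (tts_value, vocoder_value) in mel_config_mismatches(tts_features, vocoder_features).items():
    print(f"{key}: TTS features use {tts_value}, vocoder expects {vocoder_value}")
```

If the check reports any difference, either the feature extraction or the vocoder would need to change so that both sides use the same parameters. Matching parameters alone may not remove the noise if the vocoder was trained on a different language or dataset.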
Yeah, I noticed this section, and I will try again. Thanks for the tips! Also, maybe I don't have permission? I can't upload wav files or pictures in a comment.
Oh, that's a GitHub limitation. Next time, if necessary, you could upload the files to Google Drive and paste the links here.
OK, I will try training a new vocoder to check whether the problem is a dataset domain mismatch.
Hi author, I tried to train VoiceFlow on the AIShell3 dataset, but some noise appeared in the synthesized audio. Maybe it is because of the English vocoder?