
X-LANCE / VoiceFlow-TTS

[ICASSP 2024] This is the official code for "VoiceFlow: Efficient Text-to-Speech with Rectified Flow Matching"

https://cantabile-kwok.github.io/VoiceFlow/


synthesis quality #5

Closed: forwiat closed this issue 9 months ago

forwiat commented 9 months ago

Hi author, I tried training VoiceFlow on the AISHELL-3 dataset, but the synthesized audio contains some noise. Could this be because the vocoder was trained on English?

cantabile-kwok commented 9 months ago

Yes, the vocoder might be mismatched, since we only provided the HiFi-GAN trained on LJSpeech. If you choose to train on AISHELL-3, it is recommended to use a Chinese vocoder trained on that dataset, or a universal one.

By the way, could you provide some of the noisy audio samples, so that we can diagnose the problem?

cantabile-kwok commented 9 months ago

I would like to add another note: the mel-spectrogram features for training the TTS model must match the ones used for training the vocoder. So if you use another vocoder whose input mel-spectrograms have different parameters (e.g. different frame shift, window length, etc.), corresponding modifications should be made to the feature extraction scripts provided in this repo : )
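
As a rough illustration of the check described in the comment above, the sketch below compares the mel-spectrogram settings of a TTS feature-extraction config against those of a vocoder config and reports every parameter that differs. The key names (`sampling_rate`, `n_fft`, `hop_length`, and so on), the helper `mel_config_mismatches`, and all numeric values are hypothetical and chosen only for illustration; they are not taken from this repo or from any particular vocoder, so substitute the names and values from your own configs.

```python
# Hypothetical parameter names; replace them with the keys your configs use.
MEL_KEYS = (
    "sampling_rate",
    "n_fft",
    "hop_length",   # frame shift, in samples
    "win_length",   # window length, in samples
    "n_mels",
    "fmin",
    "fmax",
)


def mel_config_mismatches(tts_cfg, vocoder_cfg, keys=MEL_KEYS):
    """Return {key: (tts_value, vocoder_value)} for every key that differs.

    A key missing from one config is reported with None on that side.
    """
    mismatches = {}
    for key in keys:
        tts_value = tts_cfg.get(key)
        vocoder_value = vocoder_cfg.get(key)
        if tts_value != vocoder_value:
            mismatches[key] = (tts_value, vocoder_value)
    return mismatches


# Illustrative values only, not the settings of any real model.
tts_cfg = {
    "sampling_rate": 16000, "n_fft": 1024, "hop_length": 200,
    "win_length": 800, "n_mels": 80, "fmin": 0, "fmax": 8000,
}
vocoder_cfg = {
    "sampling_rate": 22050, "n_fft": 1024, "hop_length": 256,
    "win_length": 1024, "n_mels": 80, "fmin": 0, "fmax": 8000,
}

mismatches = mel_config_mismatches(tts_cfg, vocoder_cfg)
for key, (tts_value, vocoder_value) in mismatches.items():
    print(f"{key}: TTS features use {tts_value}, vocoder expects {vocoder_value}")
```

An empty result means the listed parameters agree. It does not rule out other differences, such as the normalization or log scaling applied to the mel-spectrograms, which would need to be checked separately.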

forwiat commented 9 months ago

Yeah, I noticed that section; I will try again. Thanks for the tips! Also, maybe I don't have permission? I can't upload WAV files or pictures in a comment.

cantabile-kwok commented 9 months ago

Oh, that's a GitHub limitation. Next time you could upload the files to Google Drive and paste the links here if necessary.

forwiat commented 9 months ago

OK, I will try training a new vocoder to check whether the problem is a dataset domain mismatch.