Bad cases or artifacts for Tacotron2 + MelGAN vocoder. Any suggestions?

TensorSpeech / TensorFlowTTS

TensorFlowTTS: Real-Time State-of-the-art Speech Synthesis for TensorFlow 2 (supports English, French, Korean, Chinese, and German, and is easy to adapt to other languages)

https://tensorspeech.github.io/TensorFlowTTS/

Apache License 2.0

3.8k stars, 810 forks

Issue #753

Closed: GuangChen2016 closed this issue 2 years ago

GuangChen2016 commented 2 years ago

I have trained a MelGAN vocoder on my own data, but when it is used for end-to-end TTS, some of the synthesized results (about 3% of utterances) have artifacts (noise). Specifically, the mel-spectrogram is discontinuous in the corresponding regions, as shown below:

[Figure: mel-spectrogram of an affected utterance showing the discontinuity; image not preserved]

Any suggestions for improving this?
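To locate the roughly 3% of affected utterances without listening to every one, one hypothetical approach is to flag sudden frame-to-frame jumps in the mel-spectrogram. The sketch below is a simple heuristic, not something proposed in this thread: it assumes the mel is available as a NumPy array of shape (n_frames, n_mels), and the function name `find_mel_discontinuities` and the z-score threshold are illustrative assumptions.

```python
import numpy as np


def find_mel_discontinuities(mel, z_thresh=4.0):
    """Return indices i where the jump from frame i to frame i + 1 is an outlier.

    mel: array of shape (n_frames, n_mels), e.g. a log-mel spectrogram.
    z_thresh: hypothetical threshold in standard deviations; tune per dataset.
    """
    # Euclidean distance between each pair of consecutive frames.
    jumps = np.linalg.norm(np.diff(mel, axis=0), axis=1)
    std = jumps.std()
    if std == 0.0:
        # All frame-to-frame changes are identical, so nothing stands out.
        return np.array([], dtype=int)
    z_scores = (jumps - jumps.mean()) / std
    return np.flatnonzero(z_scores > z_thresh)


# Synthetic example: 200 identical frames of 80 mel bins with a step inserted at frame 120.
mel = np.tile(np.linspace(-4.0, 0.0, 80), (200, 1))
mel[120:] += 3.0
print(find_mel_discontinuities(mel))  # prints [119]
```

Comparing the flagged frames with where the audible noise occurs may help narrow down whether the problem starts in the predicted mel-spectrogram or in the vocoder itself.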

dathudeptrai commented 2 years ago

@GuangChen2016 Can you share your model training config and some information about your data?

GuangChen2016 commented 2 years ago

I use the default config, with upsample_scales set to [8, 8, 2, 2] (8 × 8 × 2 × 2 = 256, matching the hop size), the number of stacks set to 4, and weight norm disabled for both the generator and the discriminator. I also use the multi-resolution STFT loss for the generator. Data: sample rate 24 kHz, hop size 256.
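For readers unfamiliar with the multi-resolution STFT loss mentioned above, the following is a minimal NumPy sketch of the general technique (spectral convergence plus an L1 loss on log magnitudes, each averaged over several STFT resolutions, as introduced with Parallel WaveGAN). It is not TensorFlowTTS's implementation; the three (fft_size, hop_size, win_length) resolutions, the Hann window, and all function names are assumptions chosen for illustration.

```python
import numpy as np

# Assumed (fft_size, hop_size, win_length) triples; illustrative, not read from any config.
RESOLUTIONS = [(1024, 120, 600), (2048, 240, 1200), (512, 50, 240)]


def stft_magnitude(x, fft_size, hop_size, win_length):
    """Magnitude STFT of a 1-D signal, shape (n_frames, fft_size // 2 + 1)."""
    window = np.hanning(win_length)
    left = (fft_size - win_length) // 2
    # Zero-pad the window so it sits centered inside each FFT frame.
    window = np.pad(window, (left, fft_size - win_length - left))
    n_frames = 1 + (len(x) - fft_size) // hop_size
    frames = np.stack(
        [x[i * hop_size : i * hop_size + fft_size] * window for i in range(n_frames)]
    )
    # Clamp so the log-magnitude term below never sees log(0).
    return np.maximum(np.abs(np.fft.rfft(frames, axis=-1)), 1e-7)


def single_resolution_loss(pred, target, fft_size, hop_size, win_length):
    p = stft_magnitude(pred, fft_size, hop_size, win_length)
    t = stft_magnitude(target, fft_size, hop_size, win_length)
    # Spectral convergence: Frobenius norm of the error relative to the target.
    spectral_convergence = np.linalg.norm(t - p) / np.linalg.norm(t)
    # Mean absolute difference of log magnitudes.
    log_magnitude = np.mean(np.abs(np.log(t) - np.log(p)))
    return spectral_convergence, log_magnitude


def multi_resolution_stft_loss(pred, target, resolutions=RESOLUTIONS):
    """Average each loss term over all STFT resolutions."""
    sc_sum, mag_sum = 0.0, 0.0
    for fft_size, hop_size, win_length in resolutions:
        sc, mag = single_resolution_loss(pred, target, fft_size, hop_size, win_length)
        sc_sum += sc
        mag_sum += mag
    return float(sc_sum / len(resolutions)), float(mag_sum / len(resolutions))


sample_rate = 24000  # matches the 24 kHz data described in the issue
t_axis = np.arange(sample_rate) / sample_rate
target = np.sin(2 * np.pi * 220.0 * t_axis)
noisy = target + 0.01 * np.random.default_rng(0).standard_normal(sample_rate)

print(multi_resolution_stft_loss(target, target))  # identical signals: both terms are zero
print(multi_resolution_stft_loss(noisy, target))
```

Using several FFT sizes penalizes errors at different time-frequency trade-offs at once. In actual training these terms would be computed on batched TensorFlow tensors rather than NumPy arrays.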

stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs.