Closed GuangChen2016 closed 2 years ago
@GuangChen2016 can you share your model training config and your data information?
I use the default config with upsample_scales of [8 8 2 2] and 4 stacks, and I set weight norm to false for both the generator and the discriminator. I use the multi-resolution STFT loss for the generator. Data: sample rate 24 kHz, hop size 256.
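As a quick sanity check on the numbers above, the product of the upsampling scales should equal the hop size, so that each mel frame expands to exactly one hop of audio samples. Below is a minimal sketch of that arithmetic; the helper name `check_upsampling` and the variable names are hypothetical and not taken from any particular vocoder codebase.

```python
from math import prod


def check_upsampling(upsample_scales, hop_size):
    """Return (matches, total) where total is the product of the scales.

    Hypothetical helper for illustration only: it checks that the generator's
    total upsampling factor equals the hop size used for feature extraction.
    """
    total = prod(upsample_scales)
    return total == hop_size, total


# Values quoted in the post above.
upsample_scales = [8, 8, 2, 2]
hop_size = 256
sample_rate = 24000

ok, total = check_upsampling(upsample_scales, hop_size)

# Duration of one mel frame in milliseconds at this sample rate.
frame_shift_ms = 1000 * hop_size / sample_rate

print(f"total upsampling factor: {total}, matches hop size: {ok}")
print(f"frame shift: {frame_shift_ms:.2f} ms")
```

With these values the scales multiply to 256, which matches the hop size, so a mismatch there is not the cause of the artifacts.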
I have trained a MelGAN vocoder on my own data, but when it is used for end-to-end TTS, some of the synthesized results (about 3% of utterances) contain artifacts (noise). In detail, the mel-spectrogram in the corresponding areas is discontinuous, as shown below. Any suggestions for improving this?