descriptinc / melgan-neurips

GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
MIT License
980 stars 214 forks

How good is the pretrained model? #27

Open BuaaAlban opened 4 years ago

BuaaAlban commented 4 years ago

Can I get the end-to-end quality of the demo at https://melgan-neurips.github.io/ using the provided model? I have tried FastSpeech and Tacotron2 to generate mel spectrograms, and used the pretrained MelGAN vocoder to generate the wav, but the result doesn't reach the quality of the demo and is no better than WaveGlow. What should I do to improve the performance?

rijulg commented 4 years ago

@BuaaAlban have you tried running an audio file directly through the vocoder, to make sure that the quality degradation is coming from the vocoder and not from Tacotron? You can run the following pipeline:

audio -> MelVocoder() -> mel -> MelVocoder.inverse() -> y_audio

and check whether the quality is good enough or not. In my experience so far, it is pretty good.
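
A minimal sketch of that round trip in Python, in case it helps. The helper names `round_trip` and `run_on_file` are just for illustration and are not part of the repo. The sketch assumes the `load_melgan` torch.hub entry point from the README, a 22050 Hz sample rate, and input shaped `(batch, samples)`; please check those against your setup.

```python
def round_trip(vocoder, audio):
    """Run audio -> mel -> audio through one vocoder object.

    `vocoder` must be callable (audio -> mel) and expose `inverse`
    (mel -> audio), matching the pipeline described above.
    """
    mel = vocoder(audio)
    y_audio = vocoder.inverse(mel)
    return mel, y_audio


def run_on_file(wav_path, sample_rate=22050):
    """Hypothetical helper: load a wav file and run the round trip.

    Assumptions: torch and librosa are installed, the hub entry point
    "load_melgan" exists as described in the README, and the model
    expects audio shaped (batch, samples) at `sample_rate` Hz.
    """
    import librosa
    import torch

    vocoder = torch.hub.load("descriptinc/melgan-neurips", "load_melgan")
    wav, _ = librosa.load(wav_path, sr=sample_rate)
    audio = torch.from_numpy(wav).unsqueeze(0)  # add a batch dimension
    return round_trip(vocoder, audio)
```

Calling `run_on_file("sample.wav")` (the file name is a placeholder) returns the mel spectrogram and the reconstructed audio. If the reconstruction sounds clean, the vocoder is probably fine and the mismatch is more likely in how the TTS model computes its mels.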