Open BuaaAlban opened 4 years ago
@BuaaAlban have you tried using an audio file directly, to make sure the quality degradation is coming from the vocoder and not from Tacotron? You can run the following pipeline:
`audio -> MelVocoder() -> mel -> MelVocoder.inverse() -> y_audio`
and check whether the quality is good enough or not. In my experience so far, it is pretty good.
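The round-trip above can be sketched as follows. This is a minimal sketch, assuming the `MelVocoder` interface from the melgan-neurips repo (`mel2wav/interface.py`) can be constructed from a pretrained checkpoint directory and exposes `__call__` (audio to mel) and `inverse` (mel to audio) as in the pipeline; the checkpoint path and sample file names are placeholders.

```python
# Audio -> mel -> audio round-trip, to isolate vocoder quality from the
# acoustic model. Assumes the MelVocoder class from the melgan-neurips repo;
# the checkpoint directory and audio file paths are placeholders.
import torch
import librosa
import soundfile as sf
from mel2wav.interface import MelVocoder

vocoder = MelVocoder(path="path/to/pretrained")  # hypothetical checkpoint dir

# Load ground-truth audio at the sampling rate the vocoder was trained on
# (22050 Hz for the pretrained MelGAN models).
wav, sr = librosa.load("sample.wav", sr=22050)
audio = torch.from_numpy(wav).unsqueeze(0)  # add batch dimension

mel = vocoder(audio)           # audio -> mel spectrogram
recon = vocoder.inverse(mel)   # mel -> audio through the vocoder

sf.write("reconstructed.wav", recon.squeeze().cpu().numpy(), sr)
```

If `reconstructed.wav` sounds close to the original, the degradation is coming from the mel spectrograms produced by Tacotron2/FastSpeech (for example, mismatched mel extraction parameters) rather than from the vocoder itself.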
Can I get the end-to-end quality of the demo at https://melgan-neurips.github.io/ using the provided model? I have tried FastSpeech and Tacotron2 to generate mel spectrograms, then used the pretrained MelGAN vocoder to generate wavs, but the result doesn't reach the quality of the demo and is no better than WaveGlow. What should I do to improve the performance?