As I understand, you trained your own version of MelGAN for multi-speaker synthesis, since the official code supports a sampling rate of 22.05 kHz while StyleSpeech operates at 16 kHz.
Could you share the details for reproducibility: which dataset did you use, and which parameters did you change? Or could you perhaps upload the trained vocoder itself? That would be great!
Sorry, I cannot share the trained vocoder. But I used the same LibriTTS dataset for training MelGAN, and I didn't change any of the default parameters.
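For anyone trying to reproduce this, the main thing to keep consistent is that the mel spectrograms used to train MelGAN match the 16 kHz features StyleSpeech is trained on. Below is a minimal sketch of what such preprocessing might look like; the specific values (n_fft, hop_length, n_mels, fmax) are assumptions based on common 16 kHz setups, not confirmed settings from either repo.

```python
# Minimal sketch of 16 kHz mel extraction for keeping StyleSpeech and the
# vocoder consistent. Parameter values below are assumptions for illustration.
import numpy as np
import librosa


def extract_mel(wav_path,
                sr=16000,         # StyleSpeech operates at 16 kHz
                n_fft=1024,       # assumed FFT size
                hop_length=256,   # assumed hop size
                win_length=1024,  # assumed window size
                n_mels=80,        # assumed number of mel bins
                fmax=8000):       # assumed upper bound (Nyquist at 16 kHz)
    # Load and resample the waveform to the target rate.
    wav, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=n_fft, hop_length=hop_length,
        win_length=win_length, n_mels=n_mels, fmax=fmax)
    # Log-compress; some repos use log10 or dB scaling instead.
    return np.log(np.clip(mel, a_min=1e-5, a_max=None))
```

If the vocoder and the acoustic model are trained from features produced with the same settings, the default MelGAN hyperparameters should otherwise carry over.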