descriptinc / melgan-neurips

GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
MIT License
964 stars, 214 forks

Tacotron2 + melgan = strange noise #14

Closed: vcjob closed this issue 4 years ago

vcjob commented 4 years ago

Hello everyone!

I tried this awesome work with a Tacotron2 TTS model, but unfortunately I can't make it work. I use the pretrained multi-speaker model from the repo. When I generate audio, the output is just noise, and I don't know why. One more thing to mention: the TTS model is not for English. It will probably not perform well on languages other than English anyway, but I would still expect it to work. I use a 16 kHz sample rate, 1 channel. Please take a look at the attached file! 177.zip

Do you have any ideas on how to make it work? I save the file like this:

import numpy as np
import scipy.io.wavfile

audio = (audio.cpu().numpy().reshape(-1) * 2**15).astype(np.int16)  # float tensor -> 16-bit PCM
scipy.io.wavfile.write(file_path, sampling_rate, audio)
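
For reference, here is a minimal sketch of a slightly more defensive version of the save step above. It assumes the vocoder output is a float array roughly in [-1, 1]. The helper names `float_to_int16` and `write_wav_mono` are hypothetical and not part of the repo. Multiplying by 2**15 maps a sample of exactly 1.0 to 32768, which does not fit in int16, so this sketch clips first and scales by 32767 instead. It uses the standard-library `wave` module rather than `scipy.io.wavfile`, purely to stay self-contained.

```python
import wave

import numpy as np


def float_to_int16(audio):
    """Clip float samples to [-1, 1] and scale them to 16-bit PCM."""
    samples = np.asarray(audio, dtype=np.float32).reshape(-1)
    samples = np.clip(samples, -1.0, 1.0)
    # 32767 is the largest value int16 can hold, so full scale stays in range.
    return (samples * 32767.0).astype(np.int16)


def write_wav_mono(file_path, sampling_rate, audio):
    """Write a mono 16-bit WAV file and return the PCM samples written."""
    pcm = float_to_int16(audio)
    with wave.open(file_path, "wb") as wav_file:
        wav_file.setnchannels(1)               # 1 channel, as in the post
        wav_file.setsampwidth(2)               # 2 bytes per sample = 16 bit
        wav_file.setframerate(sampling_rate)   # e.g. 16000
        # WAV data is little-endian, so force that byte order explicitly.
        wav_file.writeframes(pcm.astype("<i2").tobytes())
    return pcm
```

With a PyTorch tensor this could be called as `write_wav_mono(file_path, sampling_rate, audio.cpu().numpy())`. Note that clipping only guards against integer overflow in the save step; it would not by itself explain or fix output that is noise from start to finish.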

casper-hansen commented 4 years ago

@vcjob did you get something working?