descriptinc / melgan-neurips

GAN-based Mel-Spectrogram Inversion Network for Text-to-Speech Synthesis
MIT License
980 stars, 214 forks

How to combine melGAN with feature predictor like FastSpeech or tacotron2? #17

Open nikawool opened 4 years ago

nikawool commented 4 years ago

How can I combine MelGAN with a feature predictor such as FastSpeech (https://github.com/xcmyz/FastSpeech) or Tacotron2?

Liujingxiu23 commented 4 years ago

Have you tried FastSpeech combined with MelGAN? How were the results?

Teravus commented 4 years ago

I've been playing with Tacotron2's inference notebook, but so far I only get noise. I copied the mel2wav folder and my checkpoint log directory into the tacotron2 directory, then added a section after the "Remove WaveGlow bias" section of the notebook:

vocoder = MelVocoder(path="logs/baseline14k/", model_name="best_netG")
recons = vocoder.inverse(mel_outputs.float()).squeeze().cpu().numpy()
ipd.Audio(recons, rate=22050)

I've also tried the following, checking shapes and dtypes along the way:

vocoder = MelVocoder(path="logs/baseline14k/", model_name="best_netG")
recons = vocoder.inverse(mel_outputs.float()).squeeze().cpu().numpy()

meldata = mel_outputs.float()
meldata.shape    # torch.Size([1, 80, 503])
rev_wav = vocoder.inverse(meldata.float())  # .squeeze().cpu().numpy()
rev_wav.shape    # torch.Size([1, 128768])
rev_wav.dtype    # torch.float32
rev_wav2 = rev_wav.cpu().numpy()
rev_wav2.shape   # (1, 128768)
ipd.Audio((rev_wav2.reshape((-1)) * 2**15).astype(np.int16), rate=22050)

Same results.
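One possible cause of noise-only output, offered as an assumption rather than a confirmed diagnosis: the two projects may not share the same log-mel convention. NVIDIA's Tacotron2 appears to compress mel magnitudes with a natural log, while MelGAN's Audio2Mel appears to use log10, both with a floor of 1e-5. If that holds for your checkouts, dividing the Tacotron2 output by ln(10) before calling vocoder.inverse would put it on the scale MelGAN was trained on. The helper name below is hypothetical, and the sketch does not address other possible mismatches (for example the mel filterbank frequency range), so treat it as something to try rather than a fix.

```python
import math

# ln(10): converts natural-log values to base-10 log values by division.
LN10 = math.log(10.0)


def ln_mel_to_log10_mel(mel):
    """Rescale a natural-log mel spectrogram to a log10 mel spectrogram.

    Hypothetical helper, assuming the predictor emits ln(mel) and the
    vocoder expects log10(mel). Works on floats, NumPy arrays, and
    torch tensors, since it only uses division by a scalar.
    """
    return mel / LN10


# Sanity check on the assumed shared floor of 1e-5:
# ln(1e-5) should map to approximately log10(1e-5).
floor_ln = math.log(1e-5)
floor_log10 = ln_mel_to_log10_mel(floor_ln)
```

In the notebook above this would be used as `vocoder.inverse(ln_mel_to_log10_mel(mel_outputs.float()))`, leaving the rest of the cell unchanged.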