nikawool opened 4 years ago
Have you tried combining FastSpeech with MelGAN? What were the results?
```python
import IPython.display as ipd
from mel2wav.interface import MelVocoder

vocoder = MelVocoder(path="logs/baseline14k/", model_name="best_netG")
recons = vocoder.inverse(mel_outputs.float()).squeeze().cpu().numpy()
ipd.Audio(recons, rate=22050)
```
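A second attempt, inspecting the shapes and converting to int16 before playback:

```python
import IPython.display as ipd
import numpy as np
from mel2wav.interface import MelVocoder

vocoder = MelVocoder(path="logs/baseline14k/", model_name="best_netG")

meldata = mel_outputs.float()
print(meldata.shape)   # torch.Size([1, 80, 503])
rev_wav = vocoder.inverse(meldata)
print(rev_wav.shape)   # torch.Size([1, 128768])
print(rev_wav.dtype)   # torch.float32
rev_wav2 = rev_wav.cpu().numpy()
print(rev_wav2.shape)  # (1, 128768)
# Clip to [-1, 1] and scale by 32767 so full-scale samples don't overflow int16.
pcm = (np.clip(rev_wav2.reshape(-1), -1.0, 1.0) * 32767).astype(np.int16)
ipd.Audio(pcm, rate=22050)
```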
Both snippets give the same result.
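A quick consistency check on the shapes printed above, as a sketch: assuming the checkpoint keeps the melgan-neurips default upsampling ratios of 8, 8, 2, 2 (an assumption; they are not shown in this thread), each mel frame becomes 256 samples, so 503 frames should give exactly 128768 samples, which is what `rev_wav.shape` reports. If that holds, the frame rate is probably not the problem, and the mel values themselves (log base, normalization, filterbank range) are the more likely mismatch.

```python
import math

# Assumption: melgan-neurips default generator upsampling ratios.
UPSAMPLING_RATIOS = [8, 8, 2, 2]
SAMPLE_RATE = 22050  # rate passed to ipd.Audio in the snippets above

# Total upsampling factor = samples emitted per mel frame (the hop length).
HOP_LENGTH = math.prod(UPSAMPLING_RATIOS)


def expected_num_samples(num_frames: int, hop_length: int = HOP_LENGTH) -> int:
    """Waveform length a hop-based vocoder produces for num_frames mel frames."""
    return num_frames * hop_length


num_frames = 503  # meldata.shape == torch.Size([1, 80, 503])
num_samples = expected_num_samples(num_frames)
print(HOP_LENGTH)   # 256
print(num_samples)  # 128768, same as rev_wav.shape[-1]
print(f"{num_samples / SAMPLE_RATE:.2f} s")  # 5.84 s
```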
FastSpeech implementation: https://github.com/xcmyz/FastSpeech

How can I combine MelGAN with a feature predictor such as FastSpeech or Tacotron2?
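One possible cause when pairing a Tacotron2-style feature predictor with this MelGAN is that the two sides compute log-mels differently. As far as I know, melgan-neurips' `Audio2Mel` produces `log10(clamp(mel, min=1e-5))`, while NVIDIA's Tacotron2 audio code uses the natural log, `log(clamp(mel, min=1e-5))`; whether xcmyz/FastSpeech inherits the Tacotron2 convention is an assumption to verify in its source. If it does, the log base can be converted by dividing by ln 10, as in this sketch (the function name `ln_mel_to_log10_mel` is hypothetical):

```python
import math

import numpy as np

LN_10 = math.log(10.0)


def ln_mel_to_log10_mel(ln_mel):
    """Hypothetical helper: convert a natural-log mel spectrogram,
    ln(clamp(mel, 1e-5)), to the log10 scale log10(clamp(mel, 1e-5)).

    Only divides by a constant, so it works on NumPy arrays and torch tensors alike.
    """
    return ln_mel / LN_10


# Sanity check on a synthetic linear-magnitude mel (not real FastSpeech output).
linear_mel = np.array([1e-7, 1e-5, 0.01, 1.0, 20.0])
ln_mel = np.log(np.clip(linear_mel, 1e-5, None))
log10_mel = np.log10(np.clip(linear_mel, 1e-5, None))
converted = ln_mel_to_log10_mel(ln_mel)
print(np.allclose(converted, log10_mel))  # True
```

With the snippets above this would be `vocoder.inverse(ln_mel_to_log10_mel(mel_outputs.float()))`. It only fixes the log base: if the mel filterbank also differs (NVIDIA's Tacotron2 hparams set `mel_fmax=8000.0`, while `Audio2Mel` defaults to `mel_fmax=None`, i.e. up to half the sample rate), no rescaling will make the features match, and training or fine-tuning MelGAN on mels computed with the predictor's own front end is the safer route.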