NVIDIA / nv-wavenet

Reference implementation of real-time autoregressive wavenet inference
BSD 3-Clause "New" or "Revised" License

How to integrate it with r9y9/Tacotron-2? #48

Open rishikksh20 opened 6 years ago

rishikksh20 commented 6 years ago

The Tacotron-2 implementation by r9y9 (https://github.com/r9y9/Tacotron-2) outputs mel-spectrograms, but when I feed such a mel-spectrogram to nv-wavenet (after converting the .npy file to a torch tensor) and run inference, it generates noise. Do I need to do anything extra to the Tacotron 2-generated mel-spectrogram before passing it to nv-wavenet for speech synthesis?
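
Before looking at scaling, it is worth confirming the array layout. The sketch below assumes the .npy file holds either a (frames, n_mels) or an (n_mels, frames) array, and that the nv-wavenet PyTorch inference path wants float32 data shaped (batch, n_mels, frames). Both are assumptions to check against the code you actually run; `prepare_mel` and its default `n_mels=80` are hypothetical. It uses only NumPy, and `torch.from_numpy` can wrap the result. A correct shape alone will not remove the noise if the two models use different spectrogram parameters.

```python
import numpy as np


def prepare_mel(mel: np.ndarray, n_mels: int = 80) -> np.ndarray:
    """Hypothetical helper: reshape a saved mel-spectrogram to (1, n_mels, frames).

    Assumes the input is (frames, n_mels) or (n_mels, frames). n_mels=80 is an
    assumed default, not a value taken from either repository.
    """
    if mel.ndim != 2:
        raise ValueError(f"expected a 2-D array, got shape {mel.shape}")
    if mel.shape[0] == n_mels:
        channels_first = mel          # already (n_mels, frames)
    elif mel.shape[1] == n_mels:
        channels_first = mel.T        # (frames, n_mels) -> (n_mels, frames)
    else:
        raise ValueError(f"no axis of shape {mel.shape} matches n_mels={n_mels}")
    # Contiguous float32 copy with a leading batch axis of size 1.
    return np.ascontiguousarray(channels_first, dtype=np.float32)[np.newaxis, :, :]
```

Note that if an utterance happens to have exactly `n_mels` frames, the helper treats the input as already channels-first, so the saved layout should be known rather than guessed.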

gsoul commented 6 years ago
  1. Ideally, you should train on the same spectrograms as you're going to do inference on
  2. At minimum, you might want to look through the closed issues of NVIDIA/tacotron2 for examples you could base your solution on, like this one: https://github.com/NVIDIA/tacotron2/issues/52
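
One rough way to spot the train/inference mismatch that point 1 warns about is to compare the value range of the mels fed at inference with the mels the vocoder was trained on. The sketch below is a hypothetical heuristic, not part of nv-wavenet or either Tacotron 2 repo; `channel_stats`, `range_mismatch`, the `z_tol` threshold, and the (n_mels, frames) layout are all assumptions.

```python
import numpy as np


def channel_stats(mels):
    """Per-channel mean and std over a list of (n_mels, frames) arrays."""
    stacked = np.concatenate(mels, axis=1)  # join all utterances along time
    return stacked.mean(axis=1), stacked.std(axis=1)


def range_mismatch(train_mels, test_mel, z_tol=3.0):
    """Hypothetical check: fraction of mel channels whose mean in test_mel lies
    more than z_tol training standard deviations from the training mean."""
    mean, std = channel_stats(train_mels)
    test_mean = test_mel.mean(axis=1)
    z = np.abs(test_mean - mean) / np.maximum(std, 1e-8)  # avoid divide-by-zero
    return float(np.mean(z > z_tol))
```

A value near 1.0 suggests the two sets of spectrograms are on different scales (for example dB versus natural-log amplitude, or different normalization), which is one way mismatched features can turn into noise. A value near 0.0 does not prove compatibility, since FFT and mel settings can differ without shifting the channel means much.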

RPrenger commented 6 years ago

@rishikksh20 Are you trying to use the WaveNet from r9y9 for inference, or are you training a WaveNet using the code in the PyTorch directory? If you're using a WaveNet from r9y9, can you post the code you're using to try the bindings?

rishikksh20 commented 6 years ago

@rafaelvalle Is it possible to integrate r9y9's Tacotron 2 with this repo? If so, what changes are required to make the mel-spectrogram output by Tacotron 2 compatible with the mel-spectrogram input expected by nv-wavenet?

rafaelvalle commented 6 years ago

@rishikksh20 In addition to training Tacotron2 and Wavenet on the same data, the same mel-spectrogram representation has to be used, including FFT and mel params.
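
To make "the same mel-spectrogram representation" concrete, the sketch below collects the FFT and mel parameters into one record and lists every field on which two setups disagree. The class and field names are hypothetical and are not copied from either repository's config; the log-scale field stands in for whatever compression and normalization each front end applies.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class MelConfig:
    """Hypothetical record of parameters both front ends must share."""
    sampling_rate: int
    n_fft: int
    hop_length: int
    win_length: int
    n_mels: int
    fmin: float
    fmax: float
    log_scale: str  # e.g. "ln" or "db", including any normalization scheme


def mismatches(a: MelConfig, b: MelConfig) -> dict:
    """Return {field_name: (a_value, b_value)} for every field that differs."""
    return {
        f.name: (getattr(a, f.name), getattr(b, f.name))
        for f in fields(MelConfig)
        if getattr(a, f.name) != getattr(b, f.name)
    }
```

Filling one `MelConfig` from each project's hyperparameters and checking that `mismatches` returns an empty dict is a quick sanity check before training. Any non-empty result means one side's features have to be recomputed and that model retrained with matching settings.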