NVIDIA / OpenSeq2Seq

Toolkit for efficient experimentation with Speech Recognition, Text2Speech and NLP
https://nvidia.github.io/OpenSeq2Seq
Apache License 2.0
1.54k stars, 369 forks

[Question] Tacotron or Tacotron GST output spectrograms to condition Wavenet #395

Closed by oytunturk 5 years ago

oytunturk commented 5 years ago

Hi,

Has anyone tried to use Wavenet instead of Griffin-Lim waveform reconstruction during inference in Tacotron or Tacotron GST models? I'm trying to figure out if there is an easy way to do this or whether one would need to implement a proper decoder in Wavenet instead of the default 'FakeDecoder'.

Outline of my recipe:

1) Train a Tacotron or Tacotron GST model to predict spectrograms from text or from phoneme sequences
2) Train a separate Wavenet model conditioned on spectrograms
3) (This is what I'm trying to figure out) Feed the spectrogram outputs of (1) into the Wavenet model from (2) to synthesize speech

It wasn't clear to me whether the current implementation supports (3) simply by changing config files, so any guidance or ideas would be highly appreciated.

Thanks!

Oytun

blisc commented 5 years ago

As long as the parameters used to generate the spectrograms match between steps 1 and 2, step 3 should work.
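
One way to sanity-check that condition before wiring the two models together is to compare the spectrogram settings from both configs side by side. The snippet below is only a rough sketch: the key names (`sample_rate`, `n_fft`, `hop_length`, `win_length`, `num_mels`, `fmin`, `fmax`) and the example values are assumptions chosen for illustration, not the actual OpenSeq2Seq config keys, so substitute whatever your Tacotron and Wavenet configs really use.

```python
# Hypothetical spectrogram settings to compare. These names are illustrative
# assumptions, not confirmed OpenSeq2Seq config keys.
SPECTROGRAM_KEYS = (
    "sample_rate", "n_fft", "hop_length", "win_length",
    "num_mels", "fmin", "fmax",
)


def find_mismatches(tacotron_params, wavenet_params, keys=SPECTROGRAM_KEYS):
    """Return {key: (tacotron_value, wavenet_value)} for every key that differs.

    A key missing from one dict is reported as None on that side.
    """
    mismatches = {}
    for key in keys:
        tacotron_value = tacotron_params.get(key)
        wavenet_value = wavenet_params.get(key)
        if tacotron_value != wavenet_value:
            mismatches[key] = (tacotron_value, wavenet_value)
    return mismatches


# Example values are made up for the sketch.
tacotron_params = {
    "sample_rate": 22050, "n_fft": 1024, "hop_length": 256,
    "win_length": 1024, "num_mels": 80,
}
wavenet_params = {
    "sample_rate": 22050, "n_fft": 1024, "hop_length": 275,
    "win_length": 1024, "num_mels": 80,
}

for key, (taco, wave) in find_mismatches(tacotron_params, wavenet_params).items():
    print(f"{key}: Tacotron={taco} vs Wavenet={wave}")
```

If the function returns an empty dict for your real settings, the spectrograms produced in step 1 should at least have the same shape and time resolution as the ones the Wavenet model was trained on in step 2. Any normalization or log scaling applied to the features would need to match as well, which this check does not cover.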