fatchord / WaveRNN

WaveRNN Vocoder + TTS
https://fatchord.github.io/model_outputs/
MIT License

Non-conditioned WaveRNN generation #167

Closed gtLara closed 4 years ago

gtLara commented 4 years ago

Hello,

Listed among the parameters in the hyperparameters file is "ignore_tts", which is advised to be set to true if there is only interest in WaveRNN. This appears to imply that the code supports non-conditioned or recurrent WaveRNN generation (the "babbling speech" output), but I have not been able to find a way to do this.

I have set the aforementioned parameter to "true", which causes the preprocessing to skip a few steps and not search for text, and I have trained a model. However, I am having no luck using this model to generate "babbling speech" (I have tried executing gen_wavernn.py).

Does anyone know how to achieve what I am looking for?

mindmapper15 commented 4 years ago

Setting "ignore_tts" to true in the hyperparameters means that the WaveRNN model uses mel-spectrograms extracted from real audio, which doesn't require a pre-trained TTS model.

Unless you modify the WaveRNN model code, the mel-spectrograms (whether generated by a pre-trained TTS model or extracted directly from real audio) are always used as the local conditioning of WaveRNN.
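
To make the difference concrete, here is a toy sketch of what "local conditioning" means at the input of an autoregressive vocoder. It is not the code from this repository: the function name `build_step_inputs`, its parameters, and the nearest-frame upsampling are all assumptions made for illustration (the real model has its own upsampling network). With conditioning, every timestep sees the previous sample plus the mel frame covering that timestep; without it, the model only sees the previous sample, which is the setup that would produce "babbling" output.

```python
from typing import List, Optional, Sequence


def build_step_inputs(
    prev_samples: Sequence[float],
    mel_frames: Optional[Sequence[Sequence[float]]] = None,
    hop_length: int = 1,
) -> List[List[float]]:
    """Build per-timestep input vectors for a toy autoregressive vocoder.

    Hypothetical helper, not part of the WaveRNN repository.
    If mel_frames is None, the inputs are unconditioned.
    """
    inputs = []
    for t, sample in enumerate(prev_samples):
        step = [sample]  # the previous audio sample is always fed in
        if mel_frames is not None:
            # Naive upsampling: repeat each mel frame for hop_length samples.
            frame_index = min(t // hop_length, len(mel_frames) - 1)
            step.extend(mel_frames[frame_index])
        inputs.append(step)
    return inputs


prev = [0.0, 0.1, 0.2, 0.3]
mels = [[1.0, 2.0], [3.0, 4.0]]  # two frames with two mel bins each

conditioned = build_step_inputs(prev, mels, hop_length=2)
unconditioned = build_step_inputs(prev)

print(len(conditioned[0]))    # 3 values: previous sample + 2 mel bins
print(len(unconditioned[0]))  # 1 value: previous sample only
```

Under these assumptions, getting non-conditioned generation would mean changing the model so that its recurrent layers accept only the previous sample as input, and retraining it that way. A model trained with mel inputs cannot simply be run without them.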