fatchord / WaveRNN

WaveRNN Vocoder + TTS
https://fatchord.github.io/model_outputs/
MIT License

Successful training with Mixture of Logistic Distribution #44

Closed erogol closed 5 years ago

erogol commented 5 years ago

Example result: https://soundcloud.com/user-565970875/ljspeech-logistic-wavernn

Here is the [branch](https://github.com/erogol/WaveRNN/tree/mold) if you'd like to try it. The model was trained with TTS spectrograms on the LJSpeech dataset. Models will be released soon.

@fatchord would you prefer to have the trained WaveRNN model here, or would a new repository be better for this?

fatchord commented 5 years ago

@erogol Wow, that sounds great - congrats!

Sure, let's put it in here. I had a quick look at your repo and it seems most of the functionality is the same besides the mixture of logistics and some of the dsp preprocessing hparams? Anything else I'd need to do?

Also, if you don't mind - how many steps did you train that model for? And did you use the batched generation for that clip?

erogol commented 5 years ago

@fatchord thanks!

I guess we need to add the raw bit training mode to the branch as well. It's supposed to be set by the mode in config.json.
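As a rough illustration of how the two modes differ, here is a minimal sketch. It is not code from either repository: the mode names `"raw"` and `"mold"`, the helper names, and the defaults of 9 bits and 10 mixture components are all assumptions made for the example. In raw bit mode the network predicts a categorical distribution over quantized sample values, while in mixture-of-logistics mode it predicts a weight, a mean, and a log-scale per mixture component and a sample is drawn from that mixture.

```python
import math
import random


def output_channels(mode, bits=9, num_mixtures=10):
    """Values the network emits per audio sample (hypothetical helper).

    "raw"  -> one logit per quantized level, i.e. 2**bits classes.
    "mold" -> (weight logit, mean, log-scale) per mixture component.
    """
    if mode == "raw":
        return 2 ** bits
    if mode == "mold":
        return 3 * num_mixtures
    raise ValueError(f"unknown mode: {mode!r}")


def sample_from_mol(logit_weights, means, log_scales, rng=random):
    """Draw one sample in [-1, 1] from a mixture of logistic distributions."""
    # Softmax over the component logits (shifted by the max for stability).
    top = max(logit_weights)
    weights = [math.exp(w - top) for w in logit_weights]

    # Pick one component in proportion to its weight.
    k = rng.choices(range(len(weights)), weights=weights)[0]

    # Inverse-CDF sampling from a logistic: x = mu + s * logit(u).
    u = rng.uniform(1e-5, 1.0 - 1e-5)
    x = means[k] + math.exp(log_scales[k]) * (math.log(u) - math.log(1.0 - u))

    # Audio samples are assumed to be normalized to [-1, 1].
    return max(-1.0, min(1.0, x))
```

With these assumed defaults, raw mode needs 512 output values per sample and the mixture mode needs 30, which is one reason the mixture output can be attractive at higher bit depths.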

It was trained for a long time, around 1M steps, but I don't know when it started to work. I haven't checked all the checkpoints.

fatchord commented 5 years ago

@erogol Sorry for the delay getting back to you. I haven't forgotten - I'm just going to finalise a couple of things with the vanilla Tacotron 1 model and then start training the vocoder on MOL.

mazzzystar commented 5 years ago

@erogol @fatchord Glad to see you guys working together! I tried @fatchord's vanilla TTS with the Quick Start; the samples are impressive, with good quality and fast synthesis speed (~12 kHz on a GTX 1080Ti). The one part that sounds unnatural to me is the coherence between words. Maybe you could try Location Sensitive Attention, as @erogol did.

I hope to see TTS + WaveRNN become both fast and cheap to compute :-p

fatchord commented 5 years ago

@mazzzystar I just uploaded new pretrained models and the sound quality is a bit better.

fchpro commented 2 years ago

I listened to the SoundCloud example and it's pretty amazing! Congratulations! Is the voice model up for sale? I have a project that will have TTS implemented in it and I would love to use that voice. Also, I'm going to use the Microsoft SAPI5 TTS engine, so would the model be compatible with it? (I need to use Microsoft SAPI because I'm generating the voice with the Python pyttsx3 module, which uses it.)