fatchord / WaveRNN

WaveRNN Vocoder + TTS
https://fatchord.github.io/model_outputs/
MIT License

Network connection diagram #130

Open nkcdy opened 5 years ago

nkcdy commented 5 years ago

The picture posted on the first page is fairly low resolution, and the detailed blocks are indistinct, so I redrew the diagram from the code. Is this a correct understanding of your code? [Image: WaveRNN_Block]

fatchord commented 5 years ago

@nkcdy that's awesome, thanks! And yeah, that looks right. By the way, there's a higher-resolution version of my diagram in the assets folder: https://github.com/fatchord/WaveRNN/blob/master/assets/wavernn_alt_model_hrz2.png

nkcdy commented 5 years ago

Oh, how awkward... But never mind, drawing a block diagram is always a good starting point when studying a new paper or new code.

nkcdy commented 5 years ago

By the way, what will happen if I train the network on a multi-speaker corpus (say, 400 speakers)? Will it help improve the generalization capability, or will it fail to converge?

fatchord commented 5 years ago

I've managed to train multi-speaker models (without any speaker embedding) in 9bit RAW/mulaw. I haven't tried training a multi-speaker MOL model yet.
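For readers unfamiliar with the "9bit mulaw" mode mentioned above, here is a minimal sketch of standard mu-law companding to 9 bits (512 classes). It illustrates the general technique and is not necessarily the repository's exact implementation; the function names `mu_law_encode` and `mu_law_decode` are hypothetical.

```python
import math


def mu_law_encode(x, bits=9):
    """Map a sample in [-1, 1] to an integer class in [0, 2**bits - 1].

    Assumption: mu = 2**bits - 1, i.e. 511 for the 9-bit case.
    """
    mu = 2 ** bits - 1
    x = max(-1.0, min(1.0, x))  # clip to the valid input range
    # Logarithmic compression: more resolution near zero amplitude.
    fx = math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)
    # Shift from [-1, 1] to [0, mu] and round to the nearest class.
    return int(math.floor((fx + 1) / 2 * mu + 0.5))


def mu_law_decode(y, bits=9):
    """Approximately invert mu_law_encode: class index back to [-1, 1]."""
    mu = 2 ** bits - 1
    fx = 2 * y / mu - 1  # back to [-1, 1]
    return math.copysign(((1 + mu) ** abs(fx) - 1) / mu, fx)


if __name__ == "__main__":
    for sample in (-1.0, -0.1, 0.0, 0.1, 0.5, 1.0):
        label = mu_law_encode(sample)
        print(sample, label, round(mu_law_decode(label), 4))
```

In this mode the network predicts one of the 512 classes per sample, whereas "RAW" would quantize the waveform linearly to the same number of classes.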

linan06kuaishou commented 2 years ago

> I've managed to train multi-speaker models (without any speaker embedding) in 9bit RAW/mulaw. I haven't tried training a multi-speaker MOL model yet.

Hi, how is the sound quality of the speech generated by your multi-speaker model?