NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Training DB for Waveglow pretrained model in this repo #37

Closed: hash2430 closed this issue 4 years ago

hash2430 commented 4 years ago

Hello, may I ask which database was used to train the pretrained WaveGlow model linked in the README of this repo? I searched NVIDIA GPU Cloud for a description of this pretrained model, but the pretrained WaveGlow checkpoints there were trained only on a single speaker (LJSpeech). I don't think the pretrained WaveGlow referred to in this repo was trained on a single speaker, though. Thanks!

rafaelvalle commented 4 years ago

The WaveGlow linked in the README was trained on a single speaker with proprietary studio-quality data. This suggests that WaveGlow is a universal decoder.

hash2430 commented 4 years ago

Wow. Thanks. I'm surprised to hear how universal it can be. That's good to know :D

rafaelvalle commented 4 years ago

Spread the word :-)