NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Sampling rate used for pretrained model of LibriTTS #17

Closed hash2430 closed 4 years ago

hash2430 commented 4 years ago

Hello, I would like to ask about the hparams used for the LibriTTS pretrained model. The default hparams.py seems to be written for the LJSpeech DB, so it would be appreciated if you could share the hparams you used for the pretrained LibriTTS model. In particular, the sampling rate is set to 22050 in hparams, while wavfile.read() returns a sampling rate of 24000, resulting in an assertion error. Changing this is no problem, but I would like to know at what sampling rate the pretrained model was trained. Thanks :D
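
For reference, the mismatch can be reproduced with a quick check like the one below (the file path is just a placeholder for any LibriTTS utterance, not a path from the repo):

```python
from scipy.io import wavfile

# Hypothetical path to one LibriTTS wav; adjust to your local copy of the corpus.
path = "LibriTTS/train-clean-100/103/1241/103_1241_000000_000001.wav"

sr, audio = wavfile.read(path)
print(sr)  # LibriTTS ships at 24000 Hz, while hparams.sampling_rate defaults to 22050
```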

rafaelvalle commented 4 years ago

We resampled LibriTTS to 22050Hz for training.
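
A minimal resampling sketch along those lines, assuming librosa and soundfile are available and the original 24 kHz LibriTTS wavs sit under a local `LibriTTS/` directory (the paths and directory names here are placeholders, and this is not the authors' preprocessing script):

```python
import os
import librosa
import soundfile as sf

IN_DIR = "LibriTTS/train-clean-100"   # assumed location of the original 24 kHz wavs
OUT_DIR = "LibriTTS_22050"            # assumed output location for the resampled copies
TARGET_SR = 22050

for root, _, files in os.walk(IN_DIR):
    for name in files:
        if not name.endswith(".wav"):
            continue
        in_path = os.path.join(root, name)
        # librosa resamples to TARGET_SR while loading
        audio, _ = librosa.load(in_path, sr=TARGET_SR)
        out_path = os.path.join(OUT_DIR, os.path.relpath(in_path, IN_DIR))
        os.makedirs(os.path.dirname(out_path), exist_ok=True)
        sf.write(out_path, audio, TARGET_SR)
```

With the data resampled this way, the default `sampling_rate=22050` in hparams.py matches the wav files and the assertion no longer triggers.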