NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

hparams trainings settings #26

Closed peter1000 closed 4 years ago

peter1000 commented 4 years ago

Hi,

referring to your paper:

4.1. Training Setup
 For all the experiments, we trained on LJS, Sally and the
train-clean-100 subset of LibriTTS with over 100 speakers
and 25 minutes on average per speaker.

I'd be interested to know whether you used the same training settings for all three datasets (LJS, Sally, and the train-clean-100 subset of LibriTTS).

Did you only change the training_files and validation_files paths, or many of the other parameters too?

https://github.com/NVIDIA/mellotron/blob/master/hparams.py#L5

Thanks!

rafaelvalle commented 4 years ago

In addition to training and validation files, we changed the number of speakers: https://github.com/NVIDIA/mellotron/blob/master/hparams.py#L88
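For illustration, the per-dataset overrides described above might look like the sketch below. The filelist paths and the multi-speaker count are placeholders (adjust them to your own filelists and to the actual number of speakers in your data), not values confirmed in this thread:

```python
# Hypothetical per-dataset hparams overrides; only the three fields
# mentioned in the reply above differ between configurations.
single_speaker_overrides = dict(
    training_files='filelists/ljs_train_filelist.txt',      # placeholder path
    validation_files='filelists/ljs_val_filelist.txt',      # placeholder path
    n_speakers=1,  # LJS (or Sally) is a single-speaker dataset
)

multi_speaker_overrides = dict(
    training_files='filelists/libritts_train_filelist.txt',  # placeholder path
    validation_files='filelists/libritts_val_filelist.txt',  # placeholder path
    n_speakers=100,  # placeholder; set to the speaker count of your subset
)
```

These would be merged into the base hparams (e.g. by editing hparams.py directly, or via the script's hparams override string) before launching training on each dataset.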

peter1000 commented 4 years ago

Thanks, very kind of you to take the time to reply.