NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Mismatch model volume #101

Open thkaang opened 2 years ago

thkaang commented 2 years ago

Hello, thank you for sharing the Mellotron code. I have a question about a mismatch in model size between the pre-trained models and a model trained with the default code. The pre-trained models you uploaded (LibriTTS and LJS) are about 127 MB each, but when I train with the default settings (train.py and hparams.py), the checkpoint is about 382 MB. Are there any settings that I missed (such as changed layer dimensions)?
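
One way to narrow this down is to measure how much each top-level entry of the checkpoint contributes to its size. The sketch below assumes the checkpoint loads as a dictionary whose values contain tensors (or nested dictionaries and lists of tensors); the helper names `tensor_bytes` and `size_report` are made up for illustration and are not part of the repository. The reported ratio (382 MB / 127 MB, roughly 3) would be consistent with a training checkpoint that stores optimizer state next to the weights, for example two moment buffers per parameter with Adam, but that is an assumption to verify rather than a confirmed diagnosis.

```python
from collections.abc import Mapping, Sequence


def tensor_bytes(obj):
    """Recursively sum the bytes of every tensor-like object inside obj."""
    # "Tensor-like" here means anything exposing numel() and element_size(),
    # which torch.Tensor does. Non-tensor leaves (ints, floats, strings) count as 0.
    if hasattr(obj, "numel") and hasattr(obj, "element_size"):
        return obj.numel() * obj.element_size()
    if isinstance(obj, Mapping):
        return sum(tensor_bytes(value) for value in obj.values())
    if isinstance(obj, Sequence) and not isinstance(obj, (str, bytes)):
        return sum(tensor_bytes(value) for value in obj)
    return 0


def size_report(checkpoint):
    """Map each top-level checkpoint key to its tensor payload in MB."""
    return {key: tensor_bytes(value) / 1e6 for key, value in checkpoint.items()}


# Usage sketch (requires PyTorch; the path is a placeholder, not a real file):
# import torch
# checkpoint = torch.load("path/to/checkpoint", map_location="cpu")
# for key, megabytes in size_report(checkpoint).items():
#     print(f"{key}: {megabytes:.1f} MB")
```

If the report shows a weights entry of about 127 MB plus a much larger optimizer entry, the layer dimensions match the released models and the difference is only in what was saved.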