NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License

Does a dataset with longer audio need more video memory? #21

Closed daxiangpanda closed 4 years ago

daxiangpanda commented 4 years ago

I am training this model on a dataset with longer audio clips (10 s to 20 s each). I noticed that the batch size I can fit is smaller than when I trained on the LJSpeech dataset.

rafaelvalle commented 4 years ago

You can use longer files but you will need to decrease the batch size. For each batch, the data loader zero pads the data in that batch to the length of the longest sample.
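The padding behaviour described above can be sketched roughly as follows. This is a minimal pure-Python illustration, not the repository's actual data loader: `pad_batch` and the toy values are hypothetical names and data chosen for the example. It shows that every sample in a batch ends up occupying as much space as the longest one.

```python
from typing import List, Tuple


def pad_batch(
    batch: List[List[float]], pad_value: float = 0.0
) -> Tuple[List[List[float]], List[int]]:
    """Pad every sequence in the batch to the length of the longest one.

    Hypothetical helper for illustration; a real loader would work on tensors.
    """
    lengths = [len(seq) for seq in batch]
    max_len = max(lengths)
    padded = [seq + [pad_value] * (max_len - len(seq)) for seq in batch]
    return padded, lengths


# Three toy "utterances" of different lengths (values are arbitrary).
batch = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6, 0.7, 0.8], [0.9, 1.0]]
padded, lengths = pad_batch(batch)

stored = len(padded) * len(padded[0])  # elements actually held in memory
useful = sum(lengths)                  # elements that carry real data
print(f"stored={stored}, useful={useful}")  # prints "stored=15, useful=10"
```

Because the stored size is batch size times the longest length, a single long clip raises the memory cost of the whole batch.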

daxiangpanda commented 4 years ago

I am training this model on a P40 GPU with my own dataset.

rafaelvalle commented 4 years ago

Adjust the batch size to fit your available GPU memory.
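One way to pick a starting point is sketched below. It assumes that memory use grows about linearly with the padded length, which is only an approximation (attention and other terms can grow faster), so treat the result as a first guess and lower it further if you still run out of memory. The function name `scaled_batch_size` and the numbers are hypothetical, chosen only for illustration.

```python
def scaled_batch_size(
    ref_batch_size: int, ref_max_seconds: float, new_max_seconds: float
) -> int:
    """Estimate a batch size for longer clips from one that is known to fit.

    Assumption: memory scales roughly linearly with the longest clip length.
    """
    if ref_max_seconds <= 0 or new_max_seconds <= 0:
        raise ValueError("clip lengths must be positive")
    scaled = int(ref_batch_size * ref_max_seconds / new_max_seconds)
    return max(1, scaled)  # never go below one sample per batch


# Illustrative values: a batch of 32 fit with clips up to 10 s,
# and the new dataset has clips up to 20 s.
print(scaled_batch_size(32, 10.0, 20.0))  # prints 16
```

Set the batch size in your training configuration to the estimate, then adjust it up or down based on the memory use you actually observe.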

daxiangpanda commented 4 years ago

thx