NVIDIA / mellotron

Mellotron: a multispeaker voice synthesis model based on Tacotron 2 GST that can make a voice emote and sing without emotive or singing training data
BSD 3-Clause "New" or "Revised" License
853 stars 184 forks

RuntimeError: CUDA out of memory #24

Closed texpomru13 closed 4 years ago

texpomru13 commented 4 years ago

I'm trying to train Mellotron on my dataset (20 speakers, 5 hours each, samples up to 10 s, 22 kHz, Russian language), and I can't fit a large batch size: the maximum is 11-12 on a V100 and 6-7 on a K80. Is this OK, or am I doing something wrong?

RuntimeError: CUDA out of memory. Tried to allocate 2.00 MiB (GPU 15; 11.17 GiB total capacity; 10.81 GiB already allocated; 64.00 KiB free; 53.75 MiB cached)

thanks for the great work!

daxiangpanda commented 4 years ago

Are your wav files too long? When I train on my language, the longer the wav files are, the more GPU memory the model takes. So I suppose you are OK.

texpomru13 commented 4 years ago

@daxiangpanda But the maximum file length is 10 seconds. Is that too long?

texpomru13 commented 4 years ago

@daxiangpanda What batch size do you use?

rafaelvalle commented 4 years ago

Are you using the default hparams? Can you confirm that your files are at most exactly 10 seconds long? If you don't find any longer files and you're using the default hparams, just decrease the batch size accordingly.
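To follow this advice, one way to confirm that no clip exceeds 10 seconds is to scan the dataset with Python's stdlib `wave` module. This is a minimal sketch; the function names and the 10-second threshold are just illustrative, not part of the Mellotron codebase.

```python
import wave

def wav_duration(path):
    """Return the duration of a WAV file in seconds (frames / sample rate)."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / float(w.getframerate())

def find_long_wavs(paths, max_seconds=10.0):
    """Return (path, duration) pairs for clips longer than max_seconds."""
    offenders = []
    for p in paths:
        d = wav_duration(p)
        if d > max_seconds:
            offenders.append((p, d))
    return offenders
```

Running this over the training filelist quickly reveals any outliers worth trimming or excluding.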

As for causes of the smaller batch size, I can think of larger text lengths due to a faster speech rate, or a memory leak in some PyTorch version.
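Unusually long transcripts can be spotted directly in the training filelist, which follows the Tacotron 2-style `path|text|speaker_id` format. A minimal sketch (the 200-character threshold is an arbitrary assumption, not a Mellotron default):

```python
def find_long_transcripts(lines, max_chars=200):
    """Return (audio_path, n_chars) for filelist entries whose transcript
    exceeds max_chars. Each line is expected as 'path|text|speaker_id'."""
    offenders = []
    for line in lines:
        parts = line.rstrip("\n").split("|")
        if len(parts) < 2:
            continue  # skip malformed lines
        path, text = parts[0], parts[1]
        if len(text) > max_chars:
            offenders.append((path, len(text)))
    return offenders
```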

texpomru13 commented 4 years ago

@rafaelvalle You're right, the problem was in several text files. Thanks!