suhaspillai closed this issue 7 years ago
Hey! Try turning down the batch size from the default. How much memory do you have on your GPU? You can turn it down via:
th Train.lua -batchSize xx
where xx is the batch size you want!
Yes, turn down your batch size, and as @shantanudev suggested, lower the number of hidden nodes as well.
@SeanNaren I had tried reducing the batch size but it did not work. I think there was some issue with the GPU, because it's working now. Thanks for the quick reply!
Still getting the same error. The memory usage seems to keep increasing.
What GPU are you running this on @iassael?
@SeanNaren I'm on a Tesla GP100. I added collectgarbage() every 100 batches, a little after optim.sgd, but the problem persists. Do you have any suggestions on where to look?
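For reference, the periodic collection described above would look roughly like this sketch (the loop structure and variable names here are hypothetical, not the exact code from Train.lua):

```lua
-- inside the training loop; `i` is a hypothetical batch counter
for i = 1, nBatches do
    local x, loss = optim.sgd(feval, params, sgdParams)
    -- ask Lua to free unreferenced tensors every 100 batches
    if i % 100 == 0 then
        collectgarbage()
    end
end
```

Note that collectgarbage() only reclaims Lua-side objects; CUDA tensors still held by the graph won't be freed, which is why this alone may not stop the growth.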
Excited to see my stuff running on a P100 :D
What batch size are you using? Have you made it through the first epoch? My assumption is you need to lower the batch size using the -batchSize flag, since some of the sequences towards the end of the epoch are fairly long and thus take more memory.
I ran into an out-of-memory problem when using two GPUs. During the second epoch, memory usage still continues to go up. Do you have any ideas?
@SeanNaren hehe it's the perfect benchmark for these babies :)
My batch size is 20, but I only make it nearly to the end of the first epoch, as memory usage increases many fold during the first epoch.
@fanlamda could you try the latest master branch? I've made it an option to permute batches via the -permuteBatch flag in training, which defaults to false. I've noticed huge increases in memory due to permuting the batch order for all batches after the first epoch.
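So, for anyone following along, combining the two suggestions in this thread (assuming the flag names above) would look like:

th Train.lua -batchSize 15 -permuteBatch

Leaving -permuteBatch off keeps the new default, which avoids the post-epoch reshuffle that was blowing up memory.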
@iassael could you try reducing this to around 15 and seeing if you get through? Also, I've just started the setup on my end to train this model on my own server, and I've got some advice for people: the current architecture setup is way overkill. I'd suggest training params similar to those below:
th Train.lua -hiddenSize 600 -LSTM -nbOfHiddenLayers 5
This is a much smaller model (closer to the number of params in the DS2 architecture, but it uses LSTMs with no weight sharing between the RNNs, so it isn't as good), and our dataset is also much smaller. Hopefully all this helps!
@SeanNaren you are right. I tried pulling the latest branch and switching to a smaller architecture, but the memory constantly increases throughout the iterations of the first epoch. I thought it could be the warp_ctc implementation, so I switched to nGPU 1 and time-first, still without success. Do you have any intuition on where to look?
The GPU memory will always increase as we move through the epoch, since the batches get larger and larger (and those RNNs take a lot of memory!). Because of this, the largest batch size per GPU with 12GB of VRAM is usually around 15, from what I've seen.
I know Baidu were able to get much larger batches, but they had their own internal software and were super efficient in how they used memory. Sadly with Torch it's a little more difficult, but it should still be trainable with smaller batch sizes! Hopefully this helps!
It works, @SeanNaren!
If anyone still has issues feel free to open a new issue :)
I am trying to train the model on the LibriSpeech dev-clean dataset, where my train split = 2503 and val split = 200. I reduced my val split thinking this might be the issue. Based on the memory consumption (which I checked using nvidia-smi), I think all the training data is loaded at once, and so is the validation, right? Did anyone face this issue? Following is the stack trace: