NVlabs / Deep_Object_Pose

Deep Object Pose Estimation (DOPE) – ROS inference (CoRL 2018)

When using --datatest (test dataset), GPU runs out of memory. #75

Closed · Abdul-Mukit closed this issue 5 years ago

Abdul-Mukit commented 5 years ago

I am training on about 12k images and have 2,000 test images. I am using a batch size of 32, as I have 2 GPUs (24 GB total). When running train.py with no test data, training runs fine, but when `--datatest` is specified, I get a "cuda runtime error (2) : out of memory" error. What should I do?

TontonTremblay commented 5 years ago

Hey Mukit, I think the problem comes from a test batch size that is too high. You could augment the script to use a different batch size for testing. Does that make sense?
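A minimal sketch of what that augmentation could look like, assuming train.py builds its loaders with argparse and `torch.utils.data.DataLoader`; the `--testbatchsize` flag and the placeholder datasets below are hypothetical, not the repository's actual code:

```python
import argparse
import torch
import torch.utils.data

parser = argparse.ArgumentParser()
parser.add_argument('--batchsize', type=int, default=32,
                    help='batch size used for training')
parser.add_argument('--testbatchsize', type=int, default=16,
                    help='separate, smaller batch size for the test set')
opt = parser.parse_args()

# Tiny placeholder datasets standing in for the real train/test data.
train_dataset = torch.utils.data.TensorDataset(torch.zeros(64, 3, 32, 32))
test_dataset = torch.utils.data.TensorDataset(torch.zeros(16, 3, 32, 32))

# The training loader keeps opt.batchsize; the test loader no longer
# reuses it and gets its own, smaller value.
trainingdata = torch.utils.data.DataLoader(
    train_dataset, batch_size=opt.batchsize, shuffle=True)
testingdata = torch.utils.data.DataLoader(
    test_dataset, batch_size=opt.testbatchsize, shuffle=False)
```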

Abdul-Mukit commented 5 years ago

Hi again @TontonTremblay, thank you for your reply. Yes, it makes sense. I'll try that out and let you know how it goes. Thanks again.

Abdul-Mukit commented 5 years ago

Hi @TontonTremblay, I set the test batch size to 16 and it's working fine now. Thanks. (`opt.batchsize` was being used as the batch size for both training and testing.)

However, I don't really understand why a batch size of 32 works for training but not for testing, so that I need a smaller batch size for the test set. Isn't training more computationally demanding, because of backpropagation and optimization? Is it that the GPU still holds values from the training stage when it switches over to the testing phase? I mean, the GPUs already have memory reserved for the training calculations, so there isn't enough space left for a batch size of 32? I would really appreciate it if you could help me understand this. Thank you again.
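One likely contributor, though it is not confirmed in this thread: in PyTorch, the test-time forward pass also records an autograd graph unless gradients are explicitly disabled, and the allocator still holds the training batches' cached memory, so evaluation can need more free memory than expected. A minimal sketch of a memory-friendlier evaluation loop; `net`, `testingdata`, and `criterion` are hypothetical names, not the script's actual variables:

```python
import torch

def evaluate(net, testingdata, criterion, device='cuda'):
    """Run the test set without building an autograd graph.

    Wrapping the forward pass in torch.no_grad() means activations are
    not kept for backpropagation, which substantially lowers peak GPU
    memory during the test phase.
    """
    net.eval()  # switch off dropout / use running batch-norm statistics
    total_loss = 0.0
    with torch.no_grad():
        for images, targets in testingdata:
            images = images.to(device)
            targets = targets.to(device)
            output = net(images)
            total_loss += criterion(output, targets).item()
    net.train()  # restore training mode for the next epoch
    return total_loss / len(testingdata)
```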