dvlab-research / DSGN

DSGN: Deep Stereo Geometry Network for 3D Object Detection (CVPR 2020)
MIT License

CUDA out of memory #8

Closed czy341181 closed 4 years ago

czy341181 commented 4 years ago

Hi, thanks for your work. I set batch size 1 on a single GPU, but it still runs out of memory (caused by the 3D convolutions).

python3 ./tools/train_net.py \
    --cfg ./configs/default/config_car.py \
    --savemodel ./outputs/dsgn_car \
    --start_epoch 1 \
    --lr_scale 50 \
    --epochs 60 \
    -btrain 1 \
    -d 0

What should I do to solve this problem? Thanks for your help.

chenyilun95 commented 4 years ago

Hi, thanks for your interest. As mentioned in the README, the model requires 32 GB of GPU memory for training, even with a batch size of 1 per GPU, because 3D convolutions are very memory-hungry.

Since most people cannot get a device with enough GPU memory, I have tried several other settings for 24 GB and 12 GB of memory. Training with the 24 GB setting does not hurt performance on the validation set, but training with the 12 GB setting hurts it a lot (about -10 AP). I will upload these settings soon. Thanks!

czy341181 commented 4 years ago

Thanks for your timely reply! Looking forward to the upload. By the way, can you provide your pre-trained model? I think my memory is enough for testing.

chenyilun95 commented 4 years ago

You can find two pre-trained models, for car and human, in the README :)

czy341181 commented 4 years ago

Thanks! One question: 3D convolutions are also common in binocular disparity estimation networks such as PSMNet and GwcNet, yet my 12 GB of memory is enough for those. I would like to know what makes DSGN need so much memory.

chenyilun95 commented 4 years ago

The 3D volume actually costs a lot of GPU memory, because the plane-sweep volume has to store a tensor of shape [Batch, H, W, D, F_dim], where initially H ~= 1284 / 4, W ~= 375 / 4, D ~= 192 / 4, and F_dim = 64. You'll find that this is several times the memory footprint of a ResNet-50 on 2D images of size (800, 1200).
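
For a rough sense of scale, here is a back-of-the-envelope sketch (assuming float32 activations and the shapes quoted above; the exact resolutions and number of intermediate 3D feature maps depend on the config) of the memory taken by a single plane-sweep volume:

# Back-of-the-envelope estimate of one plane-sweep volume in DSGN.
# Shapes follow the figures quoted above ([Batch, H, W, D, F_dim] at
# 1/4 resolution); the exact values depend on the config and input size.

def volume_memory_gb(batch=1, h=1284 // 4, w=375 // 4, d=192 // 4,
                     f_dim=64, bytes_per_elem=4):
    """Memory (GB) of one float32 volume of shape [batch, h, w, d, f_dim]."""
    n_elems = batch * h * w * d * f_dim
    return n_elems * bytes_per_elem / 1024 ** 3

single = volume_memory_gb()
print(f"one volume: {single:.2f} GB")            # roughly 0.34 GB
# During training, every 3D conv layer keeps its own activation for
# backprop (plus gradients and optimizer state), so a stack of such
# layers multiplies this figure many times over, which is why a 12 GB
# card fills up quickly.
print(f"20 such activations: {20 * single:.1f} GB")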

vbmt-net commented 4 years ago

Hello,

we are currently facing the same problem, even though we have 32 GB of GPU memory in total (4 GPUs, 7618 MB each). We run the training script as follows:

python tools/train_net.py --cfg ./configs/default/config_car.py --savemodel ./outputs/dsgn_car -btrain 4 -d 0-3 --multiprocessing-distributed

Do you know what the issue could be? Also, can you please share the settings you used to fit into 24 GB?

Thanks in advance.

chenyilun95 commented 4 years ago

Hi, thanks for your interest in this work. I mean that each GPU should have 32 GB of memory, because stereo matching is very costly for storing the intermediate volume.

I will update the settings in the next two days. Hope it helps.
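
As a quick sanity check (a minimal sketch, assuming PyTorch with CUDA and the 32 GB per-GPU figure quoted above), you can verify that each visible GPU individually meets the requirement; data-parallel training splits the batch across devices, but the cost volume for each sample still has to fit on a single GPU:

# Quick per-GPU memory check (assumes PyTorch with CUDA available).
# Data-parallel training splits the batch, not the per-sample volume,
# so each device must individually hold the full cost volume.
import torch

REQUIRED_GB = 32  # per-GPU requirement quoted above

for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    total_gb = props.total_memory / 1024 ** 3
    status = "ok" if total_gb >= REQUIRED_GB else "too small"
    print(f"GPU {i}: {props.name}, {total_gb:.1f} GB ({status})")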