czy341181 closed 4 years ago
Hi, thanks for your interest. As mentioned in the README, the model requires 32 GB of GPU memory for training, even with a batch size of 1 per GPU, because 3D convolution is very memory-intensive.
Since most people do not have access to a device with that much GPU memory, I have tried several other settings for 24 GB and 12 GB of memory. Training with 24 GB does not hurt performance on the validation set, but training with 12 GB hurts performance a lot (-10 AP). I will upload the settings soon. Thanks!
Thanks for your timely reply! I look forward to the upload. By the way, could you provide your pre-trained model? I think my GPU memory is enough for testing.
You can find two pre-trained models, for car and human, in the README :)
Thanks! I have a question. 3D convolution is common in networks for binocular disparity estimation, such as PSMNet and GwcNet, but my 12 GB of memory is enough for those. What makes DSGN need so much more memory?
The 3D volume actually costs a lot of GPU memory, because the plane sweep volume requires storing a tensor of shape [Batch, H, W, D, F_dim], where initially H ~= 1284 / 4, W ~= 375 / 4, D ~= 192 / 4, and F_dim = 64. You'll find that this is roughly several times the memory of a ResNet-50 on 2D images of size (800, 1200).
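As a rough illustration of the shape quoted above, the size of a single volume can be estimated with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes float32 storage (4 bytes per element) and integer division for the downsampled sizes, and the helper name volume_bytes is made up for this example. It counts one copy of the volume only; during training, each 3D convolution layer keeps its own activations for the backward pass, so the real footprint is many times larger.

```python
def volume_bytes(batch, h, w, d, f_dim, bytes_per_elem=4):
    """Bytes needed to store one [batch, h, w, d, f_dim] tensor.

    bytes_per_elem=4 assumes float32 storage (an assumption, not from the thread).
    """
    return batch * h * w * d * f_dim * bytes_per_elem


# Shape from the reply above, with integer division assumed for the /4 downsampling.
h, w, d, f_dim = 1284 // 4, 375 // 4, 192 // 4, 64

one_volume = volume_bytes(1, h, w, d, f_dim)
print(f"one volume at batch size 1: {one_volume / 2**20:.1f} MiB")
```

Multiplying this figure by the number of 3D layers whose activations are kept for backpropagation gives a feel for why a batch size of 1 can still exceed 12 GB.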
Hello,
We are currently facing the same problem, even though we have 32 GB of GPU memory in total (4 GPUs, 7618 MB each).
We run the training script as follows:
python tools/train_net.py --cfg ./configs/default/config_car.py --savemodel ./outputs/dsgn_car -btrain 4 -d 0-3 --multiprocessing-distributed
Do you know what could be causing this? Also, could you please share the settings you used to fit into 24 GB?
Thanks in advance.
Hi, thanks for your interest in this work. What I mean is that each GPU should have 32 GB of memory, because stereo matching needs a lot of memory to store the intermediate volume.
I will upload the settings within the next two days. I hope they help you a bit.
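To make the per-GPU point concrete: the requirement applies to each device separately, so memory is not pooled across GPUs. The sketch below is only illustrative. The helper names gpus_too_small and query_gpu_memory_mib are hypothetical and not part of the DSGN code; the second helper assumes PyTorch with CUDA is installed and uses torch.cuda.get_device_properties to read each device's total memory.

```python
def gpus_too_small(per_gpu_mib, required_mib=32 * 1024):
    """Return indices of GPUs whose own memory is below the per-GPU requirement."""
    return [i for i, mib in enumerate(per_gpu_mib) if mib < required_mib]


def query_gpu_memory_mib():
    """Read each visible GPU's total memory in MiB (requires PyTorch with CUDA)."""
    import torch  # imported lazily so the check above works without PyTorch

    return [
        torch.cuda.get_device_properties(i).total_memory // 2**20
        for i in range(torch.cuda.device_count())
    ]


# The setup described above: 4 GPUs with 7618 MB each.
print(gpus_too_small([7618] * 4))
```

With four 7618 MB devices, every GPU falls below the 32 GB per-device requirement, which matches the out-of-memory error reported above.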
Hi, thanks for your work. I set the batch size to 1 on a single GPU, but I still run out of memory. Is this caused by the 3D convolution?
python3 ./tools/train_net.py \
    --cfg ./configs/default/config_car.py \
    --savemodel ./outputs/dsgn_car \
    --start_epoch 1 \
    --lr_scale 50 \
    --epochs 60 \
    -btrain 1 \
    -d 0 \
What should I do to solve this problem? Thanks for your help.