garrickbrazil / M3D-RPN

MIT License

CUDA out of memory error #50

Open chinmaydharmatti opened 3 years ago

chinmaydharmatti commented 3 years ago

Hi Garrick,

I am trying to replicate the results. When I run the warmup script, I get a CUDA out of memory error. I know that reducing the batch size can help, but the batch size is already 2, which is relatively low. What else can be done to avoid this error? The full output of the script is below.

chinmay@chinmay-Legion-Y540-15IRH:~/Desktop/M3D-RPN$ python scripts/train_rpn_3d.py --config=kitti_3d_multi_warmup
Setting up a new session...
Visdom successfully connected to server
Preloading imdb.
weighted respectively as 1.05 and 0.00
Found 3534 foreground and 178 empty images
Labels not used in training.. ['DontCare', 'Truck', 'Tram', 'Misc', 'Person_sitting']
conf: {
model: densenet121_3d_dilate
solver_type: sgd
lr: 0.004
momentum: 0.9
weight_decay: 0.0005
max_iter: 50000
snapshot_iter: 10000
display: 250
do_test: True
lr_policy: poly
lr_steps: None
lr_target: 4e-08
rng_seed: 2
cuda_seed: 2
image_means: [0.485, 0.456, 0.406]
image_stds: [0.229, 0.224, 0.225]
feat_stride: 16
has_3d: True
test_scale: 512
crop_size: [512, 1760]
mirror_prob: 0.5
distort_prob: -1
dataset_test: kitti_split1
datasets_train: [{'anno_fmt': 'kitti_det', 'im_ext': '.png', 'name': 'kitti_split1', 'scale': 1}]
use_3d_for_2d: True
percent_anc_h: [0.0625, 0.75]
min_gt_h: 32.0
max_gt_h: 384.0
min_gt_vis: 0.65
ilbls: ['Van', 'ignore']
lbls: ['Car', 'Pedestrian', 'Cyclist']
batch_size: 2
fg_image_ratio: 1.0
box_samples: 0.2
fg_fraction: 0.2
bg_thresh_lo: 0
bg_thresh_hi: 0.5
fg_thresh: 0.5
ign_thresh: 0.5
best_thresh: 0.35
nms_topN_pre: 3000
nms_topN_post: 40
nms_thres: 0.4
clip_boxes: False
test_protocol: kitti
test_db: kitti
test_min_h: 0
min_det_scales: [0, 0]
cluster_anchors: 0
even_anchors: 0
expand_anchors: 0
anchors: [[-0.5, -8.5, 15.5, 23.5, 51.969, 0.531, 1.713, 1.025, -0.799], [-8.5, -8.5, 23.5, 23.5, 52.176, 1.618, 1.6, 3.811, -0.453], [-16.5, -8.5, 31.5, 23.5, 48.334, 1.644, 1.529, 3.966, 0.673], [-2.528, -12.555, 17.528, 27.555, 44.781, 0.534, 1.771, 0.971, 0.093], [-12.555, -12.555, 27.555, 27.555, 44.704, 1.599, 1.569, 3.814, -0.187], [-22.583, -12.555, 37.583, 27.555, 43.492, 1.621, 1.536, 3.91, 0.719], [-5.069, -17.638, 20.069, 32.638, 34.666, 0.561, 1.752, 0.967, -0.384], [-17.638, -17.638, 32.638, 32.638, 35.35, 1.567, 1.591, 3.81, -0.511], [-30.207, -17.638, 45.207, 32.638, 37.128, 1.602, 1.529, 3.904, 0.452], [-8.255, -24.01, 23.255, 39.01, 28.771, 0.613, 1.76, 0.98, 0.067], [-24.01, -24.01, 39.01, 39.01, 28.331, 1.543, 1.592, 3.66, -0.811], [-39.764, -24.01, 54.764, 39.01, 30.541, 1.626, 1.524, 3.908, 0.312], [-12.248, -31.996, 27.248, 46.996, 23.011, 0.606, 1.758, 0.996, 0.208], [-31.996, -31.996, 46.996, 46.996, 22.948, 1.51, 1.599, 3.419, -1.076], [-51.744, -31.996, 66.744, 46.996, 25.0, 1.628, 1.527, 3.917, 0.334], [-17.253, -42.006, 32.253, 57.006, 18.479, 0.601, 1.747, 1.007, 0.347], [-42.006, -42.006, 57.006, 57.006, 18.815, 1.487, 1.599, 3.337, -0.862], [-66.759, -42.006, 81.759, 57.006, 20.576, 1.623, 1.532, 3.942, 0.323], [-23.527, -54.553, 38.527, 69.553, 15.035, 0.625, 1.744, 0.917, 0.41], [-54.553, -54.553, 69.553, 69.553, 15.346, 1.29, 1.659, 3.083, -0.275], [-85.58, -54.553, 100.58, 69.553, 16.326, 1.613, 1.527, 3.934, 0.268], [-31.39, -70.281, 46.39, 85.281, 12.265, 0.631, 1.747, 0.954, 0.317], [-70.281, -70.281, 85.281, 85.281, 11.878, 1.044, 1.67, 2.415, -0.211], [-109.171, -70.281, 124.171, 85.281, 13.58, 1.621, 1.539, 3.961, 0.189], [-41.247, -89.994, 56.247, 104.994, 9.932, 0.61, 1.771, 0.934, 0.486], [-89.994, -89.994, 104.994, 104.994, 8.949, 0.811, 1.766, 1.662, 0.08], [-138.741, -89.994, 153.741, 104.994, 11.043, 1.61, 1.533, 3.899, 0.04], [-53.602, -114.704, 68.602, 129.704, 8.389, 0.604, 1.793, 0.95, 0.806], [-114.704, 
-114.704, 129.704, 129.704, 8.071, 1.01, 1.751, 2.19, -0.076], [-175.806, -114.704, 190.806, 129.704, 9.184, 1.606, 1.526, 3.869, -0.066], [-69.089, -145.677, 84.089, 160.677, 6.923, 0.627, 1.791, 0.96, 0.784], [-145.677, -145.677, 160.677, 160.677, 6.784, 1.384, 1.615, 2.862, -1.035], [-222.266, -145.677, 237.266, 160.677, 7.863, 1.617, 1.55, 3.948, -0.071], [-88.5, -184.5, 103.5, 199.5, 5.189, 0.66, 1.755, 0.841, 0.173], [-184.5, -184.5, 199.5, 199.5, 4.388, 0.743, 1.728, 1.381, 0.642], [-280.5, -184.5, 295.5, 199.5, 5.583, 1.583, 1.547, 3.862, -0.072]]
bbox_means: [[-0.0, 0.002, 0.064, -0.093, 0.011, -0.067, 0.192, 0.059, -0.021, 0.069, -0.004]]
bbox_stds: [[0.14, 0.126, 0.247, 0.239, 0.163, 0.132, 3.621, 0.382, 0.102, 0.503, 1.855]]
anchor_scales: [32.0, 40.11, 50.276, 63.019, 78.991, 99.012, 124.106, 155.561, 194.989, 244.409, 306.354, 384.0]
anchor_ratios: [0.5, 1.0, 1.5]
hard_negatives: True
focal_loss: 0
cls_2d_lambda: 1
iou_2d_lambda: 1
bbox_2d_lambda: 0
bbox_3d_lambda: 1
bbox_3d_proj_lambda: 0.0
hill_climbing: True
visdom_port: 8100
}
Traceback (most recent call last):
  File "scripts/train_rpn_3d.py", line 196, in <module>
    main(sys.argv[1:])
  File "scripts/train_rpn_3d.py", line 122, in main
    cls, prob, bbox_2d, bbox_3d, feat_size = rpn_net(images)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 159, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chinmay/Desktop/M3D-RPN/output/kitti_3d_multi_warmup/densenet121_3d_dilate.py", line 83, in forward
    x = self.base(x)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torchvision/models/densenet.py", line 111, in forward
    new_features = layer(features)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torchvision/models/densenet.py", line 84, in forward
    bottleneck_output = self.bn_function(prev_features)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torchvision/models/densenet.py", line 41, in bn_function
    bottleneck_output = self.conv1(self.relu1(self.norm1(concated_features)))  # noqa: T484
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/modules/batchnorm.py", line 131, in forward
    return F.batch_norm(
  File "/home/chinmay/anaconda3/lib/python3.8/site-packages/torch/nn/functional.py", line 2056, in batch_norm
    return torch.batch_norm(
RuntimeError: CUDA out of memory. Tried to allocate 22.00 MiB (GPU 0; 5.79 GiB total capacity; 4.60 GiB already allocated; 3.81 MiB free; 4.72 GiB reserved in total by PyTorch)
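For scale: the failed 22.00 MiB allocation is roughly the size of a single batch-sized float32 tensor at this config's crop_size, and a DenseNet backbone keeps many such concatenated activations alive at once, which is why a 5.79 GiB card fills up even at batch size 2. A rough back-of-envelope (plain Python, not from the repo):

```python
def tensor_mib(shape, bytes_per_elem=4):
    """Memory footprint of a dense float32 tensor, in MiB."""
    n = 1
    for d in shape:
        n *= d
    return n * bytes_per_elem / (1024 ** 2)

# One input batch at the config above (batch 2, RGB, crop_size 512 x 1760):
batch = tensor_mib((2, 3, 512, 1760))
print(batch)  # 20.625 MiB per batch-sized float32 map
```

Since the intermediate feature maps dominate, anything that shrinks them (smaller batch, smaller crop) reduces usage roughly proportionally; it is also worth running `nvidia-smi` first to confirm nothing else (e.g. a desktop session) is holding part of the 5.79 GiB.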

qaazii commented 2 years ago

Change your batch size.
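Beyond batch_size, the crop_size in the conf printed above also drives activation memory. Both come from the config file named by --config (here kitti_3d_multi_warmup). A hedged sketch of the kind of edits that could fit a ~6 GiB card; the values are illustrative and untested, and a reduced crop may no longer match the precomputed anchors, so check against the repo before training:

```
# In the kitti_3d_multi_warmup config (illustrative values, not verified):
conf.batch_size = 1            # was 2; roughly halves per-step activation memory
conf.crop_size  = [256, 880]   # was [512, 1760]; smaller crops cut memory further,
                               # but likely require re-clustering anchors
conf.test_scale = 256          # keep test-time scale consistent with the crop
```

If accuracy with batch size 1 suffers, accumulating gradients over two iterations before each optimizer step is a common way to recover the effective batch size of 2 without the peak-memory cost.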