111surajmaurya closed this issue 4 years ago
It looks like the error was caused by a PyTorch version mismatch. The code was written for PyTorch 0.4.1, and I was using 1.4.0. After installing PyTorch 0.4.1 with CUDA 9 (and adding os.environ["CUDA_VISIBLE_DEVICES"]="0"), I was able to train the model.
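The workaround described here could be sketched roughly as follows. This is only an illustration, not code from the repository: `EXPECTED_TORCH`, `version_tuple`, and `matches_expected` are hypothetical names, and the snippet assumes it runs at the very top of the training script, before anything initialises CUDA.

```python
import os
import re

# Expose only GPU 0 to this process. Setting this before torch is imported
# ensures it takes effect before CUDA is initialised.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"

# Version the thread reports the code was written for (assumption: exact match wanted).
EXPECTED_TORCH = "0.4.1"


def version_tuple(version):
    """Parse strings such as '1.4.0', '1.4.0+cu92' or '0.4.1.post2' into (major, minor, patch)."""
    core = version.split("+")[0]  # drop a local build suffix such as '+cu92'
    numbers = []
    for part in core.split(".")[:3]:
        match = re.match(r"\d+", part)  # keep only the leading digits of each component
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:  # pad short versions such as '1.4'
        numbers.append(0)
    return tuple(numbers)


def matches_expected(installed, expected=EXPECTED_TORCH):
    """Return True if the installed version equals the expected one."""
    return version_tuple(installed) == version_tuple(expected)


try:
    import torch
except ImportError:  # lets the check itself run on a machine without torch
    torch = None

if torch is not None and not matches_expected(torch.__version__):
    print("Warning: torch %s is installed, but the code targets %s"
          % (torch.__version__, EXPECTED_TORCH))
```

With a single visible device, nn.DataParallel calls the wrapped module directly instead of scattering and gathering, which is consistent with the first traceback disappearing once the variable is set.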
Hi, I am getting the error below while training. I followed the exact steps given for training; inference works, but training does not.
Command: python scripts/train_rpn_3d.py --config=kitti_3d_multi_warmup
  File "scripts/train_rpn_3d.py", line 198, in <module>
    main(sys.argv[1:])
  File "scripts/train_rpn_3d.py", line 124, in main
    cls, prob, bbox_2d, bbox_3d, feat_size = rpn_net(images)
  File "/root/utils/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/root/utils/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 153, in forward
    return self.gather(outputs, self.output_device)
  File "/root/utils/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 165, in gather
    return gather(outputs, output_device, dim=self.dim)
  File "/root/utils/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 68, in gather
    res = gather_map(outputs)
  File "/root/utils/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/root/utils/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
  File "/root/utils/anaconda3/lib/python3.7/site-packages/torch/nn/parallel/scatter_gather.py", line 63, in gather_map
    return type(out)(map(gather_map, zip(*outputs)))
TypeError: zip argument #1 must support iteration
Here is my system and package configuration:
Ubuntu: 18.04.3 LTS
CUDA: 10.2
cuDNN: 7.6.5
torch: 1.4.0
Python: 3.7.3
If I add os.environ["CUDA_VISIBLE_DEVICES"]="0" to the training script (train_rpn_3d.py), the error above goes away, but I get a new error at the following call:
Traceback (most recent call last):
  File "scripts/train_rpn_3d.py", line 198, in <module>
    main(sys.argv[1:])
  File "scripts/train_rpn_3d.py", line 127, in main
    det_loss, det_stats = criterion_det(cls, prob, bbox_2d, bbox_3d, imobjs, feat_size)
  File "/root/utils/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/workspace/M3D-RPN/lib/loss/rpn_3d.py", line 125, in forward
    src_anchors = self.anchors[rois[:, 4].type(torch.cuda.LongTensor), :]
  File "/root/utils/anaconda3/lib/python3.7/site-packages/torch/tensor.py", line 486, in __array__
    return self.numpy()
TypeError: can't convert CUDA tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.
Please let me know whether this is caused by a version issue or by an error in the code.
Thanks in advance.
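The second traceback passes through Tensor.__array__, which suggests that a NumPy array is being indexed with a CUDA tensor. The fix confirmed in this thread is the downgrade to PyTorch 0.4.1. A possible alternative, untested against the repository, would be to copy the index to host memory first, as the error message itself recommends. The sketch below assumes that self.anchors is a NumPy array; `as_numpy_index`, `anchors`, and `roi_ids` are hypothetical names, and a plain list stands in for the tensor so the example runs without a GPU.

```python
import numpy as np


def as_numpy_index(index):
    """Return `index` as a NumPy int64 array, copying from the GPU if needed."""
    if hasattr(index, "detach"):  # duck-typed check for a torch.Tensor (CPU or CUDA)
        index = index.detach().cpu().numpy()
    return np.asarray(index, dtype=np.int64)


# Hypothetical stand-ins: a small anchor table and ROI-to-anchor indices.
anchors = np.arange(12, dtype=np.float32).reshape(4, 3)
roi_ids = [2, 0, 2]  # in the training loop this would be rois[:, 4]

# Index the NumPy array with a host-side integer array instead of a CUDA tensor.
src_anchors = anchors[as_numpy_index(roi_ids), :]
```

In the loss module, the equivalent change would be along the lines of self.anchors[as_numpy_index(rois[:, 4]), :], though other lines in the same file may need similar treatment under PyTorch 1.4.0.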