LittlePey / SFD

Sparse Fuse Dense: Towards High Quality 3D Detection with Depth Completion (CVPR 2022, Oral)
Apache License 2.0

RuntimeError: CUDA error: device-side assert triggered #54

Open betty-zeng opened 1 year ago

betty-zeng commented 1 year ago

Hello,

I'm training the code in Docker, using pytorch/pytorch:1.10.0-cuda11.3-cudnn8-devel as the base image with Python 3.8. Setup worked fine until I tried to train, at which point this error appeared:

/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [33,0,0], thread: [57,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [16,0,0], thread: [52,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [18,0,0], thread: [0,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [16,0,0], thread: [93,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [17,0,0], thread: [75,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [17,0,0], thread: [87,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [17,0,0], thread: [99,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [17,0,0], thread: [110,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [17,0,0], thread: [126,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/IndexKernel.cu:91: operator(): block: [18,0,0], thread: [92,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
epochs: 0%| | 0/6 [00:13<?, ?it/s]
Traceback (most recent call last):
  File "Models/SFD/tools/train.py", line 212, in <module>
    main()
  File "Models/SFD/tools/train.py", line 167, in main
    train_model(
  File "/workspace/Models/SFD/tools/train_utils/train_utils.py", line 86, in train_model
    accumulated_iter = train_one_epoch(
  File "/workspace/Models/SFD/tools/train_utils/train_utils.py", line 38, in train_one_epoch
    loss, tb_dict, disp_dict = model_func(model, batch)
  File "/workspace/OpenPCDet/pcdet/models/__init__.py", line 44, in model_func
    ret_dict, tb_dict, disp_dict = model(batch_dict)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/Models/SFD/pcdet_extensions/models/detectors/sfd.py", line 11, in forward
    batch_dict = cur_module(batch_dict)
  File "/opt/conda/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/Models/SFD/pcdet_extensions/models/roi_heads/sfd_head.py", line 595, in forward
    self.roicrop3d_gpu(batch_dict, self.model_cfg.ROI_POINT_CROP.POOL_EXTRA_WIDTH)
  File "/workspace/Models/SFD/pcdet_extensions/models/roi_heads/sfd_head.py", line 554, in roicrop3d_gpu
    image[total_pts_features[:,7].long(), total_pts_features[:,6].long()] = global_index.to(device=total_pts_features.device)
RuntimeError: CUDA error: device-side assert triggered

It looks like an out-of-bounds index in the function roicrop3d_gpu. Could you please verify this? I have been stuck on this for a few days already. Thank you!
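
For anyone hitting the same assertion, one way to narrow it down is to check the row and column indices against the image size before the failing assignment. Below is a minimal sketch of such a check in plain Python. The helper name find_out_of_bounds and the example image size and index values are hypothetical, chosen only for illustration; they are not taken from the SFD code.

```python
from typing import List, Sequence, Tuple


def find_out_of_bounds(
    rows: Sequence[int], cols: Sequence[int], height: int, width: int
) -> List[Tuple[int, int, int]]:
    """Return (position, row, col) for every index pair that would fail
    the "index out of bounds" assertion on a (height, width) array.

    Hypothetical debugging helper, not part of SFD or PyTorch.
    """
    if len(rows) != len(cols):
        raise ValueError("rows and cols must have the same length")
    bad = []
    for pos, (r, c) in enumerate(zip(rows, cols)):
        # Same condition as the CUDA assertion: -size <= index < size.
        row_ok = -height <= r < height
        col_ok = -width <= c < width
        if not (row_ok and col_ok):
            bad.append((pos, r, c))
    return bad


# Hypothetical values: a 375 x 1242 image and three index pairs.
rows = [10, 375, 200]
cols = [5, 100, -1300]
bad = find_out_of_bounds(rows, cols, height=375, width=1242)
print(bad)  # [(1, 375, 100), (2, 200, -1300)]
```

To apply this to the line in the traceback, the inputs could be obtained with total_pts_features[:, 7].long().tolist() and total_pts_features[:, 6].long().tolist(), with image.shape[0] and image.shape[1] as the bounds, assuming image is indexed as (row, column) as that line suggests. Because CUDA reports errors asynchronously, setting the environment variable CUDA_LAUNCH_BLOCKING=1 may also help confirm that the reported Python line is the one that actually fails.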

HuangLLL123 commented 7 months ago

Hello, have you solved the problem? If so, how?

vacant-ztz commented 2 months ago

Hello, have you solved the problem? If so, how?

HuangLLL123 commented 1 month ago

Hello, have you solved the problem? If so, how?

I solved the problem by following the method described in https://github.com/LittlePey/SFD/issues/23. May I ask how you solved it? @vacant-ztz