Pointcept / PointTransformerV2

[NeurIPS'22] An official PyTorch implementation of PTv2.

CUDA error: an illegal memory access was encountered #12

Closed: WeichengDai1 closed this issue 1 year ago.

WeichengDai1 commented 1 year ago

Hi, I have encountered the following error while experimenting with PointTransformerV2 in pcr.models.point_transformer2.point_transformer_v2m2_base. Basically, I set:

    coord = torch.randn(36500, 3)
    feat = torch.randn(36500, 4)
    offset = torch.ones(4)
    model = PointTransformerV2(in_channels=4, num_classes=4)
    data_dict = {'coord': coord, 'feat': feat, 'offset': offset}
    res = model(data_dict)

Then I get an error from pointops/query.py:

    Traceback (most recent call last):
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/weicheng/selfLearning/PointTransformerV2/pcr/models/point_transformer2/point_transformer_v2m1_origin.py", line 572, in <module>
        res = model(data_dict)
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/weicheng/selfLearning/PointTransformerV2/pcr/models/point_transformer2/point_transformer_v2m1_origin.py", line 549, in forward
        points = self.patch_embed(points)
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/weicheng/selfLearning/PointTransformerV2/pcr/models/point_transformer2/point_transformer_v2m1_origin.py", line 443, in forward
        return self.blocks([coord, feat, offset])
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/weicheng/selfLearning/PointTransformerV2/pcr/models/point_transformer2/point_transformer_v2m1_origin.py", line 239, in forward
        reference_index, _ = pointops.knn_query(self.neighbours, coord, offset)
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/pointops-1.0-py3.8-linux-x86_64.egg/pointops/query.py", line 22, in forward
        return idx, torch.sqrt(dist2)
    RuntimeError: CUDA error: an illegal memory access was encountered

I tried printing idx, but that raises a similar error:

    Traceback (most recent call last):
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/runpy.py", line 194, in _run_module_as_main
        return _run_code(code, main_globals, None,
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/runpy.py", line 87, in _run_code
        exec(code, run_globals)
      File "/home/weicheng/selfLearning/PointTransformerV2/pcr/models/point_transformer2/point_transformer_v2m2_base.py", line 545, in <module>
        res = model(data_dict)
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/weicheng/selfLearning/PointTransformerV2/pcr/models/point_transformer2/point_transformer_v2m2_base.py", line 522, in forward
        points = self.patch_embed(points)
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/weicheng/selfLearning/PointTransformerV2/pcr/models/point_transformer2/point_transformer_v2m2_base.py", line 416, in forward
        return self.blocks([coord, feat, offset])
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
        return forward_call(*input, **kwargs)
      File "/home/weicheng/selfLearning/PointTransformerV2/pcr/models/point_transformer2/point_transformer_v2m2_base.py", line 212, in forward
        reference_index, _ = pointops.knn_query(self.neighbours, coord, offset)
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/pointops-1.0-py3.8-linux-x86_64.egg/pointops/query.py", line 22, in forward
        print(idx)
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/torch/_tensor.py", line 249, in __repr__
        return torch._tensor_str._str(self)
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/torch/_tensor_str.py", line 415, in _str
        return _str_intern(self)
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/torch/_tensor_str.py", line 390, in _str_intern
        tensor_str = _tensor_str(self, indent)
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/torch/_tensor_str.py", line 251, in _tensor_str
        formatter = _Formatter(get_summarized_data(self) if summarize else self)
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/torch/_tensor_str.py", line 283, in get_summarized_data
        return torch.stack([get_summarized_data(x) for x in (start + end)])
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/torch/_tensor_str.py", line 283, in <listcomp>
        return torch.stack([get_summarized_data(x) for x in (start + end)])
      File "/home/weicheng/anaconda3/envs/pcr/lib/python3.8/site-packages/torch/_tensor_str.py", line 276, in get_summarized_data
        return torch.cat((self[:PRINT_OPTS.edgeitems], self[-PRINT_OPTS.edgeitems:]))
    RuntimeError: CUDA error: an illegal memory access was encountered

I am using an A6000 GPU with no other jobs running at the time, and I followed the installation instructions for torch, pointops, and the other packages exactly. Could you please tell me what is going wrong? Thank you!
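A likely reason the print(idx) call above fails with the same message: CUDA kernels are launched asynchronously, so an illegal access inside the knn_query kernel is usually reported at the next operation that waits on the GPU (here, the device-to-host copy triggered by printing). A minimal debugging sketch, assuming a standard PyTorch install: setting the CUDA_LAUNCH_BLOCKING environment variable before CUDA is initialized makes launches synchronous, so the traceback points at the call that actually failed. The run_synchronously helper is hypothetical and not part of pointops or Pointcept.

```python
import os

# CUDA_LAUNCH_BLOCKING is read when the CUDA context is created, so set it
# before torch touches the GPU (here: before importing torch at all).
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # noqa: E402


def run_synchronously(fn, *args, **kwargs):
    """Hypothetical helper: call fn, then wait for queued GPU work so errors surface here."""
    out = fn(*args, **kwargs)
    if torch.cuda.is_available():
        # Blocks until every launched kernel has finished; raises if one failed.
        torch.cuda.synchronize()
    return out
```

For example, calling run_synchronously(pointops.knn_query, self.neighbours, coord, offset) with the arguments from the traceback should raise at that call rather than at a later print.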

Sincerely,

Gofinge commented 1 year ago

Hi, please refer to the explanation of offset [here]. In your case, you can set offset = torch.tensor([36500]). Also, don't forget to move these tensors to the CUDA device.
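The linked explanation isn't reproduced in this thread; the sketch below assumes the usual Pointcept/pointops convention that offset stores the cumulative end index of each point cloud packed into coord and feat. Under that assumption, a single cloud of 36500 points gives offset = [36500], while torch.ones(4) would describe four clouds of 1, 0, 0 and 0 points (and as a float tensor). The helpers counts_to_offset and offset_to_batch are hypothetical names used only for illustration.

```python
import torch


def counts_to_offset(counts):
    """Hypothetical helper: per-cloud point counts -> cumulative end indices."""
    return torch.cumsum(torch.as_tensor(counts, dtype=torch.long), dim=0)


def offset_to_batch(offset):
    """Hypothetical helper: expand an offset tensor into a per-point cloud index."""
    # Recover per-cloud counts by differencing against a leading zero.
    counts = torch.diff(offset, prepend=offset.new_zeros(1))
    return torch.repeat_interleave(torch.arange(len(offset), device=offset.device), counts)


# Single point cloud of 36500 points, as in this issue.
print(counts_to_offset([36500]))  # tensor([36500])

# Two clouds with 3 and 2 points packed into one tensor of 5 points.
offset = counts_to_offset([3, 2])
print(offset)                   # tensor([3, 5])
print(offset_to_batch(offset))  # tensor([0, 0, 0, 1, 1])
```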

WeichengDai1 commented 1 year ago

Yes, it works! I set offset = torch.tensor([36500]) and moved everything to the GPU with .cuda(), and the code ran without errors. Thank you!
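The fix above can be packaged as a small input builder plus a sanity check run before the forward pass. This is a sketch under the same cumulative-offset assumption; make_inputs and check_offset are hypothetical helpers, and the checks are illustrative rather than the exact preconditions enforced by pointops. The model call is left commented out because it needs the repository and a GPU.

```python
import torch


def make_inputs(num_points=36500, in_channels=4, device="cpu"):
    """Hypothetical helper: build a single-cloud data_dict like the one in this issue."""
    coord = torch.randn(num_points, 3, device=device)
    feat = torch.randn(num_points, in_channels, device=device)
    # One cloud, so offset holds a single end index equal to the point count (int64).
    offset = torch.tensor([num_points], device=device)
    return {"coord": coord, "feat": feat, "offset": offset}


def check_offset(data_dict):
    """Hypothetical helper: list problems found in offset (assumed convention)."""
    coord, offset = data_dict["coord"], data_dict["offset"]
    if offset.numel() == 0:
        return ["offset is empty"]
    problems = []
    if offset.is_floating_point():
        problems.append("offset should be an integer tensor")
    if offset.device != coord.device:
        problems.append("offset and coord are on different devices")
    if int(offset[-1]) != coord.shape[0]:
        problems.append("last offset entry should equal the number of points")
    # Each cloud's size is the gap between consecutive end indices.
    ends = offset.long().cpu()
    if (torch.diff(ends, prepend=ends.new_zeros(1)) <= 0).any():
        problems.append("every cloud should contain at least one point")
    return problems


data_dict = make_inputs()
print(check_offset(data_dict))  # []

# The original input from this issue trips three of the checks.
bad = dict(data_dict, offset=torch.ones(4))
print(len(check_offset(bad)))  # 3

# With the repository installed and a GPU available (not run here):
# model = PointTransformerV2(in_channels=4, num_classes=4).cuda()
# res = model(make_inputs(device="cuda"))
```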