clinplayer / Point2Skeleton

Point2Skeleton: Learning Skeletal Representations from Point Clouds (CVPR2021)
MIT License

"How to resolve CUDA error: device-side assert triggered" #24

Open shuaigeGoku opened 1 year ago

shuaigeGoku commented 1 year ago

Issue Description: I encountered the following error while attempting GAE training, and I'm unsure how to resolve it. I've tried multiple approaches, but none have been successful. Please help me find a solution.

Error Message:

C:/w/1/s/windows/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:53: block: [10,0,0], thread: [32,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
Traceback (most recent call last):
  File "train.py", line 181, in <module>
    batch_pc, compute_graph=True)
  File "C:\Users\YourUsername\AppData\Local\Programs\Python\Python37\lib\site-packages\torch\nn\modules\module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "D:\Point2Skeleton\Point2Skeleton\src\SkelPointNet.py", line 238, in forward
    A, valid_Mask, known_Mask = self.init_graph(input_pc[..., 0:3], skel_xyz)
  File "D:\Point2Skeleton\Point2Skeleton\src\SkelPointNet.py", line 188, in init_graph
    A[torch.arange(bn)[:, None], knn_sp2sk[:, :, 1], knn_sp2sk[:, :, 0]] = 1
RuntimeError: CUDA error: device-side assert triggered

Environment Information:
Operating System: Windows 11
Python Version: 3.7.8
PyTorch Version: 1.1.0
CUDA Version: 10.0.13

Issue Background: I'm attempting to perform GAE training (please provide the relevant project or library name, if applicable). My objective is (briefly describe your goal or task).

clinplayer commented 1 year ago

Hey, I can't reproduce this issue. I think the error is caused by indexing a CUDA tensor with an index that lies outside the range of one of its dimensions. Specifically, the message comes from a device-side assertion in PyTorch's CUDA indexing kernel (IndexKernel.cu in the trace above), which checks that every index used in tensor indexing is within the size of the dimension it addresses.
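The condition printed in the assertion message can be written out directly. Below is a minimal pure-Python sketch of that rule; `out_of_bounds` is a hypothetical helper name used only for illustration, not something from PyTorch or this repository.

```python
def out_of_bounds(indices, dim_size):
    """Return the entries of `indices` that violate -dim_size <= index < dim_size.

    This mirrors the condition in the assertion message
    (index >= -sizes[i] && index < sizes[i]); negative values count from the end.
    """
    return [i for i in indices if not (-dim_size <= i < dim_size)]


# Made-up example: a dimension of size 100 accepts indices -100..99.
print(out_of_bounds([0, 99, -100, 100, -101], 100))  # [100, -101]
```

For the failing line in the traceback, the values to test against `A.shape[2]` would be something like `knn_sp2sk[:, :, 0].flatten().tolist()`, and those from `knn_sp2sk[:, :, 1]` against `A.shape[1]`.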

Could you please check whether you modified the code in a way that puts the indices outside the shape of the adjacency matrix A? I would recommend printing the shape of A together with the shape and min/max values of the index used for each dimension to find the cause of the error.
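One way to do that check, as a rough sketch rather than code from the repository: `describe_indices` is a hypothetical helper, and the tensor sizes in the demo are made up, since the real shapes of `A` and `knn_sp2sk` depend on your configuration. It should be called just before the failing assignment, because once the device-side assert has fired the CUDA context is no longer usable.

```python
import torch


def describe_indices(A, *indices):
    """Print the size of each indexed dimension of A next to the shape and
    value range of its index tensor.

    Returns the dimensions whose index values fall outside
    [-A.shape[d], A.shape[d]).
    """
    bad_dims = []
    for d, idx in enumerate(indices):
        idx = idx.cpu()  # inspect a CPU copy of the index values
        lo, hi = int(idx.min()), int(idx.max())
        size = A.shape[d]
        ok = -size <= lo and hi < size
        print(f"dim {d}: size={size}, index shape={tuple(idx.shape)}, "
              f"min={lo}, max={hi}, ok={ok}")
        if not ok:
            bad_dims.append(d)
    return bad_dims


# Demo with made-up sizes (assumptions, not the values used in the repository).
bn, skel_num, k = 2, 4, 3
A = torch.zeros(bn, skel_num, skel_num)
knn_sp2sk = torch.randint(0, skel_num, (bn, k, 2))
knn_sp2sk[0, 0, 0] = skel_num  # deliberately one past the last valid index

bad = describe_indices(
    A,
    torch.arange(bn)[:, None],  # indexes dim 0
    knn_sp2sk[:, :, 1],         # indexes dim 1
    knn_sp2sk[:, :, 0],         # indexes dim 2
)
print(bad)  # [2]
```

The three arguments are passed in the same order as in the failing line `A[torch.arange(bn)[:, None], knn_sp2sk[:, :, 1], knn_sp2sk[:, :, 0]] = 1`, so the reported dimension tells you which index tensor holds the offending value.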