LiyaoTang / contrastBoundary

Contrastive Boundary Learning for Point Cloud Segmentation (CVPR2022)
MIT License

RuntimeError: CUDA error: device-side assert triggered #15

Closed by ramdrop 2 years ago

ramdrop commented 2 years ago

Hi, thanks for open-sourcing your wonderful work. I encountered a few errors when trying to reproduce your results. Could you please help debug this issue?

Environment: Ubuntu 18.04, PyTorch 1.9.0, CUDA 11.1, A100. I set up the conda environment by following point-transformer.

Launch run: bash tool/train.sh s3dis origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1

Config: Since I don't have 4 GPUs, I modified the train_gpu, workers, and batch_size parameters in pytorch/config/s3dis/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1.yaml accordingly:

  train_gpu: [0]
  workers: 4  # data loader workers
  batch_size: 4  # batch size for training

Outputs

/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [147,0,0], thread: [96,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [147,0,0], thread: [97,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [147,0,0], thread: [98,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/LOCAL2/ramdrop/apps/anaconda3/envs/pt/lib/python3.7/site-packages/torch/nn/functional.py:652: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /opt/conda/conda-bld/pytorch_1623448265233/work/c10/core/TensorImpl.h:1156.)
  return torch.max_pool1d(input, kernel_size, stride, padding, dilation, ceil_mode)
Traceback (most recent call last):
  File "exp/s3dis/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 452, in <module>
    main()
  File "exp/s3dis/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 128, in main
    main_worker(args.train_gpu, args.ngpus_per_node, args)
  File "exp/s3dis/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 261, in main_worker
    loss_train, mIoU_train, mAcc_train, allAcc_train = train(train_loader, model, criterion, optimizer, epoch)
  File "exp/s3dis/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 323, in train
    loss = criterion(output, target, stage_list)
  File "/LOCAL2/ramdrop/apps/anaconda3/envs/pt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/LOCAL2/ramdrop/github/point_registration/contrastBoundary/pytorch/model/pointtransformer_seg.py", line 24, in forward
    loss_list += self.contrast_head(output, target, stage_list)
  File "/LOCAL2/ramdrop/apps/anaconda3/envs/pt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/LOCAL2/ramdrop/github/point_registration/contrastBoundary/pytorch/model/heads.py", line 251, in forward
    loss = self.main_contrast(n, i, stage_list, target)
  File "/LOCAL2/ramdrop/github/point_registration/contrastBoundary/pytorch/model/heads.py", line 222, in point_contrast
    if not torch.any(point_mask):
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

full log: train-20220530_205425.log

LiyaoTang commented 2 years ago

Hi, thanks for your interest.

Have you tried running the program with the debug flag, or with CUDA_LAUNCH_BLOCKING=1 as suggested?
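
For reference, one way to apply the CUDA_LAUNCH_BLOCKING=1 suggestion is to set the variable in the environment of the process that launches training, so that kernel launches run synchronously and the traceback points at the failing call. The following is a minimal sketch, not part of the repo; `blocking_env` and `run_training` are hypothetical helper names, and the command passed in is whatever launch command is normally used.

```python
import os
import subprocess


def blocking_env(base_env=None):
    """Return a copy of the environment with CUDA_LAUNCH_BLOCKING set to "1".

    The variable has to be in place before CUDA is initialised, so it is
    set on the environment handed to the child process.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


def run_training(cmd):
    """Run a launch command (hypothetical helper) with synchronous CUDA launches."""
    completed = subprocess.run(cmd, env=blocking_env(), check=False)
    return completed.returncode
```

With the launch command from this issue, this could be called as `run_training(["bash", "tool/train.sh", "s3dis", "origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1"])`. Prefixing the shell command with `CUDA_LAUNCH_BLOCKING=1` achieves the same thing.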

ramdrop commented 2 years ago

Running with the debug flag and CUDA_LAUNCH_BLOCKING=1 gives a more detailed log:

/opt/conda/conda-bld/pytorch_1623448265233/work/aten/src/ATen/native/cuda/IndexKernel.cu:97: operator(): block: [252,0,0], thread: [91,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
/LOCAL2/ramdrop/apps/anaconda3/envs/pt/lib/python3.7/site-packages/torch/nn/functional.py:652: UserWarning: Named tensors and all their associated APIs are an experimental feature and subject to change. Please do not use them for anything important until they are released as stable. (Triggered internally at  /opt/conda/conda-bld/pytorch_1623448265233/work/c10/core/TensorImpl.h:1156.)
  return torch.max_pool1d(input, kernel_size, stride, padding, dilation, ceil_mode)
Traceback (most recent call last):
  File "exp/s3dis/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 452, in <module>
    main()
  File "exp/s3dis/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 128, in main
    main_worker(args.train_gpu, args.ngpus_per_node, args)
  File "exp/s3dis/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 261, in main_worker
    loss_train, mIoU_train, mAcc_train, allAcc_train = train(train_loader, model, criterion, optimizer, epoch)
  File "exp/s3dis/origin_multi-Ua-concat-latent_contrast-Ua-softnn-latent-label-l2-w.1/train.py", line 323, in train
    loss = criterion(output, target, stage_list)
  File "/LOCAL2/ramdrop/apps/anaconda3/envs/pt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/LOCAL2/ramdrop/github/point_registration/contrastBoundary/pytorch/model/pointtransformer_seg.py", line 24, in forward
    loss_list += self.contrast_head(output, target, stage_list)
  File "/LOCAL2/ramdrop/apps/anaconda3/envs/pt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/LOCAL2/ramdrop/github/point_registration/contrastBoundary/pytorch/model/heads.py", line 251, in forward
    loss = self.main_contrast(n, i, stage_list, target)
  File "/LOCAL2/ramdrop/github/point_registration/contrastBoundary/pytorch/model/heads.py", line 191, in point_contrast
    labels = get_subscene_label(n, i, stage_list, target, self.nstride, self.config.num_classes)  # (m, ncls) - distribution / onehot
  File "/LOCAL2/ramdrop/github/point_registration/contrastBoundary/pytorch/model/basic_operators.py", line 14, in get_subscene_label
    return get_subscene_features(stage_n, stage_i, stage_list, x, nstride, **kwargs)
  File "/LOCAL2/ramdrop/github/point_registration/contrastBoundary/pytorch/model/basic_operators.py", line 42, in get_subscene_features
    x = x[neighbor_idx, :].view(p_to.shape[0], kr, x.shape[1]) # (m, kr, ncls)
RuntimeError: CUDA error: device-side assert triggered
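
The assertion `index >= -sizes[i] && index < sizes[i]` in the log fires when an index used for indexing (here `neighbor_idx` in `x[neighbor_idx, :]`) falls outside the valid range for the indexed dimension. The following is a framework-free sketch of that same condition, useful for locating the offending entries once the indices have been copied to the CPU. `find_out_of_bounds` is a hypothetical helper, not a function from this repo.

```python
def find_out_of_bounds(indices, size):
    """Return (position, index) pairs that violate -size <= index < size.

    This mirrors the condition checked by the device-side assertion:
    negative indices down to -size are valid, anything else is not.
    """
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if not (-size <= idx < size)
    ]


# Example: with 4 rows, valid indices are -4..3.
bad = find_out_of_bounds([0, 3, -4, 4, -5], 4)
print(bad)  # [(3, 4), (4, -5)]
```

On the actual tensors, an equivalent check would compare `neighbor_idx.min()` and `neighbor_idx.max()` against `x.shape[0]` just before the indexing line.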

ramdrop commented 2 years ago

Ah, I found the problem: I had created the conda environment and compiled pointops from the point-transformer repo. After compiling the pointops in your repo instead, training works.

Thanks for your quick reply!
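
For anyone hitting the same mismatch, a quick way to see which build of an extension the environment picks up is to ask Python where the module would be imported from. The following is a minimal sketch; `module_origin` is a hypothetical helper, and the import name `"pointops"` used in the usage note is an assumption that should be replaced by whatever name the compiled extension is actually installed under.

```python
import importlib.util


def module_origin(name):
    """Return the file a module would be imported from, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised for some malformed or partially initialised modules.
        return None
    return None if spec is None else spec.origin
```

Calling `print(module_origin("pointops"))` inside the training environment should print a path; if that path points into the point-transformer checkout rather than this repo, the extension needs to be rebuilt from this repo.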

LiyaoTang commented 2 years ago

Cheers.