Open leenamx opened 7 months ago
Hi, I encountered an error when training on the S3DIS dataset.
I don't know how to resolve it. I would really appreciate it if you could help me!
[04/11 08:36:36 main-logger]: #Model parameters: 8022970
[04/11 08:36:42 main-logger]: augmentation all
[04/11 08:36:42 main-logger]: jitter_sigma: 0.005, jitter_clip: 0.02
Totally 204 samples in train set.
[04/11 08:36:42 main-logger]: train_data samples: '6120'
Totally 67 samples in val set.
[04/11 08:36:42 main-logger]: scheduler: MultiStep. scheduler_update: epoch. milestones: [60, 80], gamma: 0.1
[04/11 08:36:42 main-logger]: lr: [0.006, 0.0006000000000000001]
WARNING [04/11 08:36:48 main-logger]: batch_size shortened from 2 to 1, points from 157383 to 80000
WARNING [04/11 08:36:48 main-logger]: batch_size shortened from 2 to 1, points from 160000 to 80000
WARNING [04/11 08:36:49 main-logger]: batch_size shortened from 2 to 1, points from 160000 to 80000
WARNING [04/11 08:36:49 main-logger]: batch_size shortened from 2 to 1, points from 160000 to 80000
WARNING [04/11 08:36:50 main-logger]: batch_size shortened from 2 to 1, points from 160000 to 80000
WARNING [04/11 08:36:50 main-logger]: batch_size shortened from 2 to 1, points from 160000 to 80000
WARNING [04/11 08:36:52 main-logger]: batch_size shortened from 2 to 1, points from 160000 to 80000
WARNING [04/11 08:36:55 main-logger]: batch_size shortened from 2 to 1, points from 160000 to 80000
WARNING [04/11 08:36:55 main-logger]: batch_size shortened from 2 to 1, points from 143502 to 80000
WARNING [04/11 08:36:55 main-logger]: batch_size shortened from 2 to 1, points from 149464 to 80000
WARNING [04/11 08:36:56 main-logger]: batch_size shortened from 2 to 1, points from 160000 to 80000
WARNING [04/11 08:36:57 main-logger]: batch_size shortened from 2 to 1, points from 145628 to 65628
WARNING [04/11 08:36:57 main-logger]: batch_size shortened from 2 to 1, points from 154192 to 74192
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [131,0,0], thread: [38,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [131,0,0], thread: [39,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [131,0,0], thread: [40,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [131,0,0], thread: [41,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [131,0,0], thread: [42,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [131,0,0], thread: [43,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [131,0,0], thread: [50,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [131,0,0], thread: [51,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [131,0,0], thread: [52,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [131,0,0], thread: [56,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [131,0,0], thread: [57,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [131,0,0], thread: [58,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [131,0,0], thread: [62,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/native/cuda/ScatterGatherKernel.cu:111: operator(): block: [131,0,0], thread: [63,0,0] Assertion idx_dim >= 0 && idx_dim < index_size && "index out of bounds" failed.
WARNING [04/11 08:36:57 main-logger]: batch_size shortened from 2 to 1, points from 160000 to 80000
WARNING [04/11 08:36:57 main-logger]: batch_size shortened from 2 to 1, points from 160000 to 80000
WARNING [04/11 08:36:58 main-logger]: batch_size shortened from 2 to 1, points from 160000 to 80000
terminate called after throwing an instance of 'c10::CUDAError'
what(): CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Exception raised from createEvent at /opt/conda/conda-bld/pytorch_1646755903507/work/aten/src/ATen/cuda/CUDAEvent.h:174 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x4d (0x7f1f8ffa21bd in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: + 0xaca8ba (0x7f1d81d5c8ba in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_cuda_cpp.so)
frame #2: + 0x2ecb98 (0x7f1dc696fb98 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #3: c10::TensorImpl::release_resources() + 0x175 (0x7f1f8ff88fb5 in /opt/conda/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #4: + 0x1db509 (0x7f1dc685e509 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #5: + 0x4c634c (0x7f1dc6b4934c in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #6: THPVariable_subclass_dealloc(_object*) + 0x292 (0x7f1dc6b49652 in /opt/conda/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
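The log itself suggests rerunning with CUDA_LAUNCH_BLOCKING=1, and the assertion says that some index passed to a gather/scatter operation is outside the range [0, index_size). Below is a minimal sketch of how one might narrow this down. It rests on assumptions: `find_out_of_bounds` is a hypothetical helper (not part of this repository or of PyTorch), the index values are made up for illustration, and the tensor in question is assumed to be copied to the host as a flat list first (for example with `tensor.flatten().tolist()`).

```python
import os

# Assumption: this runs before CUDA is initialised (simplest: before importing
# torch). Kernel launches then run synchronously, so the reported stack trace
# should point at the call that actually triggered the assert.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


def find_out_of_bounds(indices, index_size):
    """Return (position, value) pairs that violate 0 <= value < index_size.

    Hypothetical helper mirroring the condition in the failed assertion.
    `indices` is any iterable of ints.
    """
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if not (0 <= idx < index_size)
    ]


# Illustrative values only; 13 is an assumed size, not read from the log.
example_indices = [0, 3, 12, 13, -1, 255]
print(find_out_of_bounds(example_indices, 13))
```

If such a check reports any offending positions for the labels or neighbour indices of a batch, that batch is a likely candidate for the failing gather/scatter call; an empty result means the problem is elsewhere.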