ethnhe / PVN3D

Code for "PVN3D: A Deep Point-wise 3D Keypoints Hough Voting Network for 6DoF Pose Estimation", CVPR 2020
MIT License

Training on custom dataset failed, giving the following traceback #96

Closed akber871 closed 2 years ago

akber871 commented 2 years ago

Hello,

I am trying to train the network on my custom dataset, which contains 8 objects in a cluttered scene. Following the instructions, I generated the kps, config, and other files before running the training script, train_gears_pvn3d.py. Training fails with the following traceback:

0,0,0], thread: [199,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [110,0,0], thread: [200,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [110,0,0], thread: [201,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [110,0,0], thread: [220,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
/pytorch/aten/src/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [110,0,0], thread: [221,0,0] Assertion `indexValue >= 0 && indexValue < src.sizes[dim]` failed.
epochs: 0%| | 0/25 [00:49<?, ?it/s]
Traceback (most recent call last):
  File "/projappl/project_2003042/miniconda3/envs/pvn3d/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/projappl/project_2003042/miniconda3/envs/pvn3d/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/projappl/project_2003042/PVN3D/pvn3d/train/train_gears_pvn3d.py", line 533, in <module>
    best_loss=best_loss
  File "/projappl/project_2003042/PVN3D/pvn3d/train/train_gears_pvn3d.py", line 363, in train
    _, loss, res = self.model_fn(self.model, batch)
  File "/projappl/project_2003042/PVN3D/pvn3d/train/train_gears_pvn3d.py", line 178, in model_fn
    labels.view(-1)
  File "/projappl/project_2003042/miniconda3/envs/pvn3d/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/projappl/project_2003042/miniconda3/envs/pvn3d/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 143, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/projappl/project_2003042/miniconda3/envs/pvn3d/lib/python3.6/site-packages/torch/nn/parallel/data_parallel.py", line 153, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/projappl/project_2003042/miniconda3/envs/pvn3d/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 83, in parallel_apply
    raise output
  File "/projappl/project_2003042/miniconda3/envs/pvn3d/lib/python3.6/site-packages/torch/nn/parallel/parallel_apply.py", line 59, in _worker
    output = module(*input, **kwargs)
  File "/projappl/project_2003042/miniconda3/envs/pvn3d/lib/python3.6/site-packages/torch/nn/modules/module.py", line 489, in __call__
    result = self.forward(*input, **kwargs)
  File "/projappl/project_2003042/PVN3D/pvn3d/lib/loss.py", line 29, in forward
    logpt = F.log_softmax(input)
  File "/projappl/project_2003042/miniconda3/envs/pvn3d/lib/python3.6/site-packages/torch/nn/functional.py", line 1295, in log_softmax
    ret = input.log_softmax(dim)
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/aten/src/ATen/native/cuda/SoftMax.cu:545
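
The gather assertion `indexValue >= 0 && indexValue < src.sizes[dim]` generally means that an index passed to a gather operation lies outside the size of the gathered dimension. In a classification loss this often points to a label value that is not smaller than the number of classes, although that is only a guess for this traceback. Because CUDA kernels run asynchronously, the Python frame shown (`log_softmax`) may not be the call that actually failed; setting the environment variable `CUDA_LAUNCH_BLOCKING=1` usually makes the reported line more reliable. Below is a minimal sketch of a CPU-side label check. The helper name `check_labels` and the class count of 9 (8 objects plus one background class) are assumptions for illustration, not part of the PVN3D code.

```python
from collections import Counter


def check_labels(labels, n_classes):
    """Count label values that fall outside the valid range [0, n_classes).

    `labels` is any flat iterable of integer class ids.
    Returns a Counter mapping each offending value to its number of occurrences.
    """
    return Counter(int(v) for v in labels if not 0 <= int(v) < n_classes)


# Hypothetical numbers: 8 objects plus background gives 9 classes (an assumption).
n_classes = 9
labels = [0, 3, 8, 9, 255, 9]  # made-up per-point labels for one sample

bad = check_labels(labels, n_classes)
if bad:
    print("out-of-range labels:", dict(bad))
```

With a PyTorch tensor, the same check could be run before the loss call by passing `labels.view(-1).tolist()`. If any value is reported, either the label-generation step or the class count in the config would be the first places to look.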