V2AI / Det3D

The world's first general-purpose 3D object detection codebase.
https://arxiv.org/abs/1908.09492
Apache License 2.0

Unable to train PointPillars #78

Closed: countriccati closed this issue 4 years ago.

countriccati commented 4 years ago

When I try to train with either kitti_point_pillars_mghead_syncbn.py or nusc_all_point_pillars_mghead_syncbn.py on KITTI or nuScenes data, the process crashes with strange CUDA errors:

/pytorch/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda [](int)->auto::operator()(int)->auto: block: [278,0,0], thread: [127,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
...
File "home/Det3D/det3d/models/readers/pillar_encoder.py", line 192, in forward
    this_coords = coords[batch_mask, :]
RuntimeError: copy_if failed to synchronize: cudaErrorAssert: device-side assert triggered

(I am able to train SECOND with both KITTI config files and get good results, so the problem seems specific to the PointPillars implementation.)

When I debug with pdb, the tensors appear to have the correct sizes, so I am not sure what the issue is.
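The assertion in the log is PyTorch's bounds check for tensor indexing: an index into a dimension of size n is valid only if -n <= index < n. CUDA kernels run asynchronously, so a failed check in one kernel can surface later at a synchronizing call. That would likely explain why the traceback points at the masking on line 192, where the tensors look fine, rather than at the statement that actually used the bad index. The sketch below applies the same condition on the host side; find_out_of_bounds is a hypothetical helper, not part of Det3D or PyTorch.

```python
def find_out_of_bounds(indices, size):
    """Return (position, index) pairs that would fail the IndexKernel.cu check.

    Mirrors the asserted condition: index >= -size and index < size.
    Negative indices count from the end, as in PyTorch.
    """
    return [
        (pos, idx)
        for pos, idx in enumerate(indices)
        if not (-size <= idx < size)
    ]


# Example: a dimension of size 6 accepts indices -6..5.
print(find_out_of_bounds([0, 5, -6, 6, -7], 6))  # [(3, 6), (4, -7)]
```

On the GPU itself, running with the environment variable CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so the traceback should point at the line that issued the failing kernel.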

Any suggestions? I am running Ubuntu 18.04, PyTorch 1.3, and CUDA 10.1.

dongqiaqia commented 4 years ago

The error actually comes from line 199, canvas[:, indices] = voxels, where indices contains values outside the range of the canvas.
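In PointPillars, the scatter step places each pillar's feature vector onto a dense pseudo-image by flattening its (y, x) grid cell into one index, y * nx + x, on a canvas of length nx * ny. Any pillar whose flattened index reaches nx * ny falls off the canvas, which is the kind of overflow the device-side assert reports. Below is a minimal pure-Python sketch of that step, assuming each coords row is laid out as (batch, z, y, x); scatter_to_canvas is hypothetical and raises a readable IndexError instead of a device assert.

```python
def scatter_to_canvas(features, coords, nx, ny, batch_idx=0):
    """Scatter per-pillar features onto a dense (C, nx * ny) canvas.

    features: list of per-pillar feature vectors, each of length C.
    coords:   list of (batch, z, y, x) integer tuples, one per pillar
              (this layout is an assumption of the sketch).
    """
    num_channels = len(features[0])
    canvas = [[0.0] * (nx * ny) for _ in range(num_channels)]
    for feat, (b, _z, y, x) in zip(features, coords):
        if b != batch_idx:
            continue  # plays the role of batch_mask in the original code
        index = y * nx + x  # row-major flattening of the (y, x) cell
        if not 0 <= index < nx * ny:
            raise IndexError(
                f"pillar at (y={y}, x={x}) maps to index {index}, "
                f"canvas has only {nx * ny} cells"
            )
        for c in range(num_channels):
            canvas[c][index] = feat[c]
    return canvas


features = [[1.0, 2.0], [3.0, 4.0]]   # two pillars, C = 2 channels
coords = [(0, 0, 0, 1), (0, 0, 2, 3)]  # (batch, z, y, x)
canvas = scatter_to_canvas(features, coords, nx=4, ny=3)
print(canvas[0][1], canvas[1][11])  # 1.0 4.0

try:
    scatter_to_canvas(features, coords, nx=4, ny=2)  # canvas too small
except IndexError as err:
    print(err)  # pillar at (y=2, x=3) maps to index 11, canvas has only 8 cells
```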
I solved it by editing home/Det3D/det3d/models/readers/pillar_encoder.py:
line 176: self.nx = input_shape[0] --> self.nx = int(input_shape[0])
line 177: self.ny = input_shape[1] --> self.ny = int(input_shape[1])
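The same idea can be sketched outside Det3D: read nx and ny from the first two entries of input_shape, cast them to plain Python ints (so that, if input_shape holds floats or NumPy scalars, the canvas size and flattened indices stay ordinary integers), and check up front that every grid coordinate fits. Note that ny must come from input_shape[1]; taking it from input_shape[0] would size the canvas as nx * nx, which is too small whenever ny > nx. The helpers grid_dims and check_coords_fit, the (batch, z, y, x) coordinate layout, and the 432 x 496 grid are assumptions for illustration.

```python
def grid_dims(input_shape):
    """Return (nx, ny) as plain ints from a shape like [nx, ny, nz]."""
    nx = int(input_shape[0])  # x extent of the pillar grid
    ny = int(input_shape[1])  # y extent: index 1, not index 0
    return nx, ny


def check_coords_fit(coords, nx, ny):
    """Raise ValueError if any (batch, z, y, x) coordinate lies off the grid."""
    for _b, _z, y, x in coords:
        if not (0 <= x < nx and 0 <= y < ny):
            raise ValueError(f"coordinate (y={y}, x={x}) outside {ny} x {nx} grid")


nx, ny = grid_dims([432.0, 496.0, 1.0])  # hypothetical float-valued shape
print(nx, ny, nx * ny)  # 432 496 214272
check_coords_fit([(0, 0, 495, 431)], nx, ny)  # last cell of the grid: passes

try:
    check_coords_fit([(0, 0, 495, 431)], nx, nx)  # ny mistakenly set to nx
except ValueError as err:
    print(err)  # coordinate (y=495, x=431) outside 432 x 432 grid
```

Checking each axis separately is stricter than checking only the flattened index: a pillar with x >= nx in a low row still produces an index below nx * ny and would silently land in the next row of the canvas.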