V2AI / Det3D

World's first general-purpose 3D object detection codebase.
https://arxiv.org/abs/1908.09492
Apache License 2.0
1.5k stars, 298 forks

Lyft vs. nuScenes #100

Closed: jinglin80 closed this issue 4 years ago

jinglin80 commented 4 years ago

Initially, we tried the nuScenes dataset; training works, though slowly. Since we have only 2 GPUs per node, we tried training on a smaller dataset, the Lyft dataset, following the "Get started" guide (https://github.com/poodarchu/Det3D/blob/master/GETTING_STARTED.md). However, we get the CUDA runtime error shown below. We wonder why there is a runtime error with the Lyft dataset but not with the nuScenes dataset, which is about 10 times larger.

For more information visit http://numba.pydata.org/numba-doc/latest/reference/deprecation.html#deprecation-of-object-mode-fall-back-behaviour-when-using-jit

File "../../../../usr/local/Det3D/det3d/core/bbox/geometry.py", line 290:
def points_in_convex_polygon_jit(points, polygon, clockwise=True):
    # first convert polygon to directed lines
    num_points_of_polygon = polygon.shape[1]
    ^

state.func_ir.loc))
Traceback (most recent call last):
  File "/shares/xionggroup/Det3D/tools/train.py", line 133, in <module>
    main()
  File "/shares/xionggroup/Det3D/tools/train.py", line 128, in main
    logger=logger,
  File "/usr/local/Det3D/det3d/torchie/apis/train.py", line 343, in train_detector
    trainer.run(data_loaders, cfg.workflow, cfg.total_epochs, local_rank=cfg.local_rank)
  File "/usr/local/Det3D/det3d/torchie/trainer/trainer.py", line 536, in run
    epoch_runner(data_loaders[i], self.epoch, **kwargs)
  File "/usr/local/Det3D/det3d/torchie/trainer/trainer.py", line 403, in train
    self.model, data_batch, train_mode=True, **kwargs
  File "/usr/local/Det3D/det3d/torchie/trainer/trainer.py", line 362, in batch_processor_inline
    losses = model(example, return_loss=True)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/parallel/distributed.py", line 442, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/Det3D/det3d/models/detectors/voxelnet.py", line 46, in forward
    x = self.extract_feat(data)
  File "/usr/local/Det3D/det3d/models/detectors/voxelnet.py", line 24, in extract_feat
    input_features, data["coors"], data["batch_size"], data["input_shape"]
  File "/usr/local/lib/python3.6/dist-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/usr/local/Det3D/det3d/models/backbones/scn.py", line 365, in forward
    ret = ret.dense()
  File "/usr/local/lib/python3.6/dist-packages/spconv/__init__.py", line 83, in dense
    return res.permute(*trans_params).contiguous()
RuntimeError: CUDA out of memory. Tried to allocate 374.00 MiB (GPU 1; 7.93 GiB total capacity; 4.65 GiB already allocated; 294.56 MiB free; 133.55 MiB cached)

poodarchu commented 4 years ago

It's an OOM (out of memory) error, so decrease the batch size and try again.
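To make the numbers in the error message concrete, here is a minimal sketch that parses the "CUDA out of memory" line quoted above and reports how far the failed allocation exceeded the free memory. GPU memory used in the forward pass depends on the tensors built for each batch, not on how many samples the dataset contains, which is why a smaller dataset can still run out of memory. The helper names `parse_oom` and `halve_batch_size` are hypothetical and not part of Det3D or PyTorch; the regular expression assumes the message layout shown in this issue, which may differ in other PyTorch versions.

```python
import re

# Matches the sizes in a "CUDA out of memory" message laid out like the
# one in this issue (an assumption; other PyTorch versions may differ).
_OOM_PATTERN = re.compile(
    r"Tried to allocate (?P<req>[\d.]+) (?P<req_unit>[KMG]iB).*?"
    r"(?P<total>[\d.]+) (?P<total_unit>[KMG]iB) total capacity; "
    r"(?P<used>[\d.]+) (?P<used_unit>[KMG]iB) already allocated; "
    r"(?P<free>[\d.]+) (?P<free_unit>[KMG]iB) free"
)

# Conversion factors from each unit to MiB.
_TO_MIB = {"KiB": 1.0 / 1024.0, "MiB": 1.0, "GiB": 1024.0}


def parse_oom(message):
    """Return requested/total/used/free sizes in MiB (hypothetical helper)."""
    match = _OOM_PATTERN.search(message)
    if match is None:
        raise ValueError("not a recognized CUDA out-of-memory message")
    return {
        name: float(match.group(name)) * _TO_MIB[match.group(name + "_unit")]
        for name in ("req", "total", "used", "free")
    }


def halve_batch_size(batch_size):
    """Halve the per-GPU batch size, never going below 1 (hypothetical helper)."""
    return max(1, batch_size // 2)


# The error line reported in this issue.
message = (
    "RuntimeError: CUDA out of memory. Tried to allocate 374.00 MiB "
    "(GPU 1; 7.93 GiB total capacity; 4.65 GiB already allocated; "
    "294.56 MiB free; 133.55 MiB cached)"
)

info = parse_oom(message)
shortfall = info["req"] - info["free"]  # MiB missing for this allocation
print("requested %.2f MiB, free %.2f MiB, shortfall %.2f MiB"
      % (info["req"], info["free"], shortfall))
```

Here the requested 374.00 MiB exceeds the 294.56 MiB reported free, so the allocation fails even though the card has 7.93 GiB in total. Halving the per-GPU batch size, as `halve_batch_size` does, is one simple way to retry; which configuration key holds the batch size depends on the Det3D config in use.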