SamsungLabs / fcaf3d

[ECCV2022] FCAF3D: Fully Convolutional Anchor-Free 3D Object Detection
MIT License

Training with Custom Dataset #49

Closed Shromm-Gaind closed 1 year ago

Shromm-Gaind commented 1 year ago

Prerequisites

Task: I am trying to train FCAF3D with a custom dataset that emulates the SUN RGB-D dataset.

Environment: Docker image built from the repo's Dockerfile. GPU: 2080 Ti.

Steps and Issues

My data is prepared to emulate the SUN RGB-D dataset.

Reproduces the problem - Command or script:

```
python tools/train.py configs/fcaf3d/fcaf3d_sunrgbd-3d-10class.py
```

Reproduces the problem - issue on the 2080 Ti. Full log attached: 2080ti_train_issue.txt. The relevant part of the traceback:

  File "/mmdetection3d/mmdet3d/models/backbones/me_resnet.py", line 89, in forward
    x = self.layer2(x)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/MinkowskiEngine/modules/resnet_block.py", line 59, in forward
    out = self.conv2(out)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/MinkowskiEngine/MinkowskiConvolution.py", line 321, in forward
    input._manager,
  File "/opt/conda/lib/python3.7/site-packages/MinkowskiEngine/MinkowskiConvolution.py", line 84, in forward
    coordinate_manager._manager,
RuntimeError: CUDA out of memory. Tried to allocate 392.00 MiB (GPU 0; 10.75 GiB total capacity; 8.93 GiB already allocated; 201.06 MiB free; 9.09 GiB reserved in total by PyTorch)
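
As an aside, the usual first mitigation for CUDA OOM in mmdetection3d is lowering the per-GPU batch size in the config. A minimal sketch, assuming the standard mmdetection3d `data` config dict used by this repo; the concrete value 4 is an assumption to tune, not the repo default:

```python
# In configs/fcaf3d/fcaf3d_sunrgbd-3d-10class.py (or a config that inherits it).
# Assumption: 4 is smaller than the repo default; pick the largest value that fits 11 GB.
data = dict(
    samples_per_gpu=4,
    workers_per_gpu=4,
)
```

With sparse convolutions, however, memory also scales with the number of occupied voxels per scene, so if the input scale is wrong (see the reply below), shrinking the batch only masks the problem.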

Reproduces the problem - issue on an A100. I also launched an instance with an A100 to verify whether training genuinely needed more memory. [screenshots attached]

Thoughts: I am quite confused as to why my dataset is so much more memory-hungry, given that it contains only 733 points. Training on the original SUN RGB-D dataset also worked fine on the 2080 Ti.
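
A quick way to check whether the scale (rather than the point count) is the culprit is to print the spatial extent of one custom scene and compare it with a SUN RGB-D scene. A minimal sketch, assuming points are stored as flat float32 `.bin` files with 6 values per point (x, y, z, r, g, b) as in the SUN RGB-D pipeline; the path is hypothetical:

```python
import numpy as np

# Hypothetical sample path; adjust to the custom dataset layout.
points = np.fromfile("data/custom/points/000001.bin", dtype=np.float32).reshape(-1, 6)

xyz = points[:, :3]
print("num points:", xyz.shape[0])
print("min xyz:  ", xyz.min(axis=0))
print("max xyz:  ", xyz.max(axis=0))
print("extent:   ", xyz.max(axis=0) - xyz.min(axis=0))
# SUN RGB-D scenes span a few meters; an extent in the hundreds suggests
# the data is in centimeters (or another unit) rather than meters.
```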

filaPro commented 1 year ago

Hi @Shromm-Gaind, it looks like the problem is the scale of your data. I briefly looked at it, and the sizes of your objects are ~100. We operate in meters, so are your objects actually 100 meters across? FCAF3D operates with 4 levels of 0.08, 0.16, 0.32, and 0.64 meters, which are much smaller than your objects, if I understand correctly.
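
If the custom data is indeed in centimeters, a one-time rescale of both the points and the annotation boxes to meters should bring the objects into the range these levels expect. A minimal sketch, assuming centimeter units (factor 0.01) and SUN RGB-D-style 7-value boxes (x, y, z, dx, dy, dz, yaw); the paths and the factor are assumptions to verify against the actual data:

```python
import numpy as np

SCALE = 0.01  # assumption: source units are centimeters -> meters

# Rescale one point cloud (x y z r g b per point, float32 .bin as in SUN RGB-D).
points = np.fromfile("data/custom/points/000001.bin", dtype=np.float32).reshape(-1, 6)
points[:, :3] *= SCALE  # scale only the coordinates; colors stay unchanged
points.astype(np.float32).tofile("data/custom/points/000001.bin")

# Rescale a box: center and size scale by the same factor, the yaw angle does not.
box = np.array([120.0, 80.0, 50.0, 100.0, 60.0, 90.0, 0.3])
box[:6] *= SCALE
print(box)  # -> [1.2 0.8 0.5 1.  0.6 0.9 0.3]
```

The same factor has to be applied consistently everywhere the dataset stores metric values, including any generated annotation/info files.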