CUDA out of memory error when training with 2 GPUs (Tesla V100 16GB RAM)

XuyangBai / TransFusion

[PyTorch] Official implementation of CVPR2022 paper "TransFusion: Robust LiDAR-Camera Fusion for 3D Object Detection with Transformers". https://arxiv.org/abs/2203.11496

Apache License 2.0


CUDA out of memory error when training with 2 GPUs (Tesla V100 16GB RAM) #49

Closed ytan101 closed 2 years ago

ytan101 commented 2 years ago

Hi, I tried to train the model on nuscenes-mini with 2 Tesla V100s, but I still get an out-of-memory error (issue 34 suggests that 16GB should be enough). Is there any specific configuration I can tweak to help with this issue?

Thank you very much!

carry-all-coder commented 1 year ago

Hi! I ran into the same CUDA out-of-memory problem with 4 Tesla V100s (16GB): RuntimeError: CUDA out of memory. Tried to allocate 32.00 MiB (GPU 1; 15.75 GiB total capacity; 4.74 GiB already allocated; 26.62 MiB free; 4.82 GiB reserved in total by PyTorch)
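The numbers in this message can be checked with a little arithmetic. The sketch below is not part of the repository: it parses the error text (reading the free value as 26.62 MiB) and computes how much of the GPU is neither free nor reserved by this PyTorch process. The helper names `to_gib` and `summarize_oom` are made up for illustration.

```python
import re

# Error text as reported above (free value read as 26.62 MiB).
MESSAGE = (
    "CUDA out of memory. Tried to allocate 32.00 MiB (GPU 1; "
    "15.75 GiB total capacity; 4.74 GiB already allocated; "
    "26.62 MiB free; 4.82 GiB reserved in total by PyTorch)"
)

# Matches sizes such as "15.75 GiB" or "32.00 MiB".
SIZE = r"([\d.]+)\s*(MiB|GiB)"


def to_gib(value, unit):
    """Convert a (number, unit) pair from the message to GiB."""
    return float(value) / 1024 if unit == "MiB" else float(value)


def summarize_oom(message):
    """Extract the memory figures and derive the unaccounted share.

    Hypothetical helper: the patterns only cover the message layout
    quoted in this thread.
    """
    patterns = {
        "total": SIZE + r" total capacity",
        "allocated": SIZE + r" already allocated",
        "free": SIZE + r" free",
        "reserved": SIZE + r" reserved",
    }
    info = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        info[name] = to_gib(*match.groups())
    # Memory that is neither free nor held by this process's PyTorch allocator.
    info["unaccounted"] = info["total"] - info["reserved"] - info["free"]
    return info


info = summarize_oom(MESSAGE)
print(f"unaccounted: {info['unaccounted']:.2f} GiB")
```

Under these assumptions, roughly 10.9 GiB of GPU 1 is held by something other than this process's PyTorch allocator. That often points to another process sharing the device, although CUDA context overhead is also counted in that figure, so it may be worth checking the GPU with `nvidia-smi` before changing the training config.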

Could you please suggest how to solve it? I have already tried reducing the samples per GPU to 1.
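For readers unsure where that setting lives: in mmdetection-style configs, the per-GPU batch size is usually the `samples_per_gpu` key of the `data` dict. The fragment below is a minimal sketch that assumes this repository's configs follow that convention. The worker count is an illustrative value, not taken from the thread.

```python
# Fragment of an mmdetection-style config file (assumed layout).
data = dict(
    samples_per_gpu=1,  # batch size on each GPU, the value tried above
    workers_per_gpu=2,  # dataloader workers per GPU (illustrative value)
)
```

With 4 GPUs this gives an effective batch size of 4, so a learning rate tuned for a larger total batch may also need adjusting.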

ghost commented 1 year ago

@carry-all-coder Hi, I ran into the same CUDA out-of-memory problem. Have you solved it? Could you share some suggestions? Thanks!