RuntimeError: CUDA out of memory. Tried to allocate 84.00 MiB (GPU 0; 3.82 GiB total capacity; 2.37 GiB already allocated; 76.44 MiB free; 2.52 GiB reserved in total by PyTorch)

husamhamu commented 2 years ago

Hi, can you please share with us a way to solve this error:

RuntimeError: CUDA out of memory. Tried to allocate 84.00 MiB (GPU 0; 3.82 GiB total capacity; 2.37 GiB already allocated; 76.44 MiB free; 2.52 GiB reserved in total by PyTorch)

At first I thought it might be a compatibility issue, even though the message makes it fairly clear that it is not, and nothing I tried worked. I'm having a hard time figuring out how to solve it and would appreciate some help. Thanks!
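The numbers in the error message already show what went wrong. The short sketch below uses plain Python and the standard library only. It pulls the figures out of the message quoted above and does the arithmetic. It assumes the older PyTorch wording shown here (newer releases phrase the message differently), and how the leftover memory is interpreted is an inference, not something the message states.

```python
import re

# The error message exactly as reported in this issue.
MSG = ("RuntimeError: CUDA out of memory. Tried to allocate 84.00 MiB "
       "(GPU 0; 3.82 GiB total capacity; 2.37 GiB already allocated; "
       "76.44 MiB free; 2.52 GiB reserved in total by PyTorch)")

_UNITS = {"MiB": 1.0, "GiB": 1024.0}


def _mib(value: str, unit: str) -> float:
    """Convert a '<value> <unit>' pair from the message into MiB."""
    return float(value) * _UNITS[unit]


def parse_oom(msg: str) -> dict:
    """Extract the memory figures (in MiB) from a PyTorch CUDA OOM message.

    Assumes the message layout shown above (older PyTorch releases).
    """
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved in total by PyTorch",
    }
    out = {}
    for key, pat in patterns.items():
        m = re.search(pat, msg)
        out[key] = _mib(m.group(1), m.group(2))
    return out


if __name__ == "__main__":
    s = parse_oom(MSG)
    # Memory PyTorch's caching allocator holds but is not using right now.
    cached_unused = s["reserved"] - s["allocated"]
    # Memory neither free nor held by PyTorch's allocator
    # (for example the CUDA context or other processes).
    outside_pytorch = s["total"] - s["reserved"] - s["free"]
    print(f"requested {s['requested']:.2f} MiB vs free {s['free']:.2f} MiB")
    print(f"reserved but unallocated: {cached_unused:.2f} MiB")
    print(f"used outside PyTorch's allocator: {outside_pytorch:.2f} MiB")
```

The 84 MiB request is larger than the 76.44 MiB still free on the card. About 150 MiB is reserved by PyTorch but unallocated, and this memory presumably could not serve one contiguous 84 MiB block. Roughly 1.2 GiB of the 3.82 GiB is held outside PyTorch's allocator. The GiB figures in the message are rounded to two decimals, so the derived values are only accurate to a few MiB.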

husamhamu commented 2 years ago

Here is the full traceback:

```
Traceback (most recent call last):
  File "./tools/train.py", line 109, in <module>
    main()
  File "./tools/train.py", line 105, in main
    logger=logger)
  File "/home/husam/qd-3dt/tools/../qd3dt/apis/train.py", line 66, in train_detector
    _non_dist_train(model, dataset, cfg, validate=validate)
  File "/home/husam/qd-3dt/tools/../qd3dt/apis/train.py", line 270, in _non_dist_train
    runner.run(data_loaders, cfg.workflow, cfg.total_epochs)
  File "/home/husam/qd-3dt/mmcv/mmcv/runner/runner.py", line 361, in run
    epoch_runner(data_loaders[i], **kwargs)
  File "/home/husam/qd-3dt/mmcv/mmcv/runner/runner.py", line 264, in train
    self.model, data_batch, train_mode=True, **kwargs)
  File "/home/husam/qd-3dt/tools/../qd3dt/apis/train.py", line 44, in batch_processor
    losses = model(**data)
  File "/home/husam/.pyenv/versions/qd_3dt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/husam/.pyenv/versions/qd_3dt/lib/python3.7/site-packages/torch/nn/parallel/data_parallel.py", line 150, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/husam/.pyenv/versions/qd_3dt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/husam/qd-3dt/tools/../qd3dt/core/fp16/decorators.py", line 49, in new_func
    return old_func(*args, **kwargs)
  File "/home/husam/qd-3dt/tools/../qd3dt/models/detectrackers/base.py", line 86, in forward
    return self.forward_train(img, img_meta, **kwargs)
  File "/home/husam/qd-3dt/tools/../qd3dt/models/detectrackers/quasi_dense_3d_sep_uncertainty.py", line 147, in forward_train
    x = self.extract_feat(img)
  File "/home/husam/qd-3dt/tools/../qd3dt/models/detectrackers/quasi_dense_3d_sep_uncertainty.py", line 118, in extract_feat
    x = self.neck(x)
  File "/home/husam/.pyenv/versions/qd_3dt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/husam/qd-3dt/tools/../qd3dt/models/necks/dlaup.py", line 158, in forward
    x, y = ida(layers[-i - 2:]) # y : aggregation nodes
  File "/home/husam/.pyenv/versions/qd_3dt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/husam/qd-3dt/tools/../qd3dt/models/necks/dlaup.py", line 107, in forward
    x = node(torch.cat([x, layers[i]], 1))
  File "/home/husam/.pyenv/versions/qd_3dt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/husam/qd-3dt/tools/../qd3dt/models/utils/conv_module.py", line 151, in forward
    x = self.conv(x)
  File "/home/husam/.pyenv/versions/qd_3dt/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/husam/.pyenv/versions/qd_3dt/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 345, in forward
    return self.conv2d_forward(input, self.weight)
  File "/home/husam/.pyenv/versions/qd_3dt/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 342, in conv2d_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 84.00 MiB (GPU 0; 3.82 GiB total capacity; 2.37 GiB already allocated; 76.44 MiB free; 2.52 GiB reserved in total by PyTorch)
```

ChiangJYY commented 3 days ago

I had a similar problem, have you solved it yet?

husamhamu commented 3 days ago

I think it meant the GPU simply did not have enough memory. I ended up using Colab Pro.
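If moving to a bigger GPU is not an option, the usual alternative is to reduce per-step memory. The most direct ways are a smaller batch or a smaller input resolution. The sketch below shows one generic pattern for a hand-written training loop: when the CUDA OOM error appears, retry the step with half the batch size. It is not part of qd-3dt. `run_with_backoff` and `step` are hypothetical names, and the OOM check is an assumption that matches on the message text seen in this issue. qd-3dt trains through mmcv's runner (see the traceback), so in that codebase one would instead lower the per-GPU batch size or image scale in the training config.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def is_cuda_oom(err: BaseException) -> bool:
    """True if err looks like the PyTorch CUDA OOM error from this issue.

    Matching on the message text is an assumption based on the
    'CUDA out of memory' RuntimeError wording quoted above.
    """
    return isinstance(err, RuntimeError) and "CUDA out of memory" in str(err)


def run_with_backoff(step: Callable[[int], T], batch_size: int,
                     min_batch_size: int = 1) -> Tuple[T, int]:
    """Call step(batch_size), halving batch_size after each CUDA OOM.

    Returns (result, batch_size_that_worked). Re-raises non-OOM errors,
    and re-raises the OOM once halving would go below min_batch_size.
    """
    while True:
        try:
            return step(batch_size), batch_size
        except RuntimeError as err:
            if not is_cuda_oom(err) or batch_size // 2 < min_batch_size:
                raise
            batch_size //= 2
        # The retry happens here, after the except clause has exited.


# Usage with a stand-in step that "runs out of memory" above batch size 2.
def fake_step(bs: int) -> str:
    if bs > 2:
        raise RuntimeError("CUDA out of memory. Tried to allocate 84.00 MiB")
    return f"trained with batch size {bs}"


result, used = run_with_backoff(fake_step, 8)
print(result, used)
```

The batch size is reduced inside the `except` clause, but the next attempt starts only after the clause exits. At that point Python has dropped the exception object, so the failed step's frames, and any tensors they reference, can be freed before the retry.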

ChiangJYY commented 3 days ago


Feel free to reach out to discuss this, nvidiarohs@gmail.com

ChiangJYY commented 3 days ago

I currently have a server with a single 48 GB A6000 GPU, but I've read that the authors trained on eight 32 GB V100 GPUs and that one training run took 144 hours. So I'm wondering: with only one A6000, would a single training run take more than half a month? Could you share your configuration and how long one training run took for you? Also, are you from China? If so, could we connect on WeChat?
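Taking the figures quoted here at face value (eight V100s for 144 hours), a back-of-the-envelope estimate treats the run as a fixed budget of GPU-hours. The sketch assumes perfectly linear multi-GPU scaling. It also uses a single hypothetical `relative_speed` factor for one A6000 versus one V100. Real throughput depends on the model, batch size, and precision, so the result is a rough bound, not a prediction.

```python
def single_gpu_days(num_gpus: int, hours: float,
                    relative_speed: float = 1.0) -> float:
    """Estimate wall-clock days to repeat a multi-GPU run on one GPU.

    Assumes total GPU-hours are conserved (linear scaling) and that the
    single GPU is relative_speed times as fast as one original GPU.
    """
    gpu_hours = num_gpus * hours
    return gpu_hours / relative_speed / 24.0


def speed_needed(num_gpus: int, hours: float, target_days: float) -> float:
    """Relative speed a single GPU would need to finish within target_days."""
    return num_gpus * hours / (target_days * 24.0)


# Figures quoted in the thread: 8 x V100 (32 GB) for 144 hours.
print(single_gpu_days(8, 144))   # 48.0 days at V100-equivalent speed
print(speed_needed(8, 144, 15))  # 3.2 (times a V100) to fit in ~half a month
```

Under these assumptions, one A6000 running at V100-equivalent speed would need about 48 days (1152 GPU-hours). To finish in half a month, it would have to be about 3.2 times as fast as a single V100.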

On Sunday, November 3, 2024 at 18:36, Husam Hamu @.***> wrote:

I think that meant the memory is not enough. I ended up using Colab Pro.


husamhamu commented 3 days ago

I only fine-tuned the model. And no, I am not from China.

ChiangJYY commented 3 days ago

OK. Have you ever trained the model yourself? If training takes too long, I might give up on this project. Could you recommend some related monocular 3D tracking projects?


husamhamu commented 3 days ago

I think I fine-tuned the model on a dataset of roughly 4,000 images, and it finished within a few hours on Colab Pro. I haven't been active in the field since, so I can't really recommend anything. Check out my video for fun: https://www.linkedin.com/posts/husam-hamu-4a095b1a9_tdu-deeplearning-carla-activity-6980460901871050753-UnvV?utm_source=share&utm_medium=member_desktop

SysCV / qd-3dt

RuntimeError: CUDA out of memory. Tried to allocate 84.00 MiB (GPU 0; 3.82 GiB total capacity; 2.37 GiB already allocated; 76.44 MiB free; 2.52 GiB reserved in total by PyTorch) #32