ArshadIram opened this issue 2 years ago
I had a similar problem. Try reducing the batch size.
I tried, but to no avail. The default is 16. How much did you reduce it by?
You can try reducing the number of workers, the image size, or the batch size to fix this.
It seems that memory usage sometimes spikes and then drops again during training, which means you should make sure there is always some free GPU memory left to avoid the CUDA out of memory error. You can do this by reducing the batch size.
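If you want to check how much headroom a given batch size actually leaves, a small monitoring helper like the sketch below can be called from inside the training loop (this is not part of the repo; the function name `log_cuda_memory` is just an example):

```python
# Hedged sketch: report GPU memory headroom so you can see how close a given
# batch size gets to the limit during training.
import torch

def log_cuda_memory(tag=""):
    """Print allocated/reserved/total memory for the current CUDA device."""
    if not torch.cuda.is_available():
        return
    gib = 2 ** 30
    allocated = torch.cuda.memory_allocated() / gib   # memory held by live tensors
    reserved = torch.cuda.memory_reserved() / gib     # memory cached by PyTorch's allocator
    total = torch.cuda.get_device_properties(0).total_memory / gib
    print(f"[{tag}] allocated {allocated:.2f} GiB | "
          f"reserved {reserved:.2f} GiB | total {total:.2f} GiB")

# e.g. log_cuda_memory(f"epoch {epoch}, batch {i}") inside the batch loop
```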
Facing a similar issue. I was able to solve the memory issue by setting pin_memory=False
in /utils/datasets.py, line 89. After that, I'm now left with the following error whenever I set my batch size to anything larger than 1. I would expect that 16 GB of memory and an RTX 3070 would be adequate for training with a batch size > 1, but please correct me if I am mistaken. Also worth noting that I am working within WSL2. A sketch of the pin_memory change is below, after the error output.
RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
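For reference, here is a minimal sketch of the two changes mentioned above (disabling pinned memory, and enabling synchronous kernel launches for debugging). It assumes a plain PyTorch `DataLoader` rather than the exact code in `utils/datasets.py`:

```python
# Hedged sketch, not the repo's actual dataloader code.
import os

# Must be set before CUDA is initialised so kernel errors are reported
# synchronously with an accurate stack trace.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported only after the environment variable is set
from torch.utils.data import DataLoader

def make_loader(dataset, batch_size=4, workers=0):
    # pin_memory=False skips the pinned host-memory staging step (the part
    # reported as failing under WSL2); num_workers=0 loads batches in the
    # main process, which also reduces host memory pressure.
    return DataLoader(dataset,
                      batch_size=batch_size,
                      num_workers=workers,
                      pin_memory=False,
                      shuffle=True)
```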
Have you solved this problem? I had the same problem: I set the batch size to 1 and workers to 0, training the tiny model on a 3090, and I still got an "out of memory" error in epoch 1.
Same issue, trying to train on AWS on a g4dn instance and getting torch.cuda.OutOfMemoryError: CUDA out of memory, even after I set pin_memory to False.
Same issue on a Colab GPU: torch 1.13.1+cu116, CUDA:0 (Tesla T4, 15109.75MB). Error:
Epoch gpu_mem box obj cls total labels img_size
0/54 11.6G 0.08637 0.07074 0.07379 0.2309 655 640: 3% 5/157 [00:17<08:40, 3.43s/it]
Traceback (most recent call last):
File "train.py", line 616, in
Can somebody help with this problem ?
Same issue here. I reduced the batch size to 4 and saw ~19/24 GB used on my Titan RTX during the first epoch. Now I'm at the fourth epoch with 21.3 GB of memory used, and something tells me it's not going down.
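If reserved memory really is creeping up across epochs, one thing worth trying is logging the allocator state and clearing its cache at each epoch boundary. A hedged sketch (the helper name `end_of_epoch_cleanup` is mine; note that `empty_cache()` only returns unused cached blocks to the driver and will not fix a genuine leak, e.g. loss tensors accumulated with their graphs still attached):

```python
import gc
import torch

def end_of_epoch_cleanup(epoch):
    # Drop unreferenced Python objects first, then release the allocator's
    # unused cached blocks back to the GPU driver, and log what remains.
    gc.collect()
    torch.cuda.empty_cache()
    print(f"epoch {epoch}: "
          f"allocated {torch.cuda.memory_allocated() / 2**30:.2f} GiB, "
          f"reserved {torch.cuda.memory_reserved() / 2**30:.2f} GiB")
```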
I am trying to train YOLOv7 on a custom dataset with fewer than 500 images. I am following the official YOLOv7 tutorial with Roboflow (https://colab.research.google.com/drive/1X9A8odmK4k6l26NDviiT6dd6TgR-piOa#scrollTo=nD-uPyQ_2jiN). However, when I train the model on the custom dataset, I get a CUDA out of memory error.
Device information: torch 1.12.1+cu113, CUDA:0 (Tesla T4, 15109.75MB). Notebook: Colab
Starting training for 55 epochs...
Traceback (most recent call last):
File "train.py", line 616, in <module>
train(hyp, opt, device, tb_writer)
File "train.py", line 361, in train
pred = model(imgs) # forward
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/content/yolov7/models/yolo.py", line 599, in forward
return self.forward_once(x, profile) # single-scale inference, train
File "/content/yolov7/models/yolo.py", line 625, in forward_once
x = m(x) # run
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/content/yolov7/models/common.py", line 507, in forward
return self.act(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
input = module(input)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
return forward_call(*input, **kwargs)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 457, in forward
return self._conv_forward(input, self.weight, self.bias)
File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 4.22 GiB (GPU 0; 14.76 GiB total capacity; 6.04 GiB already allocated; 4.22 GiB free; 9.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
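Regarding the `max_split_size_mb` hint at the end of that error: it is passed through the `PYTORCH_CUDA_ALLOC_CONF` environment variable and has to be set before torch initialises CUDA. A minimal sketch (the value 128 is only an example; the variable can equally be exported in the shell before launching train.py):

```python
import os

# Example value; discourages the caching allocator from splitting large blocks,
# which can reduce the fragmentation behind "reserved >> allocated".
os.environ.setdefault("PYTORCH_CUDA_ALLOC_CONF", "max_split_size_mb:128")

import torch  # import torch only after the variable is set
```

Combining this with a smaller batch size or image size is usually still necessary on a 15 GB T4.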