WongKinYiu / yolov7

Implementation of paper - YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors
GNU General Public License v3.0

RuntimeError: CUDA out of memory #865

Open ArshadIram opened 2 years ago

ArshadIram commented 2 years ago

I am trying to train YOLOv7 on a custom dataset with fewer than 500 images. I am following the official YOLOv7 tutorial with Roboflow (https://colab.research.google.com/drive/1X9A8odmK4k6l26NDviiT6dd6TgR-piOa#scrollTo=nD-uPyQ_2jiN). However, when I train the model on the custom dataset I get a CUDA out of memory error.

Device Information: torch 1.12.1+cu113, CUDA:0 (Tesla T4, 15109.75MB). Notebook: Colab

Starting training for 55 epochs...

 Epoch   gpu_mem       box       obj       cls     total    labels  img_size
  0/54     10.6G    0.0688   0.02088         0   0.08968        74       640:  96% 23/24 [00:48<00:02,  2.13s/it]

Traceback (most recent call last):
  File "train.py", line 616, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 361, in train
    pred = model(imgs)  # forward
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/yolov7/models/yolo.py", line 599, in forward
    return self.forward_once(x, profile)  # single-scale inference, train
  File "/content/yolov7/models/yolo.py", line 625, in forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/content/yolov7/models/common.py", line 507, in forward
    return self.act(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 457, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "/usr/local/lib/python3.7/dist-packages/torch/nn/modules/conv.py", line 454, in _conv_forward
    self.padding, self.dilation, self.groups)
RuntimeError: CUDA out of memory. Tried to allocate 4.22 GiB (GPU 0; 14.76 GiB total capacity; 6.04 GiB already allocated; 4.22 GiB free; 9.24 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
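The last line of the error points at the PYTORCH_CUDA_ALLOC_CONF allocator setting. A minimal sketch of trying it on the next run (the 128 MB value is illustrative and the placeholder arguments stand for the usual training flags, not a confirmed fix):

    # sketch: limit the largest cached block to reduce allocator fragmentation
    PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128 python train.py <usual training arguments>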

SancakOzdemir commented 2 years ago

I had a similar problem. Try reducing the batch size.

ArshadIram commented 2 years ago

I tried, but to no avail. The default is 16. How much did you reduce it?

284nnuS commented 2 years ago

You can try reducing the number of workers, the image size, or the batch size to fix this; see the sketch below.
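For reference, all three knobs can be set from the train.py command line. A sketch, assuming the flag names used by the repository's train.py argparser; the dataset and weights paths are placeholders:

    # sketch: smaller batch, smaller input resolution, fewer dataloader workers
    python train.py \
        --weights yolov7_training.pt \
        --data data/custom.yaml \
        --epochs 55 \
        --batch-size 8 \
        --img-size 416 416 \
        --workers 2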

microboym commented 2 years ago

It seems that memory usage sometimes spikes and then drops again during training, so you need to keep enough headroom free to avoid CUDA out of memory; reducing the batch size is the usual way to do that.
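One way to check that headroom during training is to log allocated vs. reserved memory with standard torch.cuda calls. A small sketch; the helper name is made up and where you call it inside train.py is up to you:

    import torch

    def log_gpu_memory(tag=""):
        # allocated = memory held by live tensors; reserved = memory held by the caching allocator
        alloc = torch.cuda.memory_allocated() / 1024 ** 3
        reserved = torch.cuda.memory_reserved() / 1024 ** 3
        free, total = torch.cuda.mem_get_info()  # free/total device memory in bytes
        print(f"{tag} allocated={alloc:.2f} GiB, reserved={reserved:.2f} GiB, "
              f"free={free / 1024 ** 3:.2f} GiB of {total / 1024 ** 3:.2f} GiB")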

austinulfers commented 2 years ago

Facing a similar issue. I was able to solve the memory issue by setting pin_memory=False in /utils/datasets.py line 89. After fixing that, I'm now left with the following error whenever I set my batch size to anything larger than 1. I would expect 16 GB of memory and an RTX 3070 to be adequate for training with a batch size above 1, but please correct me if I am mistaken. Also worth noting that I am working within WSL2.
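For anyone else trying the same workaround, the change amounts to flipping the pin_memory argument where the dataloader is built in utils/datasets.py. A rough sketch; the exact line number and surrounding arguments may differ between commits:

    # utils/datasets.py, inside create_dataloader (sketch; other arguments abbreviated)
    dataloader = loader(dataset,
                        batch_size=batch_size,
                        num_workers=nw,
                        sampler=sampler,
                        pin_memory=False,  # was True; disables pinned host memory, which helped under WSL2
                        collate_fn=LoadImagesAndLabels.collate_fn)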

RuntimeError: CUDA error: unknown error
CUDA kernel errors might be asynchronously reported at some other API call,so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
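If it helps, the synchronous-launch setting that message refers to is just an environment variable for the next run (sketch; the placeholder stands for the usual training flags):

    # sketch: force synchronous kernel launches so the stack trace points at the failing op
    CUDA_LAUNCH_BLOCKING=1 python train.py <usual training arguments>
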
MrAccelerator commented 2 years ago

Have you solved this problem? I had the same issue: I set the batch size to 1 and the number of workers to 0, and while training the tiny model on a 3090 I still get an "out of memory" error in epoch 1.

az-oolloow commented 1 year ago

Same issue; trying to train on an AWS g4dn instance and getting torch.cuda.OutOfMemoryError: CUDA out of memory, even after I set pin_memory to False.

MaxxTr commented 1 year ago

Same issue on a Colab GPU: torch 1.13.1+cu116, CUDA:0 (Tesla T4, 15109.75MB). Error:

 Epoch   gpu_mem       box       obj       cls     total    labels  img_size
  0/54     11.6G   0.08637   0.07074   0.07379    0.2309       655       640:   3% 5/157 [00:17<08:40,  3.43s/it]

Traceback (most recent call last):
  File "train.py", line 616, in <module>
    train(hyp, opt, device, tb_writer)
  File "train.py", line 363, in train
    loss, loss_items = compute_loss_ota(pred, targets.to(device), imgs)  # loss scaled by batch_size
  File "/content/yolov7/utils/loss.py", line 585, in __call__
    bs, as_, gjs, gis, targets, anchors = self.build_targets(p, targets, imgs)
  File "/content/yolov7/utils/loss.py", line 732, in build_targets
    pair_wise_cls_loss = F.binary_cross_entropy_with_logits(
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/functional.py", line 3162, in binary_cross_entropy_with_logits
    return torch.binary_cross_entropy_with_logits(input, target, weight, pos_weight, reduction_enum)
torch.cuda.OutOfMemoryError: CUDA out of memory

Can somebody help with this problem?

mtanf commented 1 year ago

Same issue here; I reduced the batch size down to 4 and am seeing ~19/24 GB used on my Titan RTX during the first epoch. Now I'm at the fourth epoch with 21.3 GB of used memory, and something tells me it's not going down.
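If you want to check whether the footprint is genuinely climbing rather than the allocator simply holding on to cache, you can reset and read the peak statistics around each epoch. A minimal sketch with standard torch.cuda calls; the placement inside train.py's epoch loop is up to you:

    import torch

    # at the start of each epoch
    torch.cuda.reset_peak_memory_stats()

    # ... one epoch of training ...

    # at the end of each epoch: peak tensor memory for this epoch, independent of cached memory
    peak_gib = torch.cuda.max_memory_allocated() / 1024 ** 3
    print(f"epoch peak allocated: {peak_gib:.2f} GiB")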