RuntimeError: CUDA out of memory.

sm3304love commented 3 years ago

python rotate_train.py --weights rotate-yolov5s-ucas.pt --cfg rotate_yolov5s_ucas.yaml \
>      --data rotate_ucas.yaml --hyp hyp.ucas.yaml --img-size 1024 \
>      --epochs 3 --batch-size 12 --noautoanchor --rotate --cache
YOLOv5 🚀 v1.0-0-g298a36e torch 1.9.1+cu102 CUDA:0 (NVIDIA GeForce RTX 2060, 5934.5625MB)

Namespace(adam=False, artifact_alias='latest', batch_size=12, bbox_interval=-1, bucket='', cache_images=True, cfg='./models/rotate_yolov5s_ucas.yaml', data='./data/rotate_ucas.yaml', device='', entity=None, epochs=3, evolve=False, exist_ok=False, global_rank=-1, hyp='./data/hyp.ucas.yaml', image_weights=False, img_size=[1024, 1024], label_smoothing=0.0, linear_lr=False, local_rank=-1, multi_scale=False, name='exp', noautoanchor=True, nosave=False, notest=False, project='runs/train', quad=False, rect=False, resume=False, rotate=True, save_dir='runs/train/exp2', save_period=-1, single_cls=False, sync_bn=False, total_batch_size=12, upload_dataset=False, weights='rotate-yolov5s-ucas.pt', workers=8, world_size=1)
tensorboard: Start with 'tensorboard --logdir runs/train', view at http://localhost:6006/
hyperparameters: lr0=0.01, lrf=0.2, momentum=0.937, weight_decay=0.0005, warmup_epochs=3.0, warmup_momentum=0.8, warmup_bias_lr=0.1, box=0.05, cls=0.5, cls_pw=1.0, obj=1.0, obj_pw=1.0, iou_t=0.1, anchor_t=4.0, fl_gamma=0.0, hsv_h=0.015, hsv_s=0.7, hsv_v=0.4, degrees=30.0, translate=0.1, scale=0.0, shear=5.0, perspective=0.0005, flipud=0.5, fliplr=0.5, mosaic=0.0, mixup=0.0
wandb: Install Weights & Biases for YOLOv5 logging with 'pip install wandb' (recommended)

                 from  n    params  module                                  arguments                     
  0                -1  1      3520  models.common.Focus                     [3, 32, 3]                    
  1                -1  1     18560  models.common.Conv                      [32, 64, 3, 2]                
  2                -1  1     18816  models.common.C3                        [64, 64, 1]                   
  3                -1  1     73984  models.common.Conv                      [64, 128, 3, 2]               
  4                -1  1    156928  models.common.C3                        [128, 128, 3]                 
  5                -1  1    295424  models.common.Conv                      [128, 256, 3, 2]              
  6                -1  1    625152  models.common.C3                        [256, 256, 3]                 
  7                -1  1   1180672  models.common.Conv                      [256, 512, 3, 2]              
  8                -1  1    656896  models.common.SPP                       [512, 512, [5, 9, 13]]        
  9                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 10                -1  1    131584  models.common.Conv                      [512, 256, 1, 1]              
 11                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 12           [-1, 6]  1         0  models.common.Concat                    [1]                           
 13                -1  1    361984  models.common.C3                        [512, 256, 1, False]          
 14                -1  1     33024  models.common.Conv                      [256, 128, 1, 1]              
 15                -1  1         0  torch.nn.modules.upsampling.Upsample    [None, 2, 'nearest']          
 16           [-1, 4]  1         0  models.common.Concat                    [1]                           
 17                -1  1     90880  models.common.C3                        [256, 128, 1, False]          
 18                -1  1    147712  models.common.Conv                      [128, 128, 3, 2]              
 19          [-1, 14]  1         0  models.common.Concat                    [1]                           
 20                -1  1    296448  models.common.C3                        [256, 256, 1, False]          
 21                -1  1    590336  models.common.Conv                      [256, 256, 3, 2]              
 22          [-1, 10]  1         0  models.common.Concat                    [1]                           
 23                -1  1   1182720  models.common.C3                        [512, 512, 1, False]          
 24      [17, 20, 23]  1     40455  models.yolo.Rotate_Detect               [2, [[27, 26, 20, 40, 44, 19, 34, 34, 25, 47], [55, 24, 44, 38, 31, 61, 50, 50, 63, 45], [65, 62, 88, 60, 84, 79, 113, 85, 148, 122]], [128, 256, 512]]
Model Summary: 283 layers, 7087815 parameters, 7087815 gradients, 16.5 GFLOPs

Transferred 360/362 items from rotate-yolov5s-ucas.pt
Scaled weight_decay = 0.00046875
Optimizer groups: 62 .bias, 62 conv.weight, 59 other
train: Scanning '../UCAS50/train.cache' images and labels... 38 found, 0 missing
train: Caching images (0.1GB): 100%|████████████| 38/38 [00:00<00:00, 81.82it/s]
val: Scanning '../UCAS50/val.cache' images and labels... 10 found, 0 missing, 0 
val: Caching images (0.0GB): 100%|██████████████| 10/10 [00:00<00:00, 19.17it/s]
Plotting labels... 
Image sizes 1024 train, 1024 test
Using 8 dataloader workers
Logging results to runs/train/exp2
Starting training for 3 epochs...

     Epoch   gpu_mem       box       obj       cls     total    labels  img_size
  0%|                                                     | 0/4 [00:02<?, ?it/s]
Traceback (most recent call last):
  File "rotate_train.py", line 553, in <module>
    train(hyp, opt, device, tb_writer, rotate=opt.rotate)
  File "rotate_train.py", line 313, in train
    pred = model(imgs)  # forward
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sm3304love/Desktop/RotateObjectDetection/rotate-yolov5/models/yolo.py", line 122, in forward
    return self.forward_once(x, profile)  # single-scale inference, train
  File "/home/sm3304love/Desktop/RotateObjectDetection/rotate-yolov5/models/yolo.py", line 153, in forward_once
    x = m(x)  # run
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sm3304love/Desktop/RotateObjectDetection/rotate-yolov5/models/common.py", line 139, in forward
    return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/container.py", line 139, in forward
    input = module(input)
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sm3304love/Desktop/RotateObjectDetection/rotate-yolov5/models/common.py", line 105, in forward
    return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/sm3304love/Desktop/RotateObjectDetection/rotate-yolov5/models/common.py", line 43, in forward
    return self.act(self.bn(self.conv(x)))
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/module.py", line 1051, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/modules/activation.py", line 395, in forward
    return F.silu(input, inplace=self.inplace)
  File "/usr/local/lib/python3.6/dist-packages/torch-1.9.1-py3.6-linux-x86_64.egg/torch/nn/functional.py", line 1898, in silu
    return torch._C._nn.silu(input)
RuntimeError: CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 5.80 GiB total capacity; 2.56 GiB already allocated; 13.69 MiB free; 2.61 GiB reserved in total by PyTorch)

I followed the example in README.md, and this error occurred at the last step, training. How can I solve this problem?
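The numbers in the error message above can be pulled apart to see where the GPU memory went. Below is a minimal sketch that parses the message text exactly as posted; the function name `parse_oom_message` and the regular expressions are illustrative assumptions that match this PyTorch 1.9-style wording, not part of PyTorch or of this repository.

```python
import re

# Error text copied from the traceback above.
OOM_MESSAGE = (
    "CUDA out of memory. Tried to allocate 24.00 MiB (GPU 0; 5.80 GiB total capacity; "
    "2.56 GiB already allocated; 13.69 MiB free; 2.61 GiB reserved in total by PyTorch)"
)

UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def parse_oom_message(message):
    """Return the sizes reported in the OOM message, converted to MiB.

    Hypothetical helper: the patterns only cover the wording shown above.
    """
    patterns = {
        "requested": r"Tried to allocate ([\d.]+) (MiB|GiB)",
        "total": r"([\d.]+) (MiB|GiB) total capacity",
        "allocated": r"([\d.]+) (MiB|GiB) already allocated",
        "free": r"([\d.]+) (MiB|GiB) free",
        "reserved": r"([\d.]+) (MiB|GiB) reserved",
    }
    sizes = {}
    for name, pattern in patterns.items():
        match = re.search(pattern, message)
        if match is None:
            raise ValueError(f"could not find '{name}' in message")
        sizes[name] = float(match.group(1)) * UNIT_TO_MIB[match.group(2)]
    return sizes


sizes = parse_oom_message(OOM_MESSAGE)
# Memory that is neither free nor reserved by this PyTorch process.
outside_mib = sizes["total"] - sizes["reserved"] - sizes["free"]
print(f"reserved by PyTorch: {sizes['reserved']:.0f} MiB")
print(f"neither free nor reserved by PyTorch: {outside_mib:.0f} MiB")
```

By this arithmetic, roughly 3.2 GiB of the 5.80 GiB card is neither free nor reserved by PyTorch's allocator. The message alone does not say what holds it; the CUDA context, the desktop session, or another process are possible candidates, and `nvidia-smi` can show per-process usage.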

XinzeLee commented 3 years ago

This is because your GPU ran out of memory. To overcome this, you have three options:

1. Do not specify "--cache".
2. Buy more RAM (more than 16 GB).
3. Add a swap file or increase its size (the commands below give you a 32 GB swap file).


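sudo fallocate -l 32G /swapfile  # create the 32 GB file first (assumed step; not in the commands as posted)
sudo chmod 600 /swapfile  # restrict access to root
sudo mkswap /swapfile  # format the file as swap space
sudo swapon /swapfile  # enable the swap file
free -h  # check memory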
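For a sense of how much memory "--cache" actually costs in this run, here is a rough upper bound worked out from the log. It assumes cached images are stored as 8-bit RGB arrays with no side longer than the 1024 image size, and that the cache is held in system RAM rather than on the GPU (as in upstream YOLOv5; assumed to be the same in this fork). The helper `cache_upper_bound_bytes` is a hypothetical name used only for this sketch.

```python
def cache_upper_bound_bytes(num_images, img_size, channels=3, bytes_per_value=1):
    """Upper bound on cache size: every image treated as img_size x img_size uint8 RGB."""
    return num_images * img_size * img_size * channels * bytes_per_value


# Image counts taken from the log: 38 training images, 10 validation images.
train_bytes = cache_upper_bound_bytes(38, 1024)
val_bytes = cache_upper_bound_bytes(10, 1024)

print(f"train cache <= {train_bytes / 1e9:.2f} GB")
print(f"val cache   <= {val_bytes / 1e9:.2f} GB")
```

Both bounds are in line with the "Caching images (0.1GB)" and "(0.0GB)" lines in the log, so under these assumptions the image cache is small compared with 16 GB of RAM.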

sm3304love commented 3 years ago

Screenshot from 2021-10-07 20-34-10

I checked: my computer has 16 GB of RAM and the swap file is 32 GB, but the same problem still occurs.

XinzeLee commented 3 years ago

Yes, but the program is limited by GPU memory, not system RAM, and your GPU has only about 6 GB (the error reports 5.80 GiB total capacity). You can try: 1. reducing --batch-size to 3; 2. reducing the image size (--img-size).
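To compare the two suggestions, here is a back-of-the-envelope sketch. It assumes activation memory grows linearly with batch size and with the number of pixels per image, which is a common rule of thumb for convolutional networks and not a measurement of this model. The function `relative_activation_memory` is hypothetical, and the reference values are the ones from the failing command (batch size 12, image size 1024).

```python
def relative_activation_memory(batch_size, img_size, ref_batch_size=12, ref_img_size=1024):
    """Activation memory relative to the reference run.

    Assumes memory scales with batch_size * img_size**2; weights, gradients
    and optimizer state are not included because they do not depend on either.
    """
    return (batch_size * img_size ** 2) / (ref_batch_size * ref_img_size ** 2)


# Candidate settings; image sizes are kept at multiples of 32.
candidates = [(12, 1024), (3, 1024), (12, 512), (3, 640)]
ratios = {c: relative_activation_memory(*c) for c in candidates}

for (batch_size, img_size), ratio in ratios.items():
    print(f"--batch-size {batch_size:>2} --img-size {img_size:>4}: {ratio:.3f}x")
```

Under this assumption, cutting the batch size from 12 to 3 and halving the image size from 1024 to 512 each reduce activation memory to about a quarter, and the two can be combined. Actual savings will be smaller than the ratio suggests, since the model's weights and the CUDA context take a fixed amount of memory.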

Zivid99 commented 2 years ago

Hello, I get an error when I install the CUDA extension with 'python setup.py install'. It looks like you are running on CUDA, so I wonder whether the extension compiled successfully for you. Here is my error: `C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\include\cub\iterator../util_device.cuh(330): here instantiation of "cub::PerDeviceAttributeCache::DevicePayload cub::PerDeviceAttributeCache::operator()(Invocable &&, int) [with Invocable=lambda [](int &)->cudaError_t]" C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\include\cub\iterator../util_device.cuh(431): here

56 errors detected in the compilation of "inter_union_cuda.cu". error: command 'C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.1\bin\nvcc.exe' failed with exit code 1 `

XinzeLee / RotateObjectDetection

RuntimeError: CUDA out of memory. #8