AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

"out off memory" error with RTX 2080 ti #6872

Open · blafasel42 opened this issue 3 years ago

blafasel42 commented 3 years ago

I hope this is not a noob question, but I spent the day trying different things and I just cannot figure out why Darknet training cannot use the GPU memory properly:

If you want to report a bug - provide:

Only subdivisions=64 works, but then only 4 GB of the 11 GB on the GPU are used.

Also: why does it say 896 x 896 when I set width and height to 608 each in the cfg?

* what command do you use?

./darknet detector train data/obj.data cfg/yolov4-argos.cfg yolov4.conv.137 -dont_show -clean

* do you use Win/Linux/Mac?

Ubuntu 18.04 with Nvidia Driver 455 and CUDA 11.1 (before: Driver 450 and CUDA 10.2, same problem)

* attach screenshot of a bug with previous messages in terminal
[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000 
Total BFLOPS 127.527 
avg_outputs = 1051548 
 Allocate additional workspace_size = 95.22 MB 
Loading weights from yolov4.conv.137...
 seen 64, trained: 0 K-images (0 Kilo-batches_64) 
Done! Loaded 137 layers from weights-file 
Learning Rate: 0.001, Momentum: 0.949, Decay: 0.0005
 Detection layer: 139 - type = 28 
 Detection layer: 150 - type = 28 
 Detection layer: 161 - type = 28 
Resizing, random_coef = 1.40 

 896 x 896 
 Create 6 permanent cpu-threads 
 used slow CUDNN algo without Workspace! Need memory: 7962896, available: 2686976
 Try to set subdivisions=64 in your cfg-file. 
CUDA status Error: file: /home/gaylord/darknet/src/dark_cuda.c : () : line: 373 : build time: Oct 23 2020 - 14:52:12 

 CUDA Error: out of memory
CUDA Error: out of memory: File exists
* in what cases a bug occurs, and in which not?

happens with: batch=64 subdivisions=32

does not happen with: batch=64 subdivisions=64
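
For context, a rough sketch of how those two numbers interact, assuming the usual Darknet behaviour where each GPU step processes batch / subdivisions images at once:

[net]
batch=64          # images accumulated per weight update
subdivisions=32   # batch is split into 32 mini-batches
# -> 64 / 32 = 2 images per forward/backward pass on the GPU

batch=64
subdivisions=64
# -> 64 / 64 = 1 image per pass, roughly halving peak activation memory

So subdivisions=64 is already the smallest possible GPU footprint for batch=64; anything that further enlarges the network input (such as the random resizing discussed below) has to fit on top of that.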

stephanecharette commented 3 years ago

See: https://www.ccoderun.ca/programming/2020-09-25_Darknet_FAQ/#cuda_out_of_memory

It would also be nice to see the output of nvidia-smi, to see exactly how much memory is being used and by which process.

blafasel42 commented 3 years ago

Hey, thanks for the quick answer.

During training, nvidia-smi says:

nvidia-smi
Fri Oct 23 17:23:23 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 455.32.00    Driver Version: 455.32.00    CUDA Version: 11.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 208...  On   | 00000000:01:00.0 Off |                  N/A |
| 38%   46C    P2   249W / 260W |   6670MiB / 11019MiB |     94%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2262      C   ./darknet                        6667MiB |
+-----------------------------------------------------------------------------+

(GPU memory usage drops to 5491 MiB later.)

What I find curious: at the start of the training, it says:

[yolo] params: iou loss: ciou (4), iou_norm: 0.07, obj_norm: 1.00, cls_norm: 1.00, delta_norm: 1.00, scale_x_y: 1.05
nms_kind: greedynms (1), beta = 0.600000 
Total BFLOPS 127.527 
avg_outputs = 1051548 
 Allocate additional workspace_size = 81.03 MB 
Loading weights from yolov4.conv.137...
 seen 64, trained: 0 K-images (0 Kilo-batches_64) 
Done! Loaded 137 layers from weights-file 
Learning Rate: 0.001, Momentum: 0.949, Decay: 0.0005
 Detection layer: 139 - type = 28 
 Detection layer: 150 - type = 28 
 Detection layer: 161 - type = 28 
Resizing, random_coef = 1.40 

 896 x 896 
 Create 6 permanent cpu-threads 
 try to allocate additional workspace_size = 104.00 MB 
 CUDA allocate done! 
Loaded: 0.000024 seconds

Does that mean the network trains with 896 x 896 pixels?

The configuration starts like this:

[net]
# Testing
#batch=1
#subdivisions=1
# Training
batch=64
subdivisions=64
width=608
height=608
channels=3
momentum=0.949
decay=0.0005
angle=0
saturation = 1.5
exposure = 1.5
hue=.1

learning_rate=0.001
burn_in=1000
max_batches = 40000
policy=steps
steps=32000,36000
scales=.1,.1

#cutmix=1
mosaic=1

The rest of the configuration file is basically the standard yolov4-custom.cfg from the distribution, with the filters adjusted as stated in the README (see the sketch after this comment).
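
For reference, the README's filters adjustment looks roughly like this; the class count below is only an assumption for illustration, since the real number of classes in obj.data is not shown in this thread:

[convolutional]
size=1
stride=1
pad=1
filters=18        # (classes + 5) * 3, here assuming classes=1
activation=linear

[yolo]
mask = 0,1,2
classes=1         # assumed; set to the real class count
...

The same filters/classes pair has to be changed in all three [convolutional]/[yolo] blocks at the end of the file.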

versavel commented 3 years ago

Yes, if you set random=1 in the config file.

That’s why the output shows: Resizing, random_coef = 1.40
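
In other words, a rough sketch of the numbers (the exact rounding is up to Darknet): with random_coef = 1.40, every few iterations the network is rebuilt at a random resolution between roughly 608 / 1.4 and 608 * 1.4, snapped to Darknet's 32-pixel resize step, which is where resolutions like 896 x 896 in the log come from. That peak resolution, not 608 x 608, is what has to fit into the 11 GB. If memory is the constraint, multi-scale training can be disabled:

[yolo]
...
random=0   # keep the network at width x height instead of randomly resizing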


ywssng commented 3 years ago

@blafasel42 Hello, in my experience with the 2080 Ti, you have to change width and height to a smaller size (e.g. 416). I think that with (width, height) = (608, 608) and random=1 in the config file, it hardly fits into memory. A sketch of such a config follows below.
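
A sketch of that suggestion (assuming the objects are still large enough at the lower resolution; width and height must stay multiples of 32):

[net]
batch=64
subdivisions=32   # at 416 x 416 a lower subdivisions value may fit again
width=416
height=416
channels=3

With random=1 still set, the largest randomly resized resolution would then be roughly 416 * 1.4 ≈ 582 instead of ~896, a much smaller memory peak.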

blafasel42 commented 3 years ago

OK, thanks for the info. So my input images will be processed at resolutions randomly scaled between 608 / 1.40 and 608 * 1.40. I had done augmentation before training started, including crop, rotate, scale, hue, color, etc. Does that even make sense then? Should I remove the augmentation steps before training? I also scaled all images to 608x608, but that does not seem very reasonable then either, right?