AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

CUDNN_HALF required more GPU memory? #5059

Open keita-honsho opened 4 years ago

keita-honsho commented 4 years ago

Hi @AlexeyAB,

I have two PCs with the same specifications. In one of them I replaced the GTX1080 with an RTX2080.

Both train on the same data with a cfg based on yolov3-tiny.cfg, with classes changed to 4 and the filters and anchors adjusted to match.

The GTX1080 PC can train with batch=64, subdivisions=2.

The RTX2080 PC cannot train with batch=64, subdivisions=2.

The RTX2080 PC can train with batch=64, subdivisions=4.
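For context, Darknet divides `batch` by `subdivisions` to get the number of images processed on the GPU in one pass, so a larger `subdivisions` value means fewer images in GPU memory at a time. A minimal sketch of that arithmetic (the function name `images_per_pass` is hypothetical, not a Darknet API):

```python
def images_per_pass(batch: int, subdivisions: int) -> int:
    """Images held on the GPU per forward/backward pass (integer division)."""
    if subdivisions <= 0:
        raise ValueError("subdivisions must be positive")
    return batch // subdivisions


# The settings mentioned in this thread, plus the value the error message suggests.
for subdivisions in (2, 4, 64):
    per_pass = images_per_pass(64, subdivisions)
    print(f"batch=64 subdivisions={subdivisions}: {per_pass} images per pass")
```

Going from subdivisions=2 to subdivisions=4 halves the number of images in flight, which is why the second setting fits where the first runs out of memory.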

The darknet.exe builds for the GTX1080 and the RTX2080 come from the same source; the only change is CUDNN_HALF.

Does CUDNN_HALF require more GPU memory?

Try to set subdivisions=64 in your cfg-file. CUDA Error: out of memory: No error

seen 64, trained: 23526 K-images (367 Kilo-batches_64)
Learning Rate: 0.001, Momentum: 0.9, Decay: 0.0005
If error occurs - run training with flag: -dont_show
Resizing, random_coef = 1.40
608 x 608
used slow CUDNN algo without Workspace! Need memory: 1164509184, available: 587464704
CUDNN-slow used slow CUDNN algo without Workspace! Need memory: 4610719744, available: 474218496
CUDNN-slow used slow CUDNN algo without Workspace! Need memory: 1164509184, available: 417595392
CUDNN-slow used slow CUDNN algo without Workspace! Need memory: 606126080, available: 213123072
CUDNN-slow try to allocate additional workspace_size = 759.30 MB
CUDA status Error: file: C:\local\darknet\src\dark_cuda.c : cuda_make_array() : line: 361 : build time: Jan 29 2020 - 14:52:21
CUDA Error: out of memory
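To make the size of the gap easier to read, the "Need memory / available" pairs in the log can be converted to MiB. This is a rough sketch that assumes the numbers are byte counts; the helper name `workspace_shortfalls` is made up for illustration.

```python
import re

# Warning lines copied from the log above.
LOG = """\
used slow CUDNN algo without Workspace! Need memory: 1164509184, available: 587464704
CUDNN-slow used slow CUDNN algo without Workspace! Need memory: 4610719744, available: 474218496
CUDNN-slow used slow CUDNN algo without Workspace! Need memory: 1164509184, available: 417595392
CUDNN-slow used slow CUDNN algo without Workspace! Need memory: 606126080, available: 213123072
"""

PATTERN = re.compile(r"Need memory: (\d+), available: (\d+)")
MIB = 1024 * 1024  # assumption: the logged values are bytes


def workspace_shortfalls(log_text):
    """Return (needed, available, shortfall) in MiB for each warning line."""
    rows = []
    for needed, available in PATTERN.findall(log_text):
        needed, available = int(needed), int(available)
        rows.append((needed / MIB, available / MIB, (needed - available) / MIB))
    return rows


for needed, available, shortfall in workspace_shortfalls(LOG):
    print(f"need {needed:8.1f} MiB, available {available:7.1f} MiB, short by {shortfall:8.1f} MiB")
```

In every warning the requested workspace exceeds the free memory, and the free amount shrinks from one warning to the next.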

AlexeyAB commented 4 years ago

The GTX1080 PC can train with batch=64, subdivisions=2.

The RTX2080 PC cannot train with batch=64, subdivisions=2.

The RTX2080 PC can train with batch=64, subdivisions=4.

The darknet.exe builds for the GTX1080 and the RTX2080 come from the same source; the only change is CUDNN_HALF.

What is the difference between the 1st and 2nd RTX cases?

Does CUDNN_HALF require more GPU memory?

Yes. Currently only the convolutional layer is implemented for Tensor Cores, and only for iterations higher than 3x burn_in, so we have to keep 2 implementations of the conv-layers: FP32 and FP16. This will be solved later.
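A back-of-the-envelope sketch of why keeping both precisions costs more memory: if the FP32 weights stay resident (4 bytes each) and an FP16 copy is added (2 bytes each), the weight storage grows by half. This only counts weights and assumes one extra FP16 copy per conv layer; the real overhead in Darknet may differ, since other buffers can also be involved. The function name and the example layer are hypothetical.

```python
def conv_weight_bytes(filters, channels, kernel_size, keep_fp16_copy):
    """Bytes needed for one conv layer's weights.

    Assumption: FP32 weights are always kept (4 bytes each) and the FP16
    path adds a second copy at 2 bytes each. Activation buffers are ignored.
    """
    n_weights = filters * channels * kernel_size * kernel_size
    fp32_bytes = n_weights * 4
    fp16_bytes = n_weights * 2 if keep_fp16_copy else 0
    return fp32_bytes + fp16_bytes


# Hypothetical layer: 3x3 convolution, 512 input channels, 1024 filters.
fp32_only = conv_weight_bytes(1024, 512, 3, keep_fp16_copy=False)
both = conv_weight_bytes(1024, 512, 3, keep_fp16_copy=True)
print(f"FP32 only: {fp32_only / 2**20:.1f} MiB, FP32 + FP16: {both / 2**20:.1f} MiB")
```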

keita-honsho commented 4 years ago

thanks

What is the difference between the 1st and 2nd RTX cases?

Basically the environment is exactly the same. With subdivisions=2 the error "Try to set subdivisions=64 in your cfg-file." occurred, so I changed subdivisions from 2 to 4 and retried training.

Later it will be solved.

I'm really looking forward to it.

Increasing subdivisions slows down training. Is it currently better to use CUDNN_HALF = 0 for training?

AlexeyAB commented 4 years ago

@keita-honsho Currently CUDNN_HALF = 0 is disabled for training and doesn't affect training.

keita-honsho commented 4 years ago

thanks.

I have two darknet.exe builds.

(a) One is for the GTX1080, with ENABLE_CUDA, ENABLE_CUDNN and ENABLE_OPENCV enabled and "CUDA_ARCHITECTURES" rewritten to "6.1 7.5" (ENABLE_CUDNN_HALF is disabled).

(b) The other is for the RTX2080, with ENABLE_CUDA, ENABLE_CUDNN, ENABLE_CUDNN_HALF and ENABLE_OPENCV enabled and "CUDA_ARCHITECTURES" rewritten to "6.1 7.5".

I'm assuming that if I use the darknet.exe from (a) on the RTX2080, CUDNN_HALF will be disabled and it will work the same as on the GTX1080. Is that right?

AlexeyAB commented 4 years ago

I'm assuming that if I use the darknet.exe from (a) on the RTX2080, CUDNN_HALF will be disabled and it will work the same as on the GTX1080. Is that right?

Yes. If darknet.exe is compiled for CC 3.0 or 7.5, then it can be run on RTX 2070.