AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet)
http://pjreddie.com/darknet/

How to train YOLOv4 in FP16 or Mixed Precision #5474

Open DocF opened 4 years ago

DocF commented 4 years ago

Hi @AlexeyAB, thanks for this great repo! I tested cudnn_half=1 and cudnn_half=0 on an RTX 2080 Ti, but training time and GPU memory usage were the same in both cases. I expected FP16 to roughly halve GPU memory usage. Can you tell me how to use half-precision or mixed-precision training? Thanks!

WongKinYiu commented 4 years ago

[net]
loss_scale=128
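
As an illustration of why a setting like loss_scale matters for FP16 training, here is a minimal Python sketch using only the standard library. It is not Darknet's code: the helper `to_fp16` and the gradient value `1e-8` are assumptions chosen for the example, and only the scale factor 128 comes from the config above. It shows a small gradient underflowing to zero in half precision, and surviving when it is multiplied by the scale before conversion and divided by it afterwards.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value, returned as a Python float."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


LOSS_SCALE = 128   # value taken from the [net] section above
true_grad = 1e-8   # hypothetical gradient, smaller than FP16 can represent

# Without scaling, the gradient is lost when stored in FP16.
unscaled = to_fp16(true_grad)

# With scaling, the value lands in the representable FP16 range.
scaled = to_fp16(true_grad * LOSS_SCALE)

# Unscale in higher precision (a Python float here) before the weight update.
recovered = scaled / LOSS_SCALE

print("FP16 without scaling:", unscaled)
print("FP16 with scaling, then unscaled:", recovered)
```

The recovered value is only approximately equal to the original gradient, because FP16 still has limited precision after scaling.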

DocF commented 4 years ago

> [net]
> loss_scale=128

Hi, thanks for replying to this issue. I have already set loss_scale=128, but it still does not reduce GPU memory usage. Training with batch=64 subdivisions=16 works on a GTX 1080 Ti but runs out of memory on an RTX 2080 Ti.
I suspect that the problem is with the cudnn_half=1 flag according to #5059
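
For reference, the batch and subdivisions values above determine how many images are on the GPU at once: Darknet divides batch by subdivisions to get the mini-batch processed in each forward/backward pass. A minimal Python sketch of that arithmetic follows; the function name `mini_batch_size` is hypothetical, and the extra subdivisions values 32 and 64 are only example alternatives, not settings taken from this thread.

```python
def mini_batch_size(batch: int, subdivisions: int) -> int:
    """Images processed per forward/backward pass (integer division, as a sketch)."""
    return batch // subdivisions


batch = 64  # value from the comment above

# subdivisions=16 is the setting from the comment above; 32 and 64 are
# example alternatives that shrink the per-pass mini-batch further.
for subdivisions in (16, 32, 64):
    print(f"batch={batch} subdivisions={subdivisions} -> "
          f"{mini_batch_size(batch, subdivisions)} images per pass")
```

Because activation memory grows with the number of images per pass, a larger subdivisions value is the usual lever when training runs out of GPU memory.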

AlexeyAB commented 4 years ago

It doesn't reduce GPU memory usage in the current implementation. It only speeds up training (after 3000 iterations) and inference.

DocF commented 4 years ago

OK, got it. Thanks again for this great repo!

arnaud-nt2i commented 3 years ago

@AlexeyAB, @WongKinYiu

What about mixed precision (TF32) on Ampere GPUs?

https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/
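
For context on the question above, here is a minimal Python sketch of how the TF32 number format differs from FP16. TF32 keeps FP32's 8-bit exponent but only a 10-bit mantissa, so it has FP32's range with roughly FP16's precision. The helper names are hypothetical, the emulation truncates the mantissa rather than reproducing the hardware's rounding, and the sketch says nothing about whether Darknet uses TF32.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_tf32_truncated(x: float) -> float:
    """Emulate TF32 by clearing the low 13 of FP32's 23 mantissa bits.

    This is truncation, an assumption made to keep the sketch short.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    bits &= 0xFFFFE000  # keep sign, 8 exponent bits, top 10 mantissa bits
    return struct.unpack("<f", struct.pack("<I", bits))[0]


small = 1e-8  # hypothetical small gradient value

# Range: FP16 (5-bit exponent) underflows, TF32 (8-bit exponent) does not.
print("FP16:", to_fp16(small))
print("TF32:", to_tf32_truncated(small))

# Precision: an increment below the 10-bit mantissa step is dropped.
print("TF32 of 1 + 2**-11:", to_tf32_truncated(1.0 + 2**-11))
```

The range difference is why loss scaling is associated with FP16 training in particular: values that vanish in FP16 remain representable in a format with an 8-bit exponent.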