AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

Why is loss_scale required for tensor cores? #6866

Closed: mattangus closed this issue 4 years ago

mattangus commented 4 years ago

I've been trying to speed up my training using tensor cores. I've been looking through the code and reading other issues. I came across this line:

if (state.index != 0 && state.net.cudnn_half && !l.xnor && (!state.train || (iteration_num > 3 * state.net.burn_in) && state.net.loss_scale != 1) &&
        (l.c / l.groups) % 8 == 0 && l.n % 8 == 0 && l.groups <= 1 && l.size > 1)

The first parts make sense to me. The latter bits don't. Specifically:

I saw this comment that says loss_scale is needed, but it gives no explanation.

Any comments on this would be very helpful!

AlexeyAB commented 4 years ago

https://developer.nvidia.com/automatic-mixed-precision

Enabling mixed precision involves two steps: porting the model to use the half-precision data type where appropriate, and using loss scaling to preserve small gradient values.
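
To make the "preserve small gradient values" part concrete, here is a minimal sketch in plain Python (not Darknet code). It uses the standard-library `struct` format `e` to round a value to IEEE 754 half precision. The helper name `to_fp16`, the gradient value `1e-8`, and the scale `1024` are assumptions chosen for illustration, not values taken from Darknet.

```python
import struct


def to_fp16(x: float) -> float:
    """Round x to the nearest IEEE 754 half-precision value (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Illustrative values only; these are assumptions, not Darknet defaults.
LOSS_SCALE = 1024.0
true_grad = 1e-8  # smaller than the smallest positive FP16 value (2**-24, about 6e-8)

# Without scaling, the gradient underflows to zero when stored as FP16.
unscaled = to_fp16(true_grad)

# With loss scaling, the loss (and so every gradient) is multiplied by LOSS_SCALE
# before the FP16 backward pass, which moves the value into representable range.
scaled = to_fp16(true_grad * LOSS_SCALE)

# The gradient is divided by the same factor in higher precision
# before the weight update.
recovered = scaled / LOSS_SCALE

print("stored in FP16 without scaling:", unscaled)
print("recovered after scale/unscale: ", recovered)
```

Without the scale, the update for this weight would be exactly zero. With it, the gradient survives the FP16 pass with a small rounding error. This is consistent with the quoted condition refusing the half-precision path during training when `loss_scale` is 1.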

https://developer.nvidia.com/blog/mixed-precision-training-deep-neural-networks/

image

https://nvlabs.github.io/iccv2019-mixed-precision-tutorial/files/dusan_stosic_intro_to_mixed_precision_training.pdf

image

mattangus commented 4 years ago

This is exactly what I was looking for. Thanks!