Closed by mattangus 4 years ago
https://developer.nvidia.com/automatic-mixed-precision
Enabling mixed precision involves two steps: porting the model to use the half-precision data type where appropriate, and using loss scaling to preserve small gradient values.
https://developer.nvidia.com/blog/mixed-precision-training-deep-neural-networks/
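A minimal C sketch of the loss-scaling idea from those links, assuming a compiler with `_Float16` support (recent GCC or Clang on x86-64); the scale of 1024 and all names here are illustrative, not darknet's actual values:

```c
#include <stdio.h>

int main(void) {
    const float loss_scale = 1024.0f;   /* illustrative scale, not darknet's value */
    float grad = 1e-8f;                 /* a small fp32 gradient from backprop */

    /* Without scaling: 1e-8 is below fp16's smallest subnormal (~5.96e-8),
     * so converting to half flushes it to zero and the update is lost. */
    _Float16 unscaled = (_Float16)grad;

    /* With scaling: multiply the loss (and therefore every gradient) by the
     * scale before the fp16 backward pass, then divide the fp32 master
     * gradient by the same factor before the weight update. */
    _Float16 scaled = (_Float16)(grad * loss_scale);
    float recovered = (float)scaled / loss_scale;

    printf("unscaled fp16 grad: %g\n", (float)unscaled); /* prints 0 */
    printf("scaled fp16 grad:   %g\n", (float)scaled);   /* ~1.02e-5 */
    printf("recovered grad:     %g\n", recovered);       /* ~1e-8 */
    return 0;
}
```

In practice the scale is chosen (or adjusted dynamically) so that scaled gradients stay representable without overflowing fp16's maximum of 65504.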
This is exactly what I was looking for. Thanks!
I've been trying to speed up my training using tensor cores. While looking through the code and reading other issues, I came across this line:
The first parts make sense to me. The latter bits don't. Specifically:
- `loss_scale != 1`: why is a loss scale needed?
- `(l.c / l.groups) % 8 == 0`: I don't think I understand groups well enough to know what this part means.
- `l.groups <= 1`: this seems redundant with the one above; if `groups == 1`, then the one above reduces to `l.c % 8 == 0` (see the sketch below).

I saw this comment that says `loss_scale` is needed, but with no explanation.
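To make my reading concrete, here is a hypothetical restatement of the check; the exact darknet condition may differ:

```c
/* Hypothetical restatement of the tensor-core eligibility check as I
 * read it; the actual darknet code may test more than this. */
int tensor_core_eligible(int c, int groups, float loss_scale) {
    int channels_per_group = c / groups;  /* l.c / l.groups */
    return loss_scale != 1.0f             /* mixed precision in use? */
        && channels_per_group % 8 == 0    /* per-group channels a multiple of 8 */
        && groups <= 1;                   /* grouped convolutions excluded */
}
```

With `groups == 1`, `channels_per_group == c`, so the middle test does reduce to `l.c % 8 == 0`, which is why the `groups <= 1` test looks redundant to me.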
Any comments on this would be very helpful!