DocF opened this issue 4 years ago
```ini
[net]
loss_scale=128
```
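For context, loss scaling exists because many FP16 gradient values are too small to represent and underflow to zero; multiplying the loss by a constant such as 128 before backpropagation keeps them representable, and the gradients are divided by the same factor (in FP32) before the weight update. A minimal numpy sketch of the underflow effect, not darknet code:

```python
# Sketch only (not darknet code): why a loss scale such as 128 matters for FP16.
# Values below FP16's smallest subnormal (~6e-8) round to zero, so small
# unscaled gradients are silently lost. Scaling the loss keeps them
# representable; the unscaling division happens in FP32.
import numpy as np

loss_scale = 128.0
tiny_grad = 2e-8  # smaller than FP16 can represent

print(np.float16(tiny_grad))                 # 0.0 -> gradient underflows, lost
scaled = np.float16(tiny_grad * loss_scale)  # 128x larger, survives the FP16 cast
print(np.float32(scaled) / loss_scale)       # ~2e-8 -> recovered after unscaling
```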
Hi, thanks for replying to this issue. I already set loss_scale=128, but it still doesn't save GPU memory. I can train with batch=64 subdivisions=16 on a GTX 1080 Ti, but run out of memory on an RTX 2080 Ti. I suspect the problem is with the cudnn_half=1 flag, according to #5059.
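As an aside (not part of the original exchange): the usual darknet workaround for out-of-memory errors is to raise subdivisions in the cfg file, since each forward/backward pass processes batch/subdivisions images, so peak memory drops at the cost of training speed. For example:

```ini
[net]
batch=64
subdivisions=32   # 64/32 = 2 images per pass instead of 64/16 = 4
```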
cudnn_half=1 doesn't reduce GPU memory usage in the current implementation. It only speeds up training (after the first 3000 iterations) and inference.
OK, got it. Thanks again for this great repo!
@AlexeyAB , @WongKinYiu
What about mixed precision (TF32) on Ampere GPUs?
https://developer.nvidia.com/blog/accelerating-tensorflow-on-a100-gpus/
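For context: TF32 keeps tensors stored in FP32, so on Ampere it speeds up matrix math without reducing memory. Darknet does not appear to expose a TF32 switch; as a purely illustrative example, this is how PyTorch (a framework that does expose it) enables the mode described in the linked post:

```python
# Illustration only (PyTorch, not darknet): enabling Ampere's TF32 mode.
# Storage stays FP32 (no memory savings); matrix multiplies and convolutions
# run on tensor cores with a reduced 10-bit mantissa.
import torch

torch.backends.cuda.matmul.allow_tf32 = True  # TF32 for cuBLAS matrix multiplies
torch.backends.cudnn.allow_tf32 = True        # TF32 for cuDNN convolutions
```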
Hi @AlexeyAB, thanks for this great repo! I tested cudnn_half=1 and cudnn_half=0 on an RTX 2080 Ti, but found that both give the same training time and the same GPU memory usage. In my opinion, using FP16 should roughly halve GPU memory usage. Can you tell me how to use half-precision or mixed-precision training? Thanks!
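For readers wondering what general mixed-precision training with dynamic loss scaling looks like, here is a hedged PyTorch sketch; it illustrates the technique the commenter asks about, not darknet's cudnn_half mechanism. Note that AMP saves memory mainly on activations, while FP32 master weights remain, which is one reason memory use is not simply halved:

```python
# Sketch only (PyTorch AMP, not darknet): mixed-precision training with
# dynamic loss scaling. Forward math runs in FP16 where safe; weights and
# the unscaled gradients stay in FP32.
import torch

model = torch.nn.Linear(512, 10).cuda()
opt = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler()  # dynamic loss scaling

x = torch.randn(64, 512, device="cuda")
y = torch.randint(0, 10, (64,), device="cuda")

with torch.cuda.amp.autocast():       # FP16 where safe, FP32 elsewhere
    loss = torch.nn.functional.cross_entropy(model(x), y)

scaler.scale(loss).backward()         # scale loss to avoid FP16 gradient underflow
scaler.step(opt)                      # unscales gradients, then optimizer step
scaler.update()                       # adjusts the scale factor for the next step
opt.zero_grad()
```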