dingo-gw / dingo

Dingo: Deep inference for gravitational-wave observations

Network datatype #119

Open · stephengreen opened this issue 1 year ago

stephengreen commented 1 year ago

With PyTorch 1.12, the default precision for float32 matrix multiplications on Ampere devices changes from TF32 to full FP32: https://pytorch.org/docs/stable/notes/cuda.html

Most of our experiments will have been run with an earlier version of PyTorch, and hence used TF32. Since TF32 is more efficient and does not appear to have detrimental effects, we should probably continue to use it. According to the link, we should set

    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

I'm not sure in which scripts we should set this (just training, or both training and inference). Maybe we should set it globally in one module, which we then import into all our scripts. Thoughts?
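For concreteness, a minimal sketch of such a module (the module path is hypothetical, not existing Dingo code):

    # dingo/core/torch_defaults.py  (hypothetical module path)
    # Importing this module once re-enables the pre-1.12 behaviour of
    # using TF32 on Ampere GPUs for matmuls and cuDNN convolutions.
    import torch

    torch.backends.cuda.matmul.allow_tf32 = True
    torch.backends.cudnn.allow_tf32 = True

Every training/inference entry point would then just import that module near the top.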

Also, has anybody experimented with FP32? If you are training with PyTorch 1.12 or on a non-Ampere GPU, you are probably using it inadvertently. Did it give any improvement in performance?

nihargupte-ph commented 1 year ago

I am using PyTorch 1.12.1+cu102. I'm not sure whether it gives an improvement in performance, since I've always been using the same version (I think). Running

    import torch

    cuda = torch.device("cuda")
    a = torch.randn(10000, 10000).to(cuda)
    b = torch.randn(10000, 10000).to(cuda)
    torch.cuda.synchronize()  # wait for the transfers to finish before timing
    %time c = torch.matmul(a, b)

Gives

CPU times: user 64 µs, sys: 73 µs, total: 137 µs
Wall time: 143 µs

Not sure what this would be with TF32 enabled, but maybe we can compare.
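One caveat: since CUDA calls are asynchronous, a torch.cuda.synchronize() is also needed after the matmul, otherwise %time only measures the kernel launch. A rough, untested sketch of how the two modes could be compared (function name and repeat count are arbitrary):

    import time
    import torch

    def time_matmul(allow_tf32, n=10000, repeats=10):
        # Toggle TF32 for float32 matmuls (only has an effect on Ampere+ GPUs).
        torch.backends.cuda.matmul.allow_tf32 = allow_tf32
        cuda = torch.device("cuda")
        a = torch.randn(n, n, device=cuda)
        b = torch.randn(n, n, device=cuda)
        torch.matmul(a, b)        # warm-up
        torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(repeats):
            torch.matmul(a, b)
        torch.cuda.synchronize()  # wait for the GPU before stopping the clock
        return (time.perf_counter() - start) / repeats

    print("TF32:", time_matmul(True))
    print("FP32:", time_matmul(False))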

stephengreen commented 1 year ago

FP32 has higher precision than TF32, so by "improvement in performance" I was referring to the accuracy of the final result, i.e., whether the network can make use of the higher precision. TF32 calculations on Ampere are heavily optimized, so training with TF32 would be faster (at the expense of reduced precision).

stephengreen commented 2 months ago

This should probably be set as an option.
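For example, it could be a boolean in the train settings that is applied once at startup (the settings key and dict name below are hypothetical):

    # e.g. in train_settings.yaml (hypothetical key):
    #   training:
    #     use_tf32: true
    use_tf32 = train_settings["training"].get("use_tf32", True)
    torch.backends.cuda.matmul.allow_tf32 = use_tf32
    torch.backends.cudnn.allow_tf32 = use_tf32

Defaulting to True would preserve the behaviour of the earlier experiments, while setting it to false would switch back to full FP32.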