Framework providing Pythonic APIs, algorithms, and utilities to be used with Modulus core to physics-inform model training, as well as higher-level abstractions for domain experts.
Training Physics-Informed Neural Networks with Automatic Mixed Precision (AMP) currently leads to an infinite loss for several models. This comes from the higher-order derivatives, which can go beyond the FP16 dynamic range (5.96e-8 to 65504). For example, training of the lid-driven cavity problem terminated around step 3000 because u__y__y at the corner exceeded the FP16 maximum value of 65504.
The default AMP GradScaler tracks the model parameter gradients but not the derivatives calculated from torch.autograd.grad.
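As a rough illustration (not Modulus code; the toy model, loss, and the assumption of a CUDA device below are purely illustrative), the higher-order derivatives needed for a PDE residual are produced by torch.autograd.grad inside the autocast region, where GradScaler never sees them:

```python
import torch

# Minimal sketch of the failure mode: the PDE-style loss needs u__y__y from
# torch.autograd.grad, but the default GradScaler only scales the loss and the
# parameter gradients, not these intermediate derivatives. Assumes a CUDA device.
model = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
).cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()

x = torch.rand(1024, 2, device="cuda", requires_grad=True)

for step in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        u = model(x)
        # First- and second-order derivatives w.r.t. the inputs; the backward
        # passes reuse FP16 intermediates and are not tracked by GradScaler.
        u_y = torch.autograd.grad(u.sum(), x, create_graph=True)[0][:, 1:2]
        u_y_y = torch.autograd.grad(u_y.sum(), x, create_graph=True)[0][:, 1:2]
        loss = (u_y_y ** 2).mean()  # stand-in for a PDE residual loss
    scaler.scale(loss).backward()   # only the loss/parameter grads are scaled
    scaler.step(optimizer)
    scaler.update()
```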
DerivScaler
The derivatives need to be tracked by another scaler because the derivatives and NN parameter gradients have different dynamic ranges.
The dynamic range of FP16 is from 2^-24 to 2^15 (40 powers of 2).
The following ranges differ from problem to problem and are given just for reference (a quick check of the FP16 limits follows this list):
Typical weight gradient range: 2^-40 to 2^-10
Typical first order derivative range: 2^-10 to 2^5
Typical second order derivative range: 2^0 to 2^20
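These limits can be checked directly with a small self-contained snippet; the 5.96e-8 value quoted above is the smallest FP16 subnormal, 2^-24:

```python
import torch

fp16 = torch.finfo(torch.float16)
print(fp16.max)    # 65504.0, the largest finite FP16 value
print(fp16.tiny)   # 6.1035e-05 = 2^-14, the smallest positive normal value
print(2.0 ** -24)  # 5.9605e-08, the smallest positive subnormal value

# A value around 2^20, a plausible second-order derivative magnitude,
# saturates to inf when stored in FP16:
print(torch.tensor(2.0 ** 20, dtype=torch.float16))  # tensor(inf, dtype=torch.float16)
```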
This PR adds a DerivScaler (a simplified sketch follows this list) which:
scales and unscales the derivatives during the forward pass in the derivative node, so that the operations performed in FP16 stay within a good range
checks for INFs/NaNs when unscaling the derivatives. When INFs/NaNs are detected, the iteration is skipped and the scale value is adjusted.
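A minimal sketch of the idea is shown below; the class and method names are assumptions for illustration, not the actual Modulus DerivScaler implementation. The pattern is: scale before the FP16 derivative computation, unscale afterwards, and skip the step when an overflow is detected.

```python
import torch

class SimpleDerivScaler:
    """Illustrative sketch (not the Modulus API): scale the network output
    before derivatives are taken, unscale the resulting derivatives, and flag
    the iteration for skipping when they contain INFs/NaNs."""

    def __init__(self, init_scale=2.0 ** 8, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self._scale = init_scale
        self._growth_factor = growth_factor
        self._backoff_factor = backoff_factor
        self._growth_interval = growth_interval
        self._good_steps = 0
        self.found_inf = False

    def scale(self, outputs: torch.Tensor) -> torch.Tensor:
        # Multiply the output so the derivatives computed from it via
        # torch.autograd.grad land at a magnitude friendlier to FP16.
        return outputs * self._scale

    def unscale(self, derivs):
        # Divide the derivatives back to their true values and record
        # whether any of them became non-finite.
        unscaled = [d / self._scale for d in derivs]
        self.found_inf = any(not torch.isfinite(d).all() for d in unscaled)
        return unscaled

    def update(self):
        # On overflow the caller skips the optimizer step and the scale is
        # backed off; otherwise the scale grows after a run of good steps.
        if self.found_inf:
            self._scale *= self._backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps % self._growth_interval == 0:
                self._scale *= self._growth_factor
        self.found_inf = False
```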
It supports the following features (a hypothetical configuration sketch follows the list):
Per derivative order scalers (default)
Per derivative term scalers
Control the scaling factors
Avoid a very low derivative scaling factor
Use a fused activation, or disable autocast for the activation, to avoid intermediate-result overflow.
Use FP32 for the first layer that produces the high-order derivatives; this layer always overflows, and keeping it in FP32 does not introduce much of a performance drop.
Avoid the scaling factor decreasing too fast, which is useful for Fourier Neural Networks.
Enter “recover mode” if the scaling factor falls below a predefined threshold. In this mode, the scaling factor grows more frequently.
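To make the feature list concrete, a hypothetical configuration sketch is shown below; every key name is an assumption for illustration and does not correspond to the actual Modulus/Hydra option names:

```python
# Hypothetical settings object; all key names are illustrative assumptions,
# not the real Modulus configuration schema.
derivscaler_config = {
    "mode": "per_order",             # one scaler per derivative order (default)
    # "mode": "per_term",            # alternatively, one scaler per derivative term
    "init_scale": 1.0,               # starting scaling factor
    "min_scale": 2.0 ** -10,         # guard against a very low derivative scaling factor
    "backoff_factor": 0.5,           # how fast the scale decreases after overflow
    "recover_threshold": 2.0 ** -6,  # below this, enter "recover mode"
    "recover_growth_interval": 100,  # in recover mode, grow the scale more frequently
    # Mitigations for intermediate overflow in the layers feeding the
    # high-order derivatives:
    "fused_activation": True,        # or disable autocast around the activation
    "first_layer_fp32": True,        # keep the first layer in FP32
}
```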
For more details, please refer to this publication.