Framework providing Pythonic APIs, algorithms, and utilities to be used with Modulus core to physics-inform model training, as well as higher-level abstractions for domain experts.
Training Physics-Informed Neural Networks with Automatic Mixed Precision (AMP) currently leads to an infinite loss for several models. This comes from the higher-order derivatives, which can go beyond the FP16 dynamic range (5.96e-8 to 65504). For example, training of the lid-driven cavity problem terminated around step 3000 because u__y__y at the corner exceeded the FP16 maximum value of 65504.
The default AMP GradScaler tracks the model parameter gradients but not the derivatives calculated from torch.autograd.grad.
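As a rough illustration (not Modulus code; the toy model, loss, and the assumption of a CUDA device below are purely illustrative), the higher-order derivatives needed for a PDE residual are produced by torch.autograd.grad inside the autocast region, where GradScaler never sees them:

```python
import torch

# Minimal sketch of the failure mode: the PDE-style loss needs u__y__y from
# torch.autograd.grad, but the default GradScaler only scales the loss and the
# parameter gradients, not these intermediate derivatives. Assumes a CUDA device.
model = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
).cuda()
optimizer = torch.optim.Adam(model.parameters())
scaler = torch.cuda.amp.GradScaler()

x = torch.rand(1024, 2, device="cuda", requires_grad=True)

for step in range(10):
    optimizer.zero_grad()
    with torch.cuda.amp.autocast():
        u = model(x)
        # First- and second-order derivatives w.r.t. the inputs; the backward
        # passes reuse FP16 intermediates and are not tracked by GradScaler.
        u_y = torch.autograd.grad(u.sum(), x, create_graph=True)[0][:, 1:2]
        u_y_y = torch.autograd.grad(u_y.sum(), x, create_graph=True)[0][:, 1:2]
        loss = (u_y_y ** 2).mean()  # stand-in for a PDE residual loss
    scaler.scale(loss).backward()   # only the loss/parameter grads are scaled
    scaler.step(optimizer)
    scaler.update()
```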
DerivScaler
The derivatives need to be tracked by another scaler because the derivatives and NN parameter gradients have different dynamic ranges.
The dynamic range of FP16 is from 2^-24 to 2^15 (40 powers of 2).
The following ranges differ from problem to problem and are given just for reference (a quick check of the FP16 limits follows this list):
Typical weight gradient range: 2^-40 to 2^-10
Typical first order derivative range: 2^-10 to 2^5
Typical second order derivative range: 2^0 to 2^20
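These limits can be checked directly with a small self-contained snippet; the 5.96e-8 value quoted above is the smallest FP16 subnormal, 2^-24:

```python
import torch

fp16 = torch.finfo(torch.float16)
print(fp16.max)    # 65504.0, the largest finite FP16 value
print(fp16.tiny)   # 6.1035e-05 = 2^-14, the smallest positive normal value
print(2.0 ** -24)  # 5.9605e-08, the smallest positive subnormal value

# A value around 2^20, a plausible second-order derivative magnitude,
# saturates to inf when stored in FP16:
print(torch.tensor(2.0 ** 20, dtype=torch.float16))  # tensor(inf, dtype=torch.float16)
```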
This PR adds a DerivScaler (a simplified sketch follows this list) which:
scales and unscales the derivatives during the forward pass in the derivative node, so that the operations performed in FP16 stay within a good range
checks for INFs/NaNs when unscaling the derivatives. When INFs/NaNs are detected, the iteration is skipped and the scale value is adjusted.
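A minimal sketch of the idea is shown below; the class and method names are assumptions for illustration, not the actual Modulus DerivScaler implementation. The pattern is: scale before the FP16 derivative computation, unscale afterwards, and skip the step when an overflow is detected.

```python
import torch

class SimpleDerivScaler:
    """Illustrative sketch (not the Modulus API): scale the network output
    before derivatives are taken, unscale the resulting derivatives, and flag
    the iteration for skipping when they contain INFs/NaNs."""

    def __init__(self, init_scale=2.0 ** 8, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self._scale = init_scale
        self._growth_factor = growth_factor
        self._backoff_factor = backoff_factor
        self._growth_interval = growth_interval
        self._good_steps = 0
        self.found_inf = False

    def scale(self, outputs: torch.Tensor) -> torch.Tensor:
        # Multiply the output so the derivatives computed from it via
        # torch.autograd.grad land at a magnitude friendlier to FP16.
        return outputs * self._scale

    def unscale(self, derivs):
        # Divide the derivatives back to their true values and record
        # whether any of them became non-finite.
        unscaled = [d / self._scale for d in derivs]
        self.found_inf = any(not torch.isfinite(d).all() for d in unscaled)
        return unscaled

    def update(self):
        # On overflow the caller skips the optimizer step and the scale is
        # backed off; otherwise the scale grows after a run of good steps.
        if self.found_inf:
            self._scale *= self._backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps % self._growth_interval == 0:
                self._scale *= self._growth_factor
        self.found_inf = False
```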
It supports the following features (a hypothetical configuration sketch follows the list):
Per derivative order scalers (default)
Per derivative term scalers
Control the scaling factors
Avoid a very low derivative scaling factor
Use a fused activation, or disable autocast for the activation, to avoid intermediate-result overflow.
Use FP32 for the first layer that produces the high-order derivatives; this layer always overflows, and keeping it in FP32 does not introduce much of a performance drop.
Avoid the scaling factor decreasing too fast, which is useful for Fourier Neural Networks.
Enter “recover mode” if the scaling factor falls below a predefined threshold. In this mode, the scaling factor grows more frequently.
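To make the feature list concrete, a hypothetical configuration sketch is shown below; every key name is an assumption for illustration and does not correspond to the actual Modulus/Hydra option names:

```python
# Hypothetical settings object; all key names are illustrative assumptions,
# not the real Modulus configuration schema.
derivscaler_config = {
    "mode": "per_order",             # one scaler per derivative order (default)
    # "mode": "per_term",            # alternatively, one scaler per derivative term
    "init_scale": 1.0,               # starting scaling factor
    "min_scale": 2.0 ** -10,         # guard against a very low derivative scaling factor
    "backoff_factor": 0.5,           # how fast the scale decreases after overflow
    "recover_threshold": 2.0 ** -6,  # below this, enter "recover mode"
    "recover_growth_interval": 100,  # in recover mode, grow the scale more frequently
    # Mitigations for intermediate overflow in the layers feeding the
    # high-order derivatives:
    "fused_activation": True,        # or disable autocast around the activation
    "first_layer_fp32": True,        # keep the first layer in FP32
}
```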
For more details, please refer to this publication.