NVIDIA / modulus-sym

Framework providing Pythonic APIs, algorithms, and utilities to be used with Modulus core to physics-inform model training, as well as higher-level abstractions for domain experts
https://developer.nvidia.com/modulus
Apache License 2.0
137 stars 56 forks

Add Automatic Mixed Precision (AMP) support for derivatives. #160

Open Alexey-Kamenev opened 3 days ago

Alexey-Kamenev commented 3 days ago

Modulus Pull Request

Description

Training Physics-Informed Neural Networks with Automatic Mixed Precision (AMP) currently leads to an infinite loss for several models. The cause is the higher-order derivatives, which can fall outside the FP16 dynamic range (5.96e-8 to 65504). For example, training of the lid-driven cavity problem gets terminated around step 3000 because u__y__y at the corner exceeds the FP16 maximum value of 65504.

The default AMP GradScaler tracks the model parameter gradients but not the derivatives calculated from torch.autograd.grad.
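As a rough illustration (not code from this PR; the model and variable names are made up), the sketch below shows where the gap is: the derivatives are produced by torch.autograd.grad during the forward/loss computation, so the default GradScaler, which only scales the loss and unscales the parameter gradients, never sees them.

```python
import torch

# Illustrative sketch (not the Modulus implementation): second derivatives computed
# with torch.autograd.grad under autocast can overflow FP16, while the default
# GradScaler only protects the parameter gradients produced by loss.backward().
net = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.Tanh(), torch.nn.Linear(64, 1)
).cuda()
opt = torch.optim.Adam(net.parameters())
scaler = torch.cuda.amp.GradScaler()

xy = torch.rand(1024, 2, device="cuda", requires_grad=True)
with torch.autocast(device_type="cuda", dtype=torch.float16):
    u = net(xy)
    # u__y and u__y__y in Modulus naming; the backward-graph matmuls reuse the FP16
    # activations/weights from autocast, so large second derivatives can exceed the
    # FP16 maximum of 65504 and become inf.
    u__y = torch.autograd.grad(u.sum(), xy, create_graph=True)[0][:, 1:2]
    u__y__y = torch.autograd.grad(u__y.sum(), xy, create_graph=True)[0][:, 1:2]
    loss = (u__y__y ** 2).mean()

# GradScaler scales the loss and later unscales the parameter gradients, but it
# never sees (or rescales) the intermediate u__y / u__y__y tensors above.
scaler.scale(loss).backward()
scaler.step(opt)
scaler.update()
```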

DerivScaler

The derivatives need to be tracked by a separate scaler because the derivatives and the NN parameter gradients have different dynamic ranges. The dynamic range of FP16 runs from 2^-24 (about 5.96e-8) to 65504 (just under 2^16), i.e. roughly 40 powers of 2.
The derivative ranges themselves differ from problem to problem; the values quoted here are for reference only.
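For reference, these FP16 limits can be checked directly in PyTorch (a quick standalone check, not part of this PR):

```python
import torch

fp16 = torch.finfo(torch.float16)
print(fp16.max)    # 65504.0, just under 2**16
print(fp16.tiny)   # 6.1035e-05 = 2**-14, smallest positive normal value
print(2.0 ** -24)  # ~5.96e-08, smallest positive subnormal FP16 value
```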

This PR adds a DerivScaler which

  1. scales and unscales the derivatives during the forward pass in the derivative node, so that the operations executed in FP16 stay within a safe range;
  2. checks for INFs/NaNs when unscaling the derivatives; if any are detected, the iteration is skipped and the scale value is adjusted (a minimal sketch of this policy follows the list).
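The snippet below is a minimal, hypothetical sketch of that scale/unscale and skip-and-adjust policy; the class name, defaults, and structure are illustrative and do not reflect the actual DerivScaler added in this PR.

```python
import torch

class SimpleDerivScaler:
    """Hypothetical sketch of a derivative scaler: multiply before the FP16-sensitive
    derivative computation, divide afterwards, and back off the scale whenever
    INFs/NaNs show up, similar in spirit to torch.cuda.amp.GradScaler's policy."""

    def __init__(self, init_scale=2.0 ** 8, growth_factor=2.0,
                 backoff_factor=0.5, growth_interval=2000):
        self._scale = init_scale
        self._growth_factor = growth_factor
        self._backoff_factor = backoff_factor
        self._growth_interval = growth_interval
        self._good_steps = 0
        self.found_inf = False

    def scale(self, tensor):
        # Scale before differentiation so the FP16 intermediates sit in a safer range.
        return tensor * self._scale

    def unscale(self, deriv):
        # Undo the scaling on the computed derivative and record any overflow.
        out = deriv / self._scale
        if not torch.isfinite(out).all():
            self.found_inf = True
        return out

    def update(self):
        # Skip-and-adjust: on overflow, shrink the scale and reset the growth counter;
        # otherwise grow the scale after `growth_interval` consecutive good steps.
        if self.found_inf:
            self._scale *= self._backoff_factor
            self._good_steps = 0
        else:
            self._good_steps += 1
            if self._good_steps % self._growth_interval == 0:
                self._scale *= self._growth_factor
        self.found_inf = False
```

In a training loop, the found_inf flag would also be used to skip the optimizer step for that iteration, mirroring how GradScaler skips steps on non-finite gradients.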

It supports the following features:

For more details, please refer to this publication.

Checklist

Dependencies

Alexey-Kamenev commented 2 days ago

/blossom-ci

ktangsali commented 2 days ago

/blossom-ci

Alexey-Kamenev commented 2 days ago

/blossom-ci

ktangsali commented 2 days ago

/blossom-ci