Closed FlorianDeconinck closed 1 day ago
Relative error at f32 should never be more than 1e-8/1e-9, so we could already switch our threshold depending on the precision.
A better scientific noise/signal check would be to perturb the inputs on a validating CPU run. We would then get a real distribution of the potential errors and could check the GPU results against that distribution.
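A rough sketch of what that perturbation check could look like, in Python with NumPy. Everything named here is an assumption for illustration: `kernel` is a stand-in for a physics routine on the validating CPU, `perturbation_spread` and `within_spread` are hypothetical helpers, and perturbing each input by about one float32 epsilon is just one possible choice of noise amplitude.

```python
import numpy as np


def kernel(x):
    # Hypothetical stand-in for a physics routine run on the validating CPU.
    return np.exp(-x) * x * x


def perturbation_spread(func, inputs, n_members=32, seed=0):
    """Run func on slightly perturbed float32 inputs; return (reference, spread)."""
    rng = np.random.default_rng(seed)
    inputs = np.asarray(inputs, dtype=np.float32)
    eps = np.finfo(np.float32).eps
    reference = func(inputs)
    members = []
    for _ in range(n_members):
        # Relative perturbation of at most one float32 epsilon per element.
        noise = rng.uniform(-1.0, 1.0, size=inputs.shape).astype(np.float32)
        perturbed = inputs * (np.float32(1.0) + eps * noise)
        members.append(func(perturbed))
    members = np.stack(members)
    # Largest deviation from the unperturbed run, per grid point.
    spread = np.max(np.abs(members - reference), axis=0)
    return reference, spread


def within_spread(candidate, reference, spread, factor=4.0):
    """True where the candidate (e.g. GPU output) sits inside the noise envelope."""
    # Never let the envelope be narrower than one float32 spacing at the reference.
    floor = np.spacing(np.abs(reference))
    tolerance = factor * np.maximum(spread, floor)
    return np.abs(candidate - reference) <= tolerance


x = np.linspace(0.1, 5.0, 50, dtype=np.float32)
reference, spread = perturbation_spread(kernel, x)
print(within_spread(reference, reference, spread).all())
```

The `factor` and `n_members` values are placeholders. A real check would probably compare against percentiles of the ensemble rather than its maximum, which is sensitive to outliers.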
PR bringing a multi-modal metric: https://github.com/NOAA-GFDL/NDSL/pull/67
Merged and available via --multimodal_metric when running pytest-based translate tests
Physics parameterizations in GEOS are all run at 32-bit precision. The original Translate structure, and its metric calculation to judge whether an error is small enough for the test to pass, were designed for 64-bit float code.
The metric looks at the order of magnitude of the error, normalized to the value. E.g.:
>1e-10
will be a failure.>~ 1e-15
The actual calculation is in ndsl/testing/comparison.py
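To make "error normalized to the value" concrete, here is a small sketch of one common way to do it, a symmetric relative difference. This is an illustration of the idea only, not a copy of the code in ndsl/testing/comparison.py; the function name `normalized_error` and the zero-handling convention are assumptions.

```python
import numpy as np


def normalized_error(computed, reference):
    """Symmetric relative difference: 2*|c - r| / (|c| + |r|), with 0/0 taken as 0."""
    computed = np.atleast_1d(np.asarray(computed))
    reference = np.atleast_1d(np.asarray(reference))
    denom = np.abs(computed) + np.abs(reference)
    diff = 2.0 * np.abs(computed - reference)
    # Where both values are exactly zero, define the error as zero.
    out = np.zeros(diff.shape, dtype=np.float64)
    return np.divide(diff, denom, out=out, where=denom != 0)


reference = np.array([1.0, 0.0, 2.0 + 2e-10])
computed = np.array([1.0, 0.0, 2.0])
print(normalized_error(computed, reference))
```

The result is a dimensionless number per grid point, so a single threshold can be applied to fields of very different amplitude.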
This method allows a semblance of normalization across different amplitudes but suffers for very small values. While this issue was less prominent at 64-bit, it becomes a real problem at 32-bit. Moreover, the physics deals with very small variations in amplitude, where a single fused multiply-add could change results from one architecture to the next.
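A small worked example of the fused multiply-add effect, in Python with NumPy. The fused operation is emulated in software (exact product and sum in float64, then a single rounding to float32); whether real hardware or a compiler actually fuses is outside this sketch. The inputs are hand-picked so that the exact product lands halfway between two float32 values, and the normalized metric used is the symmetric relative difference, assumed here for illustration.

```python
import numpy as np

# Inputs chosen so the exact product a*b is a tie between two float32 values.
a = np.float32(1.0 + 2.0**-12)
b = np.float32(1.0 + 2.0**-12)
c = np.float32(-(1.0 + 2.0**-11))

# Unfused: the product is rounded to float32 before the add.
unfused = np.float32(a * b) + c

# Emulated fused multiply-add: exact in float64, one final rounding to float32.
fused = np.float32(np.float64(a) * np.float64(b) + np.float64(c))

abs_diff = abs(float(fused) - float(unfused))
denom = abs(float(fused)) + abs(float(unfused))
normalized = 2.0 * abs_diff / denom

print("unfused   :", unfused)
print("fused     :", fused)
print("abs diff  :", abs_diff)
print("normalized:", normalized)
```

The two results differ by 2**-24 in absolute terms, well below float32 resolution for operands of order 1, yet the normalized error is 2.0, the largest value that metric can take. Both answers are legitimate float32 results of the same expression.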
We need to come up with a series of checks to distinguish true error from f32 noise.
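One possible shape for such a series of checks, sketched in Python with NumPy: an absolute check, a relative check and a ULP (unit in the last place) check, evaluated per grid point. The function name `f32_noise_checks`, the tolerance values and the "pass if any check passes" rule are all assumptions for illustration, not the behavior of any existing NDSL metric.

```python
import numpy as np


def f32_noise_checks(computed, reference, abs_tol=1e-8, rel_tol=1e-6, ulp_tol=2.0):
    """Return per-point booleans for three checks, plus a combined verdict."""
    computed = np.asarray(computed, dtype=np.float32)
    reference = np.asarray(reference, dtype=np.float32)
    # Take the difference in float64 so the check adds no float32 rounding itself.
    diff = np.abs(computed.astype(np.float64) - reference.astype(np.float64))
    magnitude = np.maximum(np.abs(computed), np.abs(reference))
    absolute_ok = diff <= abs_tol
    relative_ok = diff <= rel_tol * magnitude
    # np.spacing gives the gap to the next representable float32 at that magnitude.
    ulp_ok = diff <= ulp_tol * np.spacing(magnitude)
    return {
        "absolute": absolute_ok,
        "relative": relative_ok,
        "ulp": ulp_ok,
        # Assumed rule: a point counts as noise if any single check accepts it.
        "passed": absolute_ok | relative_ok | ulp_ok,
    }


reference = np.array([1.0, 1e-20, 300.0], dtype=np.float32)
computed = np.array(
    [np.nextafter(np.float32(1.0), np.float32(2.0)), 0.0, 301.0], dtype=np.float32
)
result = f32_noise_checks(computed, reference)
print(result["passed"])
```

In this example the first point is one ULP off and is accepted by the relative and ULP checks, the second is a tiny value against zero and is accepted only by the absolute check, and the third is a genuine error rejected by all three. Requiring two checks to agree instead of one would be a stricter variant.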
Parent: https://github.com/GEOS-ESM/SMT-Nebulae/issues/41