Enhance the "metric" calculation for Translate test at f32

FlorianDeconinck commented 1 month ago

Physics parameterizations in GEOS all run at 32-bit precision. The original Translate test structure, and its metric calculation for judging whether an error is small enough for the test to pass, were designed for 64-bit float code.

The metric looks at the order of magnitude of the error, normalized to the value. E.g.:

if the value is 1 and we want to be precise to 1e-10, an error >1e-10 will be a failure.
if the value is 1e-5, an error >~1e-15 will be a failure.
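To make the two cases above concrete, here is a minimal sketch of a value-normalized check. It is not the code in ndsl/testing/comparison.py; the names `allowed_error` and `passes` are hypothetical and the 1e-10 precision is just the figure from the example.

```python
import numpy as np


def allowed_error(value, precision=1e-10):
    """Largest absolute error tolerated for a given value (hypothetical helper)."""
    return precision * abs(value)


def passes(computed, reference, precision=1e-10):
    """True when the error, normalized to the reference value, is small enough."""
    return abs(computed - reference) <= allowed_error(reference, precision)


# value 1    -> allowed error 1e-10
# value 1e-5 -> allowed error ~1e-15
for value in (1.0, 1e-5):
    print(value, allowed_error(value))

# Smallest step a float32 can resolve around 1e-5 (about 9e-13),
# which is far coarser than the ~1e-15 the check would ask for.
smallest_f32_step = float(np.spacing(np.float32(1e-5)))
print(smallest_f32_step)
```

The last line is the point of the issue: at f32 the required error can be smaller than what the format can represent near that value.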

The actual calculation is in ndsl/testing/comparison.py

This method allows a semblance of normalization across different amplitudes but suffers for very small values. While this issue was less prominent at 64-bit, it becomes a real problem at 32-bit. Moreover, the physics deals with very small variations in amplitude, where a single fused multiply-add could change results from one architecture to the next.
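A small illustration of the fused multiply-add effect, as a sketch only: NumPy has no portable FMA call here, so the fused result is emulated by doing the arithmetic in float64 and rounding once to float32. The input values are chosen by hand to trigger the difference and are not taken from any GEOS physics.

```python
import numpy as np

a = np.float32(1 + 2**-12)
b = np.float32(1 + 2**-12)
c = np.float32(-(1 + 2**-11))

# Unfused: the product is rounded to float32 before the addition.
unfused = a * b + c

# Emulated fused multiply-add (assumption: float64 arithmetic stands in for
# the single rounding of a hardware FMA; it is exact for these inputs).
fused = np.float32(np.float64(a) * np.float64(b) + np.float64(c))

print(unfused)  # 0.0
print(fused)    # 2**-24, about 5.96e-08
```

The absolute difference is tiny, but any metric normalized to the value sees a 100% disagreement between the two results.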

We need to come up with a series of checks to distinguish true errors from f32 noise.
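One possible shape for such a series of checks, as a hedged sketch: accept a value if it passes an absolute, a relative, or a ULP-distance test. The function name `passes_any_check` and all three default thresholds are assumptions made for illustration, and this is not necessarily how the metric in NDSL is built.

```python
import numpy as np


def passes_any_check(computed, reference, abs_tol=1e-12, rel_tol=1e-6, max_ulp=2.0):
    """Element-wise pass/fail for float32 data using three complementary checks."""
    computed = np.asarray(computed, dtype=np.float32)
    reference = np.asarray(reference, dtype=np.float32)

    # Measure the error in float64 so the measurement itself adds no f32 noise.
    abs_err = np.abs(computed.astype(np.float64) - reference.astype(np.float64))
    magnitude = np.maximum(np.abs(computed), np.abs(reference))

    with np.errstate(divide="ignore", invalid="ignore"):
        rel_err = np.where(magnitude > 0, abs_err / magnitude, 0.0)

    # Error counted in units of the float32 spacing at this magnitude.
    ulp_err = abs_err / np.spacing(magnitude)

    return (abs_err <= abs_tol) | (rel_err <= rel_tol) | (ulp_err <= max_ulp)


reference = np.array([1.0, 1e-5, 0.0, 1.0], dtype=np.float32)
computed = np.array(
    [np.nextafter(np.float32(1.0), np.float32(2.0)),   # 1 ULP off
     np.nextafter(np.float32(1e-5), np.float32(1.0)),  # 1 ULP off, small value
     1e-20,                                            # tiny absolute noise near zero
     1.001],                                           # genuine error
    dtype=np.float32,
)
print(passes_any_check(computed, reference))
```

The absolute check covers values near zero, the ULP check covers last-bit noise at any amplitude, and the relative check covers everything in between.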

Parent: https://github.com/GEOS-ESM/SMT-Nebulae/issues/41

[x] Design one or several metrics aimed at strengthening the f32 test cases. Use RadiationCoupling and AerActivation as examples.
[x] Implement OR subtasks (update this DoD)

FlorianDeconinck commented 1 month ago

Relative error at f32 should never be more than 1e-8/1e-9, so we could already switch our threshold depending on the precision.
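Switching the threshold on precision could look like the sketch below. `relative_threshold` is a hypothetical helper, and the multiplier `n_eps` is a placeholder rather than a value taken from this thread; the only library call used is `np.finfo`.

```python
import numpy as np


def relative_threshold(dtype, n_eps=10.0):
    """Relative tolerance expressed as a multiple of the dtype's machine epsilon."""
    return n_eps * float(np.finfo(dtype).eps)


for dtype in (np.float32, np.float64):
    print(np.dtype(dtype).name, relative_threshold(dtype))
```

Tying the tolerance to the dtype keeps a single test definition usable for both the 64-bit and 32-bit builds.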

A better scientific noise/signal check would be to perturb the inputs on a validating CPU run. We would then get a real distribution of the potential errors and could check the GPU results against that distribution.
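A rough sketch of that perturbation idea, under several assumptions: inputs are nudged by one float32 ULP in a random direction, the spread is summarized by the maximum deviation seen, and `func` stands in for whatever physics routine is under test. The names `perturbed_spread`, `within_noise`, and the `safety` factor are all hypothetical.

```python
import numpy as np


def perturbed_spread(func, inputs, n_samples=32, seed=0):
    """Run func on randomly perturbed inputs and record the largest deviation."""
    rng = np.random.default_rng(seed)
    reference = func(inputs)
    max_dev = np.zeros_like(reference)
    for _ in range(n_samples):
        # Move each input one float32 ULP up or down at random.
        direction = np.where(rng.random(inputs.shape) < 0.5, -np.inf, np.inf)
        perturbed = np.nextafter(inputs, direction.astype(np.float32))
        max_dev = np.maximum(max_dev, np.abs(func(perturbed) - reference))
    return reference, max_dev


def within_noise(candidate, reference, max_dev, safety=2.0):
    """True where the candidate differs from the reference by no more than the noise."""
    return np.abs(candidate - reference) <= safety * max_dev


def square(x):
    # Stand-in for a physics routine (assumption for the example).
    return x * x


inputs = np.array([1.5, 2.5, 3.5], dtype=np.float32)
reference, max_dev = perturbed_spread(square, inputs)

noisy = square(np.nextafter(inputs, np.float32(np.inf)))  # last-bit differences
wrong = reference + np.float32(1e-3)                      # genuine error

print(within_noise(noisy, reference, max_dev))
print(within_noise(wrong, reference, max_dev))
```

In practice the reference and spread would come from the validating CPU run and the candidate from the GPU run; a fuller version would likely keep the whole error distribution rather than only its maximum.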

FlorianDeconinck commented 3 weeks ago

Branch: https://github.com/NOAA-GFDL/NDSL/tree/feature/translate_test_f32

FlorianDeconinck commented 2 weeks ago

PR bringing a multi-modal metric: https://github.com/NOAA-GFDL/NDSL/pull/67

FlorianDeconinck commented 1 day ago

Merged and available via --multimodal_metric when running pytest-based translate tests.

GEOS-ESM / NDSL

Enhance the "metric" calculation for Translate test at f32 #56