E3SM-Project / e3sm_diags

E3SM Diagnostics package
https://e3sm-project.github.io/e3sm_diags
BSD 3-Clause "New" or "Revised" License

[Feature]: Replace image diff checking in integration tests with metrics checking instead #756

Open tomvothecoder opened 9 months ago

tomvothecoder commented 9 months ago

Is your feature request related to a problem?

Currently, tests/integration/test_diags.py runs the all_sets.cfg diagnostics and diffs the resulting images against a baseline (whatever is on Chrysalis), allowing at most 2% of pixels to be non-zero in the diff. The issue with diffing two images is that any noise can break the test (e.g., a change in matplotlib formatting, a shifted legend, floating point formatting, different font sizes). The baseline results sometimes need to be updated when matplotlib updates introduce side effects. The integration tests are also challenging to debug and take a long time to run (#643), which bogs down development.
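For reference, the pixel-threshold check described above can be sketched roughly like this (a minimal sketch with numpy; the function name and threshold handling are illustrative, not the actual test code):

```python
import numpy as np


def fraction_of_differing_pixels(actual: np.ndarray, expected: np.ndarray) -> float:
    """Return the fraction of pixels where any RGB channel differs.

    Both inputs are (H, W, 3) uint8 arrays, e.g. loaded from PNGs.
    """
    diff = np.abs(actual.astype(int) - expected.astype(int))
    # A pixel counts as "different" if any of its channels differ.
    differing = np.any(diff != 0, axis=-1)
    return float(differing.mean())


# The test passes only if at most 2% of pixels differ:
# assert fraction_of_differing_pixels(actual_img, expected_img) <= 0.02
```

Even a one-pixel legend shift changes every pixel the legend covers, which is why this style of check is so noise-sensitive.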

For example, below are the actual image, the expected image, and the diff of the two. Notice that the diff is basically just noise from the legend shifting over a bit and a change in the "Test" name.

[Images: feedback-TREFHT-NINO3-TS-NINO3 actual, expected, and diff]

Describe the solution you'd like

We should compare the underlying metrics in the .json files instead. Users should manually validate that the plots look as expected based on the metrics being plotted, since comparing metrics is more reliable than comparing pixels.
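A metrics-based check could look something like this (a minimal sketch; the JSON key structure and tolerance are hypothetical assumptions, not the actual e3sm_diags metrics layout):

```python
import json
import math


def load_metrics(path: str) -> dict:
    """Load a metrics .json file produced by a diagnostics run."""
    with open(path) as f:
        return json.load(f)


def assert_metrics_close(actual: dict, expected: dict, rel_tol: float = 1e-5) -> None:
    """Recursively compare numeric metrics within a relative tolerance.

    Non-numeric values (e.g. units, variable names) must match exactly.
    """
    assert actual.keys() == expected.keys(), "metric keys differ"
    for key, exp_val in expected.items():
        act_val = actual[key]
        if isinstance(exp_val, dict):
            assert_metrics_close(act_val, exp_val, rel_tol)
        elif isinstance(exp_val, (int, float)):
            assert math.isclose(act_val, exp_val, rel_tol=rel_tol), (
                f"{key}: {act_val} != {exp_val}"
            )
        else:
            assert act_val == exp_val, f"{key}: {act_val} != {exp_val}"
```

This is robust to plotting noise (fonts, legend placement) while still catching real regressions in the computed values, and a relative tolerance leaves headroom for benign floating point differences across machines.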

Describe alternatives you've considered

No response

Additional context

No response

forsyth2 commented 9 months ago

Thanks @tomvothecoder, I agree this would be a more reliable test. I suppose zppy could do the same.