E3SM-Project / e3sm_diags

E3SM Diagnostics package
https://e3sm-project.github.io/e3sm_diags
BSD 3-Clause "New" or "Revised" License
39 stars 32 forks source link

[Bug]: `tests/integration/test_diags.py` is failing on local `main` branch #758

Closed tomvothecoder closed 9 months ago

tomvothecoder commented 9 months ago

What happened?

6/18 tests in the image diff checker (test_diags.py) fail on a local copy of the main branch. It doesn't happen on GitHub Actions though (example). @forsyth2 mentioned 4 of those are also failing for him his comment.

I've pasted the sections of the logs and image diffs below.

What did you expect to happen? Are there are possible answers you came across?

Tests should pass. Maybe the plots need to be updated on Chrysalis?

Minimal Complete Verifiable Example (MVCE)

No response

Relevant log output

============================== short test summary info ==============================
FAILED tests/integration/test_diags.py::TestAllSets::test_lat_lon_plot_diffs - AssertionError: assert 1 == 0
FAILED tests/integration/test_diags.py::TestAllSets::test_lat_lon_regional_plot_diffs - AssertionError: assert 1 == 0
FAILED tests/integration/test_diags.py::TestAllSets::test_polar_plot_diffs - AssertionError: assert 1 == 0
FAILED tests/integration/test_diags.py::TestAllSets::test_qbo_plot_diffs - AssertionError: assert 1 == 0
FAILED tests/integration/test_diags.py::TestAllSets::test_streamflow_plot_diffs - AssertionError: assert 1 == 0
FAILED tests/integration/test_diags.py::TestAllSets::test_zonal_mean_2d_plot_diffs - AssertionError: assert 1 == 0

Anything else we need to know?

No response

Environment

Latest main branch of e3sm_diags and dev.yml environment

Image Diffs

Difference in Metrics

First image is main, second image is LCRC, third image is the diff.

Lat Lon Regional

ERA-Interim-T-850-ANN-CONUS_RRM_actual ERA-Interim-T-850-ANN-CONUS_RRM_expected ERA-Interim-T-850-ANN-CONUS_RRM_diff

Lat Lon Global

ERA-Interim-T-850-ANN-global_actual ERA-Interim-T-850-ANN-global_expected ERA-Interim-T-850-ANN-global_diff

Polar

ERA-Interim-T-850-ANN-polar_S_actual ERA-Interim-T-850-ANN-polar_S_expected ERA-Interim-T-850-ANN-polar_S_diff

Zonal Mean 2D

ERA-Interim-T-ANN-global_actual ERA-Interim-T-ANN-global_expected ERA-Interim-T-ANN-global_diff

Difference in Test Name

qbo

qbo_diags_actual qbo_diags_expected qbo_diags_diff

streamflow

seasonality_map_actual seasonality_map_expected seasonality_map_diff

tomvothecoder commented 9 months ago

Just an FYI @forsyth2 and @chengzhuzhang. I think the images on Chrysalis need to be updated. Any thoughts?

chengzhuzhang commented 9 months ago

Hi @tomvothecoder Yes, I think you are right. I haven't been diligently to update the expected image on LCRC. As you pointed out the difference is concerning. There are not only plot layout changes, also metrics number difference. For the plots pairs, which is the baseline (expected results), which is from current main branch (upper or bottom plots)? Thanks.

tomvothecoder commented 9 months ago

@chengzhuzhang The first image is main (actual) and second image is LCRC (expected). We can talk about it at our CDAT migration sprint meeting on Monday.

forsyth2 commented 9 months ago

Maybe the plots need to be updated on Chrysalis?

Almost certainly the case. That said, the ultimate solution is really #756.

Before the image diff test was introduced, what we would do is just constantly check the plots randomly to see if they looked alright. That wasn't anywhere near thorough enough, so automating an image diff check seemed like a clever idea.

As we have seen though, the image diff test is extremely sensitive to noise -- thus requiring constant updates that aren't really necessary. Manual checking isn't sensitive enough and the automated check is too sensitive.

It seems like the metrics check will be both faster and exactly the level of sensitivity we want.

(Although as @chengzhuzhang pointed out, it does look like some metrics changed. That could very well be an expected change because of a pull request, one that didn't have the expected files updated after merging.)

chengzhuzhang commented 9 months ago

@chengzhuzhang The first image is main (actual) and second image is LCRC (expected). We can talk about it at our CDAT migration sprint meeting on Monday.

This is actually good news. The first images look more reasonable. Yes, let's walk it through on Monday. Thanks!

tomvothecoder commented 9 months ago
$ cat /lcrc/group/e3sm/public_html/e3sm_diags_test_data/integration/expected/README.md

E3SM Diags version: v2_8_0
Date these files were generated: 20230523
Hash of commit (first 7 characters): ab041b4
tomvothecoder commented 9 months ago

Based on the diff, there were no noticeable code changes that seem to have affected metrics. Since the commit that produced the images on LCRC. We think there might have been an underlying dependency change that affected a few of the plots.

We decided to update the images on LCRC to reflect the latest results produced on main. If there any issues that crop up, we have the backed up images from May.