E3SM-Project / zppy

E3SM post-processing toolchain
BSD 3-Clause "New" or "Revised" License

[Bug]: Errors in 2024-10-25 test of `main` #635

Closed by forsyth2 1 month ago

forsyth2 commented 1 month ago

What happened?

See https://github.com/E3SM-Project/zppy/discussions/634#discussioncomment-11056177

Specifically, there are at least 3 errors:

  1. `cannot stat` errors for `lnd_monthly_mvm` jobs
  2. Image check failures are not being displayed correctly
  3. It looks like there is an error in `tc_analysis`. This is likely because the expected results need to be updated following https://github.com/E3SM-Project/e3sm_diags/pull/851

What has changed since the 2024-10-18 run?

zppy:

e3sm_diags:

What machine were you running on?

Chrysalis

Environment

zppy_dev_weekly_20241025

What command did you run?

zppy -c tests/integration/generated/test_weekly_comprehensive_v3_chrysalis.cfg
zppy -c tests/integration/generated/test_weekly_comprehensive_v2_chrysalis.cfg

Copy your cfg file

# See https://github.com/E3SM-Project/zppy/discussions/634#discussioncomment-11056177

What jobs are failing?

No response

What stack trace are you encountering?

No response

forsyth2 commented 1 month ago
$ cd /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test_zppy_weekly_20241025/v3.LR.historical_0051/post/scripts
$ grep -v "OK" *status
e3sm_diags_lnd_monthly_mvm_lnd_model_vs_model_1987-1988_vs_1985-1986.status:ERROR (1)
$ cat e3sm_diags_lnd_monthly_mvm_lnd_model_vs_model_1987-1988_vs_1985-1986.o614397 
cp: cannot stat '/lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test_zppy_weekly_20241025/v3.LR.historical_0051/post/lnd/180x360_aave/clim/2yr/v3.LR.historical_0051_*_1987??_1988??_climo.nc': No such file or directory
$ ls /lcrc/group/e3sm/ac.forsyth2/zppy_weekly_comprehensive_v3_output/test_zppy_weekly_20241025/v3.LR.historical_0051/post/lnd/180x360_aave/clim/2yr
v3.LR.historical_0051_01_198501_198601_climo.nc  v3.LR.historical_0051_09_198709_198809_climo.nc
v3.LR.historical_0051_01_198701_198801_climo.nc  v3.LR.historical_0051_10_198510_198610_climo.nc
v3.LR.historical_0051_02_198502_198602_climo.nc  v3.LR.historical_0051_10_198710_198810_climo.nc
v3.LR.historical_0051_02_198702_198802_climo.nc  v3.LR.historical_0051_11_198511_198611_climo.nc
v3.LR.historical_0051_03_198503_198603_climo.nc  v3.LR.historical_0051_11_198711_198811_climo.nc
v3.LR.historical_0051_03_198703_198803_climo.nc  v3.LR.historical_0051_12_198512_198612_climo.nc
v3.LR.historical_0051_04_198504_198604_climo.nc  v3.LR.historical_0051_12_198712_198812_climo.nc
v3.LR.historical_0051_04_198704_198804_climo.nc  v3.LR.historical_0051_ANN_198501_198612_climo.nc
v3.LR.historical_0051_05_198505_198605_climo.nc  v3.LR.historical_0051_ANN_198701_198812_climo.nc
v3.LR.historical_0051_05_198705_198805_climo.nc  v3.LR.historical_0051_DJF_198501_198612_climo.nc
v3.LR.historical_0051_06_198506_198606_climo.nc  v3.LR.historical_0051_DJF_198701_198812_climo.nc
v3.LR.historical_0051_06_198706_198806_climo.nc  v3.LR.historical_0051_JJA_198506_198608_climo.nc
v3.LR.historical_0051_07_198507_198607_climo.nc  v3.LR.historical_0051_JJA_198706_198808_climo.nc
v3.LR.historical_0051_07_198707_198807_climo.nc  v3.LR.historical_0051_MAM_198503_198605_climo.nc
v3.LR.historical_0051_08_198508_198608_climo.nc  v3.LR.historical_0051_MAM_198703_198805_climo.nc
v3.LR.historical_0051_08_198708_198808_climo.nc  v3.LR.historical_0051_SON_198509_198611_climo.nc
v3.LR.historical_0051_09_198509_198609_climo.nc  v3.LR.historical_0051_SON_198709_198811_climo.nc
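
As a cross-check, the glob from the `cp` error can be tested against file names from this listing. The following is a minimal sketch using Python's `fnmatch` module, which approximates shell globbing closely enough for this pattern (no directory separators or leading dots are involved). Only a handful of names from the listing are included.

```python
from fnmatch import fnmatchcase

# Glob copied from the `cp: cannot stat` message (directory part omitted).
pattern = "v3.LR.historical_0051_*_1987??_1988??_climo.nc"

# A few file names copied from the `ls` output above.
listing = [
    "v3.LR.historical_0051_01_198501_198601_climo.nc",
    "v3.LR.historical_0051_01_198701_198801_climo.nc",
    "v3.LR.historical_0051_ANN_198501_198612_climo.nc",
    "v3.LR.historical_0051_ANN_198701_198812_climo.nc",
    "v3.LR.historical_0051_DJF_198701_198812_climo.nc",
]

# Keep only the names the glob would expand to.
matches = [name for name in listing if fnmatchcase(name, pattern)]
for name in matches:
    print(name)
```

The 1987-1988 files match and the 1985-1986 files do not, so the glob itself is not the problem once the files are in place.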

It looks like the files exist. This is exactly the error noted at https://github.com/E3SM-Project/zppy/issues/622#issuecomment-2417599936. That should be resolved by the

    if "lat_lon_land" in c["sets"]:
        check_parameter_defined(c, "climo_land_subsection")
        dependencies.append(
            os.path.join(
                script_dir, f"climo_{c['climo_land_subsection']}{status_suffix}"
            )
        )

block in #633.
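
For readers without the surrounding code, here is a self-contained sketch of what that block appears to do: when `lat_lon_land` is requested, it adds the status file of the land climatology task to the job's dependency list, presumably so that the diagnostics job waits for the climo files instead of starting before they exist. The `check_parameter_defined` body, the wrapper function `climo_dependencies`, and the example values for `script_dir`, `status_suffix`, and `climo_land_subsection` are assumptions made for illustration, not zppy's actual implementation.

```python
import os


def check_parameter_defined(c, name):
    """Hypothetical stand-in for zppy's helper: fail early if a parameter is unset."""
    if not c.get(name):
        raise ValueError(f"{name} must be defined")


def climo_dependencies(c, script_dir, status_suffix):
    """Return status files that must exist before a lat_lon_land job starts."""
    dependencies = []
    if "lat_lon_land" in c["sets"]:
        check_parameter_defined(c, "climo_land_subsection")
        dependencies.append(
            os.path.join(
                script_dir, f"climo_{c['climo_land_subsection']}{status_suffix}"
            )
        )
    return dependencies


# Example values are hypothetical; real ones come from the cfg file.
c = {"sets": ["lat_lon_land"], "climo_land_subsection": "land_monthly_climo"}
deps = climo_dependencies(c, "post/scripts", "_1987-1988.status")
print(deps)
```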

forsyth2 commented 1 month ago

After updating the compare_images code, I'm able to generate the image check failures:

cd ~/ez/zppy
conda activate zppy_dev_weekly_20241025
pip install .
python -u -m unittest tests/integration/test_*.py

That gives:

======================================================================
FAIL: test_comprehensive_v2_images (tests.integration.test_weekly.TestWeekly)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/gpfs/fs1/home/ac.forsyth2/ez/zppy/tests/integration/test_weekly.py", line 36, in test_comprehensive_v2_images
    test_images(self, "comprehensive_v2", V2_CASE_NAME)
  File "/gpfs/fs1/home/ac.forsyth2/ez/zppy/tests/integration/test_weekly.py", line 27, in test_images
    check_mismatched_images(
  File "/gpfs/fs1/home/ac.forsyth2/ez/zppy/tests/integration/utils.py", line 125, in check_mismatched_images
    test.assertEqual(missing_images, [])
AssertionError: Lists differ: ['e3sm_diags/lnd_monthly_mvm_lnd/model_vs_[25440 chars]png'] != []

First list contains 196 additional elements.
First extra element 0:
'e3sm_diags/lnd_monthly_mvm_lnd/model_vs_model_1982-1983/viewer/viewer/e3sm_logo.png'

Diff is 26080 characters long. Set self.maxDiff to None to see it.

======================================================================
FAIL: test_comprehensive_v3_images (tests.integration.test_weekly.TestWeekly)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/gpfs/fs1/home/ac.forsyth2/ez/zppy/tests/integration/test_weekly.py", line 39, in test_comprehensive_v3_images
    test_images(self, "comprehensive_v3", V3_CASE_NAME)
  File "/gpfs/fs1/home/ac.forsyth2/ez/zppy/tests/integration/test_weekly.py", line 27, in test_images
    check_mismatched_images(
  File "/gpfs/fs1/home/ac.forsyth2/ez/zppy/tests/integration/utils.py", line 125, in check_mismatched_images
    test.assertEqual(missing_images, [])
AssertionError: Lists differ: ['e3sm_diags/lnd_monthly_mvm_lnd/model_vs_[43955 chars]png'] != []

First list contains 336 additional elements.
First extra element 0:
'e3sm_diags/lnd_monthly_mvm_lnd/model_vs_model_1987-1988/viewer/viewer/e3sm_logo.png'

Diff is 45015 characters long. Set self.maxDiff to None to see it.

----------------------------------------------------------------------
Ran 5 tests in 1093.283s

FAILED (failures=2)

These seem like reasonable changes given https://github.com/E3SM-Project/e3sm_diags/pull/851.

https://web.lcrc.anl.gov/public/e3sm/diagnostic_output/ac.forsyth2/zppy_weekly_comprehensive_v3_www/test_zppy_weekly_20241025/v3.LR.historical_0051/image_check_failures_comprehensive_v3/ is still empty. That is perhaps expected though.

Looking at the bash output of running test_weekly.py:

v2:

Missing images:
e3sm_diags/lnd_monthly_mvm_lnd/model_vs_model_1982-1983/viewer/viewer/e3sm_logo.png
e3sm_diags/lnd_monthly_mvm_lnd/model_vs_model_1982-1983/lat_lon_land/model_vs_model/v2.LR.historical_0201-QIRRIG_ORIG-MAM-global.png
e3sm_diags/lnd_monthly_mvm_lnd/model_vs_model_1982-1983/lat_lon_land/model_vs_model/v2.LR.historical_0201-FGEV-ANN-global.png
...

and

Mismatched images:
e3sm_diags/atm_monthly_180x360_aave_mvm/model_vs_model_1980-1981/tc_analysis/aew-density-map.png
e3sm_diags/atm_monthly_180x360_aave_mvm/model_vs_model_1980-1981/tc_analysis/ace-distribution.png
e3sm_diags/atm_monthly_180x360_aave_mvm/model_vs_model_1980-1981/tc_analysis/tc-frequency.png
e3sm_diags/atm_monthly_180x360_aave_mvm/model_vs_model_1980-1981/tc_analysis/tc-intensity.png
e3sm_diags/atm_monthly_180x360_aave_mvm/model_vs_model_1980-1981/tc_analysis/tc-frequency-annual-cycle.png
e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1982-1983/tc_analysis/aew-density-map.png
e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1982-1983/tc_analysis/ace-distribution.png
e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1982-1983/tc_analysis/tc-frequency.png
e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1982-1983/tc_analysis/tc-intensity.png
e3sm_diags/atm_monthly_180x360_aave/model_vs_obs_1982-1983/tc_analysis/tc-frequency-annual-cycle.png

v3:

Missing images:
e3sm_diags/lnd_monthly_mvm_lnd/model_vs_model_1987-1988/viewer/viewer/e3sm_logo.png
e3sm_diags/lnd_monthly_mvm_lnd/model_vs_model_1987-1988/lat_lon_land/model_vs_model/v3.LR.historical_0051-FIRA-DJF-global.png
e3sm_diags/lnd_monthly_mvm_lnd/model_vs_model_1987-1988/lat_lon_land/model_vs_model/v3.LR.historical_0051-QIRRIG_ORIG-DJF-global.png
...

All of the missing images come from the `lnd_monthly_mvm` error. v3 doesn't have any mismatched images, so it makes sense that the image_check_failures directory is empty.
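
One way to confirm that every missing image belongs to the same task is to count the paths by their leading directories. This is a sketch only: the helper `count_by_subtask` is hypothetical, and just the three v3 paths shown above are used, since the rest of the list is elided.

```python
from collections import Counter

# The three v3 paths shown above; the remaining entries are elided in the output.
missing_images = [
    "e3sm_diags/lnd_monthly_mvm_lnd/model_vs_model_1987-1988/viewer/viewer/e3sm_logo.png",
    "e3sm_diags/lnd_monthly_mvm_lnd/model_vs_model_1987-1988/lat_lon_land/model_vs_model/v3.LR.historical_0051-FIRA-DJF-global.png",
    "e3sm_diags/lnd_monthly_mvm_lnd/model_vs_model_1987-1988/lat_lon_land/model_vs_model/v3.LR.historical_0051-QIRRIG_ORIG-DJF-global.png",
]


def count_by_subtask(paths):
    """Count paths by their first two components, e.g. 'e3sm_diags/lnd_monthly_mvm_lnd'."""
    return Counter("/".join(p.split("/")[:2]) for p in paths)


counts = count_by_subtask(missing_images)
for subtask, n in counts.items():
    print(subtask, n)
```

A single key in the result means all missing images share one cause; any additional key would point to a separate failure worth investigating.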