Open ekluzek opened 4 months ago
Still fails for 3 days, which is about the shortest I think we should try...
I talked to @Katetc about this after the CSEG meeting. She said that this is a traditional global-sum issue in MPI, which has been solved in other places and so should be relatively easy to fix.
When I asked her to confirm the timeline, she sent me an email saying that they will work on this relatively soon.
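For anyone unfamiliar with why a global sum can differ between processor layouts, here is a minimal sketch in plain Python (no MPI; this is not the CISM or CTSM code). Floating-point addition is not associative, so splitting the same values across a different number of tasks changes the rounding. The function names, the sample values, and the fixed-point scale are all assumptions made for illustration. The fixed-point variant is one common way to get an order-independent result and is not necessarily the fix that will be used here.

```python
# Illustrative only: mimics per-task partial sums followed by a reduction.
SCALE = 2 ** 20  # hypothetical fixed-point scale, chosen for this example


def chunk_bounds(n, ntasks):
    """Boundaries of ntasks contiguous chunks over n values."""
    return [i * n // ntasks for i in range(ntasks + 1)]


def naive_global_sum(values, ntasks):
    """Sum each task's chunk in floating point, then add the partial sums."""
    bounds = chunk_bounds(len(values), ntasks)
    total = 0.0
    for i in range(ntasks):
        partial = 0.0
        for v in values[bounds[i]:bounds[i + 1]]:
            partial += v  # rounding here depends on the chunking
        total += partial
    return total


def fixed_point_global_sum(values, ntasks):
    """Convert each value to an integer first; integer addition is exact
    and associative, so the result does not depend on the chunking."""
    bounds = chunk_bounds(len(values), ntasks)
    total = 0
    for i in range(ntasks):
        partial = 0
        for v in values[bounds[i]:bounds[i + 1]]:
            partial += round(v * SCALE)
        total += partial
    return total / SCALE


values = [1.0, 1e16, -1e16, 1.0]  # exact sum is 2.0
for ntasks in (1, 2):
    print(ntasks, naive_global_sum(values, ntasks),
          fixed_point_global_sum(values, ntasks))
```

With these values the naive sum gives 1.0 on one task and 0.0 on two, while the fixed-point sum gives 2.0 for both. Python integers are unbounded, so a real implementation would also have to handle integer width and the choice of scale.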
On ctsm5.2.005, I'm getting a failure in the same step for
PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel.clm-clm60cam6LndTuningMode
Should this be marked as an expected fail? I see that a slightly different test (3 days instead of 9) named
PEM_D_Ld3.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel.clm-clm60cam6LndTuningMode
is present in the expected fail list (and points to this issue), but that's not actually in the test list.
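The mismatch described above (an expected-fail entry for a test that is not in the test list) is the kind of thing a simple set comparison would catch. Here is a rough sketch, assuming the test names have already been read into plain Python collections. The helper name is hypothetical, and the parsing of CTSM's actual test list and expected-fail files is not shown.

```python
def stale_expected_fails(expected_fails, test_list):
    """Return expected-fail entries that name no test in the test list."""
    return sorted(set(expected_fails) - set(test_list))


# Names taken from this issue; the containers themselves are illustrative.
test_list = [
    "PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel.clm-clm60cam6LndTuningMode",
]
expected_fails = [
    "PEM_D_Ld3.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel.clm-clm60cam6LndTuningMode",
]

for name in stale_expected_fails(expected_fails, test_list):
    print("expected fail not in test list:", name)
```

Here the Ld3 entry is reported, because only the Ld9 variant is in the test list.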
Yes, we should correct the expected fail entry so it matches the test list. I think @slevis-lmwg did this in 006 though.
I ran into this again while working on ctsm5.2.009 because of a change in the test mod used.
But I verified that in ctsm5.2.008 the following test fails:
PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel
See this comment: https://github.com/ESCOMP/CTSM/pull/2632#issuecomment-2217988993
Brief summary of bug
With ctsm5.2.0 we discovered we didn't have enough testing that corresponded to CESM or CAM testing. CESM testing is always done with CISM active, so I changed some tests in #2501 from I1850Clm60BgcCrop to I1850Clm60BgcCropG. However,
General bug information
CTSM version you are using: ctsm5.2.004-31-ga09d22376
Does this bug cause significantly incorrect results in the model's science? No
Configurations affected: With CISM active
Details of bug
PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCrop.derecho_intel.clm-clm60cam6LndTuningMode passes; however, PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel.clm-clm60cam6LndTuningMode fails in the comparison between different processor counts...
FAIL PEM_D_Ld9.ne30pg3_t232.I1850Clm60BgcCropG.derecho_intel.clm-clm60cam6LndTuningMode COMPARE_base_modpes
Important details of your setup / configuration so we can reproduce the bug
In the test list there are PEM and ERP tests for glc* testmods that have a comment that says this
Those tests range from 5 days to 10 days. However, many are f10, and the highest resolution is f19, which runs for 5 days.