Closed rgknox closed 5 months ago
Is there an addition or change we can/should make to the CTSM test list to avoid a bug like this in the future? E.g., I first noticed this with FATES_PATCHAREA_AP
—is it not on any output file in the tests?
There are other things to do with cleaning up the history files, but they are out of scope, so I think this PR is feature complete and ready for review. I ran a one-off test and it passsed ERS_Ld10.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdAllVars
@samsrabin we periodically need to update the FatesColdAllVars test to make sure it has all of the variables. We have this script: tools/FindInactive.py that trawls the code and updates the list, seems like we need to update it.
regression testing underway
Regression testing on cheyenne against fates-sci.1.76.3_api.35.1.0-ctsm5.2.007
shows that most of the tests are returning B4B or the only diffs are in the fill values. That said, the following tests are return DIFFS:
FAIL ERP_Ld9.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdAllVars BASELINE fates-sci.1.76.3_api.35.1.0-ctsm5.2.007: DIFF
FAIL ERS_D_Ld30.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdPRT2 BASELINE fates-sci.1.76.3_api.35.1.0-ctsm5.2.007: DIFF
FAIL ERS_D_Ld5.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdHydro BASELINE fates-sci.1.76.3_api.35.1.0-ctsm5.2.007: DIFF
FAIL ERS_Lm12.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesFireLightningPopDens BASELINE fates-sci.1.76.3_api.35.1.0-ctsm5.2.007: DIFF
FAIL SMS_Lm1.f45_f45_mg37.I2000Clm60FatesSpCruRsGs.derecho_intel.clm-FatesColdBasic BASELINE fates-sci.1.76.3_api.35.1.0-ctsm5.2.007: DIFF
FAIL SMS_Lm3_D_Mmpi-serial.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdHydro BASELINE fates-sci.1.76.3_api.35.1.0-ctsm5.2.007: DIFF
I need to take a look at these to see if they make sense, but there two that jump out at me at first glance. The ERP_Ld9.f45_f45_mg37.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesColdAllVars
and ERS_Lm12.1x1_brazil.I2000Clm50FatesCruRsGs.derecho_intel.clm-FatesFireLightningPopDens
tests returns huge DIFFS in FATES_MORTALITY_PF
due to the new values being huge (i.e. 1e+38
). Spitfire is set to mode 1
in the AllVars
test and tree damage is on as well. Spitfire is set to mode 4
.
The dynamics time-scale complex variables were not being flushed and zero'd correctly, this was corrected, the MORTALITY_PF variable is an example of that
Per our conversation, I went back and looked and realized I misread the cprnc
output. I keep thinking the forth line below is the current commit, but it's actually the baseline:
FATES_MORTALITY_PF (lon,lat,fates_levpft,time) t_index = 2 2
12996 39744 ( 33, 2, 1, 1) ( 23, 21, 10, 1) ( 33, 2, 1, 1) ( 33, 2, 1, 1)
12996 1.225934268493151E-01 2.799785209598797E-03 1.1E+38 1.225934268493151E-01 1.0E+00 1.225934268493151E-01
12996 1.050000000000002E+38 1.050000000000002E+38 1.050000000000002E+38 1.050000000000002E+38
39744 ( 33, 2, 1, 1) ( 33, 2, 1, 1)
avg abs field values: 6.526914797498996E-02 rms diff: 1.0E+38 avg rel diff(npos): 1.0E+00
1.050000000000126E+38 avg decimal digits(ndif): 0.0 worst: 0.0
RMS FATES_MORTALITY_PF 1.0500E+38 NORMALIZED 2.0000E+00
So this is indeed corrected as you explained. I think we can integrate.
For postertity, the test location on derecho: /glade/u/home/glemieux/scratch/ctsm-tests/tests_pr1208-fates
Description:
Some multi-dimensioned history variables were not being reported correctly because they were not being zero'd. These variables were likely incremented over a patch-cohort loop, but being incremented on top of an uninitialized number. This set of changes seeks to make the flushing and zeroing more consistent across all of the different groups of history variables.
This also reduces confusion by moving flushing and zeroing to occur inside the routine that performs the updates. These flushing and zeroing calls still happen redundantly in the host models, should be removed, but are not a liability to generating incorrect results. These calls are just superfluous.
Fixes #1207
Collaborators:
@samsrabin
Expectation of Answer Changes:
Checklist
If this is your first time contributing, please read the CONTRIBUTING document.
All checklist items must be checked to enable merging this pull request:
Contributor
Integrator
Documentation
Test Results:
CTSM (or) E3SM (specify which) test hash-tag:
CTSM (or) E3SM (specify which) baseline hash-tag:
FATES baseline hash-tag:
Test Output: