ESCOMP / CTSM

Community Terrestrial Systems Model (includes the Community Land Model of CESM)
http://www.cesm.ucar.edu/models/cesm2.0/land/
Other
302 stars 307 forks source link

Failing water isotope test on the ctsm5.2 branch #2444

Closed ekluzek closed 2 months ago

ekluzek commented 5 months ago

Brief summary of bug

As seen here... https://github.com/ESCOMP/CTSM/pull/2427#issuecomment-2026011205

The test SMS_D_Ld10.f10_f10_mg37.I2000Clm50BgcCrop.izumi_intel.clm-tracer_consistency fails in what will be alpha-ctsm5.2.mksrf.25_ctsm5.1.dev175 as well as alpha-ctsm5.2.mksrf.23_ctsm5.1.dev171 but passed in alpha-ctsm5.2.mksrf.22_ctsm5.1.dev168.

General bug information

CTSM version you are using: alpha-ctsm5.2.mksrf.23_ctsm5.1.dev171

Does this bug cause significantly incorrect results in the model's science? [No

Configurations affected: with enable_water_tracer_consistency_checks = .true.

Important output or errors that show the problem

cesm.log:

This routine is depricated - use shr_log_setLogUnit instead        -134
 shr_file_mod.F90         912 
 This routine is depricated - use shr_log_setLogUnit instead        -132
 shr_file_mod.F90         912 
 This routine is depricated - use shr_log_setLogUnit instead        -134
ERROR in CompareBulkToTracer: tracer does not agree with bulk water
Called from: after first stage of hydrology
Variable: snocan_patch
First difference at index: 24
Bulk  :   0.40122337938811679-295
Tracer:   0.00000000000000000E+00
ratio:   0.10000000000000000E-09
Bulk*ratio:   0.40122337938811680-305
iam = 20: local  patch    index = 24
iam = 20: global patch    index = 2947
iam = 20: global column   index = 2211
iam = 20: global landunit index = 470
iam = 20: global gridcell index = 165
iam = 20: gridcell longitude    =   90.0000000
iam = 20: gridcell latitude     =   40.0000000
iam = 20: pft      type         = 8
iam = 20: column   type         = 1
iam = 20: landunit type         = 1
 ENDRUN:
 ERROR: CompareBulkToTracer: tracer does not agree with bulk water
Image              PC                Routine            Line        Source             
cesm.exe           0000000004AE6556  Unknown               Unknown  Unknown
cesm.exe           000000000421343A  shr_abort_mod_mp_         114  shr_abort_mod.F90
cesm.exe           00000000042132A0  shr_abort_mod_mp_          61  shr_abort_mod.F90
cesm.exe           0000000000A3C63F  abortutils_mp_end          98  abortutils.F90
cesm.exe           00000000035962A7  watertracerutils_         401  WaterTracerUtils.F90
cesm.exe           00000000035B7728  watertype_mp_trac         953  WaterType.F90
cesm.exe           0000000000A5AAE0  clm_driver_mp_clm         588  clm_driver.F90
cesm.exe           00000000009915B2  lnd_comp_nuopc_mp         904  lnd_comp_nuopc.F90
libesmf.so         00007F242D0B3A7E  _ZNK5ESMCI13Metho         377  ESMCI_MethodTable.C
libesmf.so         00007F242D0B4EE2  _ZN5ESMCI11Method         563  ESMCI_MethodTable.C
libesmf.so         00007F242D0B3443  c_esmc_methodtabl         317  ESMCI_MethodTable.C
wwieder commented 5 months ago

is this an important test if water isotopes don't work in CTSM?

ekluzek commented 5 months ago

@wwieder good question. I'm not sure.

Keeping these tests in place does keep the parts of water isotopes working that is in place. Which should help if someone can get back to water isotopes. And we haven't had these tests fail that often, so it hasn't been onerous.

I am going to leave it alone for a bit and move forward with other things. And I'm figuring out where it breaks which might lead to an easy solution. And a reasonable thing might be to bring in ctsm5.2.0 even with this one failing thing if there isn't an easy fix.

ekluzek commented 5 months ago

I tracked this back based on tags and it works in alpha-ctsm5.2.mksrf.23_ctsm5.1.dev168 bucould t fails in the next tag we made on ctsm5.2 of alpha-ctsm5.2.mksrf.23_ctsm5.1.dev171. These means it fails somewhere in the update between ctsm5.1.dev168 and ctsm5.1.dev171. In tracking that I find that it works in ctsm5.1.dev170. So the cold-start change in ctsm5.1.dev170 in PR #2355 seems to be the culprit here.

This test is a cold-start test so that could be, but the change is just a small change in initial Temperatures which shouldn't matter for water isotopes.. And yet it somehow does...

ekluzek commented 5 months ago

I verified that doing the following simple change in #2417 gets this test to pass.

diff --git a/src/biogeophys/TemperatureType.F90 b/src/biogeophys/TemperatureType.F90
index ab310650c..6450f3f31 100644
--- a/src/biogeophys/TemperatureType.F90
+++ b/src/biogeophys/TemperatureType.F90
@@ -741,7 +741,7 @@ subroutine InitCold(this, bounds, &
                   end if
                end if
             else
-               this%t_soisno_col(c,1:nlevgrnd) = 272._r8
+               this%t_soisno_col(c,1:nlevgrnd) = 274._r8
                if (use_excess_ice .and. (lun%itype(l) == istsoil .or. lun%itype(l) == istcrop)) then
                   this%t_soisno_col(c,1:nlevgrnd) = SHR_CONST_TKFRZ - 5.0_r8 !needs to be below freezing to properly initiate excess ice
                end if
wwieder commented 5 months ago

@ekluzek is suggesting this an expected fail and will talk to @billsacks to see if it's something we need to modify the test case for? @wwieder assumes this is caused by fractionation caused by freezing water (somehow).

slevis-lmwg commented 5 months ago

This issue reminds me of two issues that I worked on in the last few months; they may be related to this:

2041

2366

billsacks commented 5 months ago

Thanks @ekluzek for the detailed notes here along with all of your investigations so far! Thanks also @slevis-lmwg for the pointers to similar issues, and @wwieder for the good questions. I'm happy to look into this a bit. I have a couple of ideas of things that might fix this that I want to try initially.

Regarding @wwieder 's question:

is this an important test if water isotopes don't work in CTSM?

I have two answers to that:

(1) The water isotope tests help ensure that the ~ 6 months of work that got water tracers / isotopes partially working don't get undone. From that perspective, these tests could be dropped if/when the feeling becomes that it is very unlikely that the water tracer / isotope work will ever be finished... though there's a part of me that would be tempted to explore re-shifting my priorities to work on this for a year and try to get it wrapped up if it came to that (if this had enough support to do so).

(2) Most – maybe all? – recent failures in isotope tests, including this one, are "canary in the coal mine" sorts of failures. That is, even though the failures are in water isotope tests, they're really indicative of bigger problems in the model that have nothing to do with water isotopes (in my view). And this is really tied in with why implementing water tracers in CTSM is hard in the first place: CTSM's hydrology is not careful / rigorous about its numerics. Implementing water tracers in a maintainable way requires more rigor with the hydrology numerics. Many (all?) of the problems we've been running into have been related to state variables getting updated in a way that should bring them to exactly zero but in fact they are ending up roundoff-level different from zero. That seems to be the situation here. Without the tracer/isotope-related checks, the model continues to run, but that doesn't necessarily mean that it's running correctly. Here, for example, you have a snocan value of 0.4 x 10^-295. That is effectively zero, and the model should have set that to exactly zero. If we have any code like if (snocan > 0) then X else Y then this code currently isn't working right under conditions like this. So under this perspective, my feeling is that we should be thankful that these tests are catching issues with CTSM's hydrology numerics that may be causing incorrect operation of the model, so we shouldn't be quick to throw them out.

This test is a cold-start test so that could be, but the change is just a small change in initial Temperatures which shouldn't matter for water isotopes.. And yet it somehow does...

That is indeed weird. My guess is that this is just triggering a different evolution of the system that makes this issue appear when (by chance) it didn't appear before, but maybe there's something weirder going on here.

@wwieder assumes this is caused by fractionation caused by freezing water (somehow).

Scientifically I see how that could make sense, but so far we don't have any fractionation implemented, so that wouldn't explain this.

billsacks commented 5 months ago

I have resolved this in #2457 . The change is to adjust the tolerance for the truncation that @slevis-lmwg introduced in https://github.com/ESCOMP/CTSM/pull/2053. Note that this will probably change answers, maybe for many tests. (Note that ctsm5.1.dev141, which was the tag associated with #2053, had answer changes, so I expect this adjustment to have similar answer changes.)

wwieder commented 5 months ago

Thanks for engaging here, Bill. Much appreciated

billsacks commented 5 months ago

Thanks for engaging here, Bill. Much appreciated

Happy to help. As I said to Erik, I feel invested in keeping the water tracer stuff as working as possible so that one day in the uncertain future it could all come together....