Dimensional scaling tests are producing chksum differences

mnlevy1981 commented 2 months ago

I got the MARBL branch to pass dimensional scaling tests, but in doing so I noticed that some of the scaling tests are producing chksum() differences for non-MARBL fields. My testing strategy was to run a baseline with DEBUG=True, and then run individual tests with DEBUG=True and one of the *_RESCALE_POWER parameters set to 10. The ocean.stats files matched for all these runs, but some of the cesm.log files reported differences in the log output:

T_RESCALE_POWER = 10:

@@ -606,8 +606,8 @@
 h-point: c=    194999 after KPP tv%frazil
 h-point: mean=   0.0000000000000000E+00 min=   0.0000000000000000E+00 max=   0.0000000000000000E+00 after KPP tv%salt_deficit
 h-point: c=         0 after KPP tv%salt_deficit
-h-point: mean=   4.4376542679371360E-05 min=  -6.4022985345054831E-03 max=   3.9322714284468409E-02 after KPP tv%TempxPmE
-h-point: c=   5041265 after KPP tv%TempxPmE
+h-point: mean=   4.5441579703676273E-02 min=  -6.5559536993336147E+00 max=   4.0266459427295651E+01 after KPP tv%TempxPmE
+h-point: c=   5071421 after KPP tv%TempxPmE
 h-point: mean=   6.6078014529134853E-04 min=   0.0000000000000000E+00 max=   3.0173242267400169E-01 after KPP Kd_heat
 h-point: c= 409916705 after KPP Kd_heat
 h-point: mean=   6.6058804378353185E-04 min=   0.0000000000000000E+00 max=   3.0173242267400169E-01 after KPP Kd_salt

L_RESCALE_POWER = 10:

@@ -42499,9 +42499,9 @@
 h-point: mean=   2.4352442695468348E+04 min=   1.7138516846630746-143 max=   6.9861061588048920E+04 MEKE LmixScale
 h-point: c=   5490514 MEKE LmixScale
 h-point: mean=   3.4859618873247925E-12 min=  -2.8924527398703980E-07 max=   2.6980998412243935E-07 MEKE src
-h-point: c=   5031269 MEKE src
+h-point: c=   5031240 MEKE src
 h-point: mean=   1.1141024840786339E-02 min=   0.0000000000000000E+00 max=   4.5810243321258808E+00 MEKE post-update MEKE
-h-point: c=   5132447 MEKE post-update MEKE
+h-point: c=   5132449 MEKE post-update MEKE
 h-point: mean=   4.3433588469983327E+01 min=   2.3507772963993505E-04 max=   2.5543838125970015E+02 Pre-advection h
 h-point: c= 359916601 sw= 360000603 se= 360000603 nw= 359832599 ne= 359832599 Pre-advection h
 u-point: mean=  -6.2482021116265647E+07 min=  -1.7043868928096725E+10 max=   1.0955338455112007E+10 u Pre-advection uhtr

C_RESCALE_POWER = 10:

@@ -8406,10 +8406,10 @@
 h-point: c= 393247078 Before tracer diffusion coccoFe
 h-point: mean=   6.3839077495849300E-03 min=   9.9958796553452880-101 max=   2.7936481413543470E+00 Before tracer diffusion coccoCaCO3
 h-point: c= 396183690 Before tracer diffusion coccoCaCO3
-h-point: mean=   5.9419608913172093E+00 min=  -2.1183601956327172E+00 max=   3.2145597947819027E+01 before HBD temp
-h-point: c= 306692919 before HBD temp
-h-point: mean=   5.9419608946613094E+00 min=  -2.1183601956327172E+00 max=   3.2145507047488380E+01 after HBD temp
-h-point: c= 306689751 after HBD temp
+h-point: mean=   5.8026961829269622E-03 min=  -2.0687111285475753E-03 max=   3.1392185495917019E-02 before HBD temp
+h-point: c= 333613467 before HBD temp
+h-point: mean=   5.8026961861926850E-03 min=  -2.0687111285475753E-03 max=   3.1392096726062871E-02 after HBD temp
+h-point: c= 333610299 after HBD temp
 h-point: mean=   2.8279382711228873E+01 min=   0.0000000000000000E+00 max=   4.0748130640758944E+01 before HBD salt
 h-point: c= 266237499 before HBD salt
 h-point: mean=   2.8279382712458503E+01 min=   0.0000000000000000E+00 max=   4.0748130640758944E+01 after HBD salt

S_RESCALE_POWER = 10:

@@ -8410,10 +8410,10 @@
 h-point: c= 306692919 before HBD temp
 h-point: mean=   5.9419608946613094E+00 min=  -2.1183601956327172E+00 max=   3.2145507047488380E+01 after HBD temp
 h-point: c= 306689751 after HBD temp
-h-point: mean=   2.8279382711228873E+01 min=   0.0000000000000000E+00 max=   4.0748130640758944E+01 before HBD salt
-h-point: c= 266237499 before HBD salt
-h-point: mean=   2.8279382712458503E+01 min=   0.0000000000000000E+00 max=   4.0748130640758944E+01 after HBD salt
-h-point: c= 266239376 after HBD salt
+h-point: mean=   2.7616584678934446E-02 min=   0.0000000000000000E+00 max=   3.9793096328866157E-02 before HBD salt
+h-point: c= 325452834 before HBD salt
+h-point: mean=   2.7616584680135257E-02 min=   0.0000000000000000E+00 max=   3.9793096328866157E-02 after HBD salt
+h-point: c= 325454711 after HBD salt
 h-point: mean=  -1.8060261735762620E-01 min=  -1.0000000000000000E+00 max=   1.1415525114155259E-04 before HBD age
 h-point: c= 343965535 before HBD age
 h-point: mean=  -1.8060261735762584E-01 min=  -1.0000000000000000E+00 max=   1.1415581285044829E-04 after HBD age

Z_RESCALE_POWER = 10: no diffs in log output
H_RESCALE_POWER = 10: no diffs in log output
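
As a quick sanity check on the T, C, and S diffs above, the ratio of each changed mean to its baseline value can be compared against a power of two. The sketch below uses only the means copied from the log output; the labels and the helper name `rescale_exponent` are made up for illustration. Each ratio works out to 2**10 or 2**-10, which is what an unscaled diagnostic would look like with a rescale power of 10.

```python
import math

# Means copied from the log diffs above: (label, baseline mean, rescaled-run mean).
pairs = [
    ("T: after KPP tv%TempxPmE", 4.4376542679371360e-05, 4.5441579703676273e-02),
    ("C: before HBD temp", 5.9419608913172093e+00, 5.8026961829269622e-03),
    ("S: before HBD salt", 2.8279382711228873e+01, 2.7616584678934446e-02),
]

def rescale_exponent(baseline, rescaled):
    """Return log2(rescaled / baseline), rounded to 6 decimal places."""
    return round(math.log2(rescaled / baseline), 6)

# Hypothetical helper output: one power-of-two exponent per logged field.
exponents = {label: rescale_exponent(b, r) for label, b, r in pairs}

for label, exponent in exponents.items():
    print(f"{label}: ratio = 2**{exponent:+.0f}")
```

The L_RESCALE_POWER diff is different in kind: the mean, min, and max lines for the MEKE fields are unchanged and only the c= values differ, so there is no ratio to compute there.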

I did not see these differences when running single-column MARBL tests using solo_driver, but it's not clear whether that means the missing scaling is in the NUOPC cap or whether it has to do with the different parameterizations we are using. Given the bit-for-bit history file output, though, it does seem likely that the problem is a missing scale argument in the hchksum() call rather than an actual scaling issue (although it is confusing that the MEKE diff does not show up until the checksum).
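
To illustrate why a missing scale argument would change the logged values without changing the model answers, here is a minimal Python sketch. It is not MOM6 code: `bit_sum` and `chksum` are hypothetical stand-ins for the real checksum routines, the field values are made up, and the internal representation is assumed to be the physical value times 2**-10, chosen to match the direction of the C_RESCALE_POWER diff. Multiplying by a power of two is exact in floating point (barring overflow or underflow), so undoing the rescaling recovers the baseline bits.

```python
import random
import struct

def bit_sum(values):
    """Sum the raw 64-bit patterns of the values (stand-in, not MOM6's checksum)."""
    return sum(struct.unpack("<q", struct.pack("<d", v))[0] for v in values)

def chksum(field, scale=1.0):
    """Checksum the field after multiplying by `scale` to undo internal rescaling."""
    return bit_sum(scale * v for v in field)

POWER = 10  # plays the role of a *_RESCALE_POWER setting

random.seed(0)
temp = [random.uniform(-2.0, 32.0) for _ in range(1000)]  # made-up values in degC
temp_internal = [v * 2.0**-POWER for v in temp]           # assumed internal units

baseline = chksum(temp)                                # run without rescaling
missing_scale = chksum(temp_internal)                  # scale argument forgotten
with_scale = chksum(temp_internal, scale=2.0**POWER)   # scale argument supplied

print("missing scale matches baseline:", missing_scale == baseline)
print("with scale matches baseline:   ", with_scale == baseline)
```

In this sketch only the call without the scale factor disagrees with the baseline, which is consistent with diagnostics differing while ocean.stats and the history files stay bit-for-bit.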

mnlevy1981 commented 2 months ago

Neither Q_RESCALE_POWER = 10 nor R_RESCALE_POWER = 10 shows log differences, so it's just T, L, C, and S.

Hallberg-NOAA commented 2 months ago

There is a new pull request (https://github.com/NOAA-GFDL/MOM6/pull/620, which is headed to main via dev/gfdl) that might partially address this issue. It corrects some bugs (including dimensional rescaling factors involving Z and L) in the calculation of the MEKE source terms in a form that I believe is being used at NCAR.

There is a second recent pull request (https://github.com/NOAA-GFDL/MOM6/pull/601, also headed to main via dev/gfdl) that adds comments noting several dimensional inconsistencies within vertFPmix() and also some checksums that are missing the appropriate scale arguments. This PR notes the problems but does not correct them because we are not using this subroutine at GFDL, and we wanted to leave it to NCAR to decide how best to correct these issues (e.g., with a ..._BUG flag or just fix them) with minimal disruptions to your ongoing MOM6 simulations.

I suspect that these two pull requests might go a long way toward addressing this issue.

mnlevy1981 commented 2 months ago

Thanks @Hallberg-NOAA, this is very helpful! Since these issues are only in the chksum output and don't affect ocean.stats or what is written to history files, the CESM-based dimensional scaling tests all pass. I think this means (1) we're happy to fix things inline without a _BUG flag, and (2) this is a fairly low priority, so I'm happy to wait for a few updates to trickle onto dev/ncar from dev/gfdl before (eventually) fixing the issues noted in https://github.com/NOAA-GFDL/MOM6/pull/601.

NCAR / MOM6

Dimensional scaling tests are producing chksum differences #275