ESCOMP / CTSM

Community Terrestrial Systems Model (includes the Community Land Model of CESM)
http://www.cesm.ucar.edu/models/cesm2.0/land/

fsurdat file needed for NEON MOAB site #2801

Open samsrabin opened 1 month ago

samsrabin commented 1 month ago

I didn't want to open a whole new issue for this, BUT... In #2500, this test changed from FAIL (expected) to PEND in the SHAREDLIB_BUILD phase: SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Bgc.derecho_gnu.clm-NEON-MOAB--clm-PRISM, with this error: CLMBuildNamelist::add_default() : No default value found for fsurdat.

Same on izumi: SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Bgc.izumi_nag.clm-NEON-MOAB--clm-PRISM

Originally posted by @slevis-lmwg in https://github.com/ESCOMP/CTSM/issues/2310#issuecomment-2372367434


I'm elevating this to its own issue because now the cs.status output is confusing, and the expected failure isn't detected.

samsrabin commented 1 month ago

Note that the cs.status output will make more sense (i.e., SETUP will be marked as FAIL) once we bring in cime6.1.27 or later; see https://github.com/ESMCI/cime/pull/4681.

slevis-lmwg commented 3 weeks ago

The file already exists, both here (78 PFTs): /glade/campaign/cesm/cesmdata/inputdata/lnd/clm2/surfdata_esmf/NEON/ctsm5.3.0/surfdata_1x1_NEON_MOAB_hist_2000_78pfts_c240912.nc and here (16 PFTs): .../16PFT_mixed/surfdata_1x1_NEON_MOAB_hist_2000_16pfts_c240912.nc

samsrabin commented 3 weeks ago

So I guess something just needs to be changed in the XML for the test to pick that up?

slevis-lmwg commented 3 weeks ago

@samsrabin this sounds simple, although @olyson and I looked at this for a few minutes this morning and found that 1) the fsurdat setting seems correct in namelist_defaults_ctsm.xml, and 2) other NEON tests work, suggesting that this test does something different that causes it to break...

slevis-lmwg commented 2 weeks ago

Additional info. This test works: SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Bgc.derecho_gnu.clm-NEON-MOAB

samsrabin commented 2 weeks ago

Ah, so it seems like the addition of the PRISM testmod is the issue.

slevis-lmwg commented 5 days ago

UPDATE

I reversed the order of the testmods in the test, like this: SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Bgc.derecho_intel.clm-PRISM--clm-NEON-MOAB, and the test passed. I will follow up with a test to confirm that I get the same answers as the original test:

Running the two tests from ctsm5.2.028 (i.e., the last tag in which the original test passed): Differences in the lnd_in files suggest that we may see differences in answers. The runs fail on izumi because they report that cesm.exe does not exist, even though it does. If this problem persists, I will repeat these two tests on derecho.

slevis-lmwg commented 4 days ago

The new test works but gives different answers in ctsm5.2.028 (the last tag in which the default test still worked), due to differences in lnd_in between the new test and the default test:

28,29c28,32
<  hist_fincl2 = 'AR', 'ELAI', 'FCEV', 'FCTR', 'FGEV', 'FIRA', 'FSA', 'FSH', 'GPP', 'H2OSOI', 'HR', 'SNOW_DEPTH',
<          'TBOT', 'TSOI', 'SOILC_vr', 'FV', 'NET_NMIN_vr'
---
>  hist_fincl2 = 'TG', 'TBOT', 'FIRE', 'FIRA', 'FLDS', 'FSDS', 'FSR', 'FSA', 'FGEV', 'FSH', 'FGR',
>          'TSOI', 'ERRSOI', 'SABV', 'SABG', 'FSDSVD', 'FSDSND', 'FSDSVI', 'FSDSNI', 'FSRVD', 'FSRND', 'FSRVI',
>          'FSRNI', 'TSA', 'FCTR', 'FCEV', 'QBOT', 'RH2M', 'H2OSOI', 'H2OSNO', 'SOILLIQ', 'SOILICE', 'TSA_U',
>          'TSA_R', 'TREFMNAV_U', 'TREFMNAV_R', 'TREFMXAV_U', 'TREFMXAV_R', 'TG_U', 'TG_R', 'RH2M_U', 'RH2M_R', 'QRUNOFF_U', 'QRUNOFF_R',
>          'SoilAlpha_U', 'SWup', 'LWup', 'URBAN_AC', 'URBAN_HEAT'
114,115c117,118
<  stream_fldfilename_lightng = '/glade/campaign/cesm/cesmdata/inputdata/atm/datm7/NASA_LIS/clmforc.Li_2016_climo1995-2013.360x720.lnfm_Total_NEONarea_c210625.nc'
<  stream_meshfile_lightng = '/glade/campaign/cesm/cesmdata/inputdata/atm/datm7/NASA_LIS/ESMF_MESH.Li_2016.360x720.NEONarea_cdf5_c221104.nc'
---
>  stream_fldfilename_lightng = '/glade/campaign/cesm/cesmdata/inputdata/atm/datm7/NASA_LIS/clmforc.Li_2016_climo1995-2013.360x720.lnfm_Total_c160825.nc'
>  stream_meshfile_lightng = '/glade/campaign/cesm/cesmdata/inputdata/atm/datm7/NASA_LIS/clmforc.Li_2016_climo1995-2013.360x720_ESMFmesh_cdf5_150621.nc'

Next I want to look at code diffs between ctsm5.2.029 and ctsm5.2.028 in case I spot the root cause of the failure.

slevis-lmwg commented 4 days ago

From the code diffs 029 vs. 028, I see three main areas to focus on:

--- a/cime_config/usermods_dirs/NEON/defaults/user_nl_clm
+++ b/cime_config/usermods_dirs/NEON/defaults/user_nl_clm
@@ -18,9 +18,6 @@
 ! Set glc_do_dynglacier  with GLC_TWO_WAY_COUPLING               env variable
 !----------------------------------------------------------------------------------

-flanduse_timeseries = ' '   ! This isn't needed for a non transient case, but will be once we start using transient compsets
-fsurdat = "$DIN_LOC_ROOT/lnd/clm2/surfdata_esmf/NEON/surfdata_1x1_NEON_${NEONSITE}_hist_2000_78pfts_c240206.nc"
-
 ! h1 output stream

slevis-lmwg commented 4 days ago

Putting back the code shown in my last post fixes the test failure, but it also undoes an attempt to reduce code clutter. Is there an alternative solution? Is reversing the testmods order, which I showed works, an acceptable solution?
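Restoring the fsurdat line presumably works because an explicit user_nl_clm value means no default has to be looked up at all. A simplified sketch of that precedence follows; resolve_setting and NoDefaultError are hypothetical names, and this is a toy model, not the actual CLMBuildNamelist.pm logic.

```python
class NoDefaultError(Exception):
    """Raised when neither the user nor the defaults supply a value."""


def resolve_setting(name, user_nl, defaults):
    # Toy model: an explicit user_nl_clm value takes precedence;
    # otherwise fall back to whatever default matched, else fail.
    if name in user_nl:
        return user_nl[name]
    if name in defaults:
        return defaults[name]
    raise NoDefaultError(f"No default value found for {name}")


# With the fsurdat line restored in user_nl_clm, no default lookup is needed.
restored = {"fsurdat": "$DIN_LOC_ROOT/lnd/clm2/surfdata_esmf/NEON/surfdata_1x1_NEON_${NEONSITE}_hist_2000_78pfts_c240206.nc"}
print(resolve_setting("fsurdat", restored, defaults={}))

# Without it, and with no matching default, the error from this issue appears.
try:
    resolve_setting("fsurdat", user_nl={}, defaults={})
except NoDefaultError as err:
    print(err)  # No default value found for fsurdat
```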

samsrabin commented 4 days ago

I think the root issue is that the NEON site defaults only apply if simulating 2018:

<!-- for NEON sites present day simulations - year 2000 -->
<fsurdat hgrid="CLM_USRDAT" neon=".true." sim_year="2018" use_fates=".true.">
lnd/clm2/surfdata_esmf/NEON/ctsm5.3.0/16PFT_mixed/surfdata_1x1_NEON_${NEONSITE}_hist_2000_16pfts_c240912.nc</fsurdat>
<fsurdat hgrid="CLM_USRDAT" neon=".true." sim_year="2018" use_fates=".false.">
lnd/clm2/surfdata_esmf/NEON/ctsm5.3.0/surfdata_1x1_NEON_${NEONSITE}_hist_2000_78pfts_c240912.nc</fsurdat>

Is there a reason for that?
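To see why these entries can be skipped, here is a simplified sketch of attribute-based default lookup. It assumes (a simplification, not necessarily the exact rule CLMBuildNamelist.pm uses) that an entry is chosen only when every one of its attributes equals the current setting; find_default and settings are hypothetical names. It parses the two fsurdat entries quoted above and substitutes NEONSITE=MOAB, which yields the 78-PFT file path mentioned earlier in this thread.

```python
import xml.etree.ElementTree as ET
from string import Template

# The two fsurdat entries quoted above, wrapped in a root element for parsing.
DEFAULTS_XML = """<namelist_defaults>
<fsurdat hgrid="CLM_USRDAT" neon=".true." sim_year="2018" use_fates=".true.">
lnd/clm2/surfdata_esmf/NEON/ctsm5.3.0/16PFT_mixed/surfdata_1x1_NEON_${NEONSITE}_hist_2000_16pfts_c240912.nc</fsurdat>
<fsurdat hgrid="CLM_USRDAT" neon=".true." sim_year="2018" use_fates=".false.">
lnd/clm2/surfdata_esmf/NEON/ctsm5.3.0/surfdata_1x1_NEON_${NEONSITE}_hist_2000_78pfts_c240912.nc</fsurdat>
</namelist_defaults>"""


def find_default(xml_text, name, settings):
    """Return the first entry whose attributes all match settings, else None.

    Simplifying assumption: every attribute on an entry must equal the
    corresponding current setting for that entry to be selected.
    """
    root = ET.fromstring(xml_text)
    for entry in root.findall(name):
        if all(settings.get(key) == value for key, value in entry.attrib.items()):
            return entry.text.strip()
    return None


settings = {"hgrid": "CLM_USRDAT", "neon": ".true.", "sim_year": "2018", "use_fates": ".false."}
path = find_default(DEFAULTS_XML, "fsurdat", settings)
print(Template(path).substitute(NEONSITE="MOAB"))
# lnd/clm2/surfdata_esmf/NEON/ctsm5.3.0/surfdata_1x1_NEON_MOAB_hist_2000_78pfts_c240912.nc

# If neon stays ".false.", neither entry matches and no default is found.
print(find_default(DEFAULTS_XML, "fsurdat", {**settings, "neon": ".false."}))  # None
```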

samsrabin commented 4 days ago

Or another way of looking at it: The issue is that adding the PRISM testmod after the NEON one means that the NEON testmod's shell_commands seemingly never get run. Otherwise, the date would be set to 2018.

But that raises another question: When you run with PRISM first, does the PRISM testmod's shell_commands get run?

samsrabin commented 4 days ago

Never mind, that's not it. Both orderings result in the following output for ./xmlquery -p YR:

Results in group run_component_datm
    DATM_YR_ALIGN: 2018
    DATM_YR_END: 2020
    DATM_YR_START: 2018
    DATM_YR_START_FILENAME: 9999

And the following for ./xmlquery --listall | grep 2018:

    CLM_NML_USE_CASE: 2018_control
    DATM_YR_ALIGN: 2018
    DATM_YR_START: 2018

But I have to say, I don't like not knowing why the order matters...

samsrabin commented 4 days ago

Found it! The problem is that CLMBuildNamelist.pm doesn't set neon to .true. unless CLM_USRDAT_NAME is NEON. When the PRISM testmod comes second, CLM_USRDAT_NAME is set to NEON.PRISM. The following change fixes it:

--- a/bld/CLMBuildNamelist.pm
+++ b/bld/CLMBuildNamelist.pm
@@ -713,7 +713,7 @@ sub setup_cmdl_resolution {
   $nl_flags->{'neon'} = ".false.";
   $nl_flags->{'neonsite'} = "";
   if ( $nl_flags->{'res'} eq "CLM_USRDAT" ) {
-    if ( $opts->{'clm_usr_name'} eq "NEON" ) {
+    if ( $opts->{'clm_usr_name'} eq "NEON" || $opts->{'clm_usr_name'} eq "NEON.PRISM" ) {
        $nl_flags->{'neon'} = ".true.";
        $nl_flags->{'neonsite'} = $envxml_ref->{'NEONSITE'};
        $log->verbose_message( "This is a NEON site with NEONSITE = " . $nl_flags->{'neonsite'} );

However, there's probably a better way to do this in Perl: instead of checking for exact matches, just check whether the name starts with NEON.
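The ordering effect and the two possible checks can be modeled in a few lines. This is a hypothetical sketch, assuming (as this thread suggests) that each testmod sets CLM_USRDAT_NAME and that the testmod applied last wins; the dictionary and function names are illustrative only. Python's str.startswith plays the role of the proposed "starts with NEON" check.

```python
# Hypothetical model: each testmod sets CLM_USRDAT_NAME, and the one
# applied last overwrites any earlier value.
TESTMOD_USRDAT_NAME = {
    "clm-NEON-MOAB": "NEON",
    "clm-PRISM": "NEON.PRISM",
}


def final_usrdat_name(testmods):
    """Apply testmods left to right; later settings overwrite earlier ones."""
    name = None
    for testmod in testmods:
        name = TESTMOD_USRDAT_NAME[testmod]
    return name


def is_neon_exact(name):
    # Current check: exact string equality, like the original Perl `eq`.
    return name == "NEON"


def is_neon_prefix(name):
    # Proposed check: does the name start with "NEON"?
    return name.startswith("NEON")


for order in (["clm-NEON-MOAB", "clm-PRISM"], ["clm-PRISM", "clm-NEON-MOAB"]):
    name = final_usrdat_name(order)
    print(order, name, is_neon_exact(name), is_neon_prefix(name))
# ['clm-NEON-MOAB', 'clm-PRISM'] NEON.PRISM False True
# ['clm-PRISM', 'clm-NEON-MOAB'] NEON True True
```

One design consideration: a bare prefix check would also accept any other name that happens to begin with NEON; a stricter variant could accept only "NEON" itself or names starting with "NEON.".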

samsrabin commented 4 days ago

Yep, like so:

--- a/bld/CLMBuildNamelist.pm
+++ b/bld/CLMBuildNamelist.pm
@@ -678,6 +678,11 @@ sub setup_cmdl_chk_res {
   }
 }

+sub begins_with
+{
+    return substr($_[0], 0, length($_[1])) eq $_[1];
+}
+
 sub setup_cmdl_resolution {
   my ($opts, $nl_flags, $definition, $defaults, $envxml_ref) = @_;

@@ -713,7 +718,7 @@ sub setup_cmdl_resolution {
   $nl_flags->{'neon'} = ".false.";
   $nl_flags->{'neonsite'} = "";
   if ( $nl_flags->{'res'} eq "CLM_USRDAT" ) {
-    if ( $opts->{'clm_usr_name'} eq "NEON" ) {
+    if ( begins_with($opts->{'clm_usr_name'}, "NEON") ) {
        $nl_flags->{'neon'} = ".true.";
        $nl_flags->{'neonsite'} = $envxml_ref->{'NEONSITE'};
        $log->verbose_message( "This is a NEON site with NEONSITE = " . $nl_flags->{'neonsite'} );

slevis-lmwg commented 4 days ago

Thank you @samsrabin I'm testing your suggestion now.

slevis-lmwg commented 4 days ago

./create_test SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm60Bgc.derecho_intel.clm-NEON-MOAB--clm-PRISM worked on derecho, so I will open a PR with your suggested mods.