ESCOMP / CTSM

Community Terrestrial Systems Model (includes the Community Land Model of CESM)
http://www.cesm.ucar.edu/models/cesm2.0/land/
Other
308 stars 312 forks source link

shell_commands for tests with two testmods listed don't concatenate both together.. #2037

Open ekluzek opened 1 year ago

ekluzek commented 1 year ago

Brief summary of bug

The shell_commands file created for a test with two testmods such as...

SMS_Ld10_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Fates.izumi_nag.clm-NEON-FATES-SJER--clm-FatesPRISM

seems to execute the first test-mod, but create the shell_commands for the testcase that just contains the second.

General bug information

CTSM version you are using: ctsm5.1.dev129

Does this bug cause significantly incorrect results in the model's science? No

Configurations affected: Tests in the test list that include two testmods

This would be an issue in cime, rather than in CTSM. And possibly this behavior is OK, but it makes it hard to see what the test is actually doing.

ekluzek commented 1 year ago

I also note that the order for the testmods FATES NEON tests is the opposite to the BGC NEON tests. This should be figured out...

wwieder commented 1 year ago

Could one solution to this and #1949 be to use usermods in NEON/defaultshell_commands to key off of the compset being used and customize FATES namelist setting in? For example:

if [[ $compset =~ *Fates* ]]; then
    echo "fsurdat ='$DIN_LOC_ROOT/lnd/clm2/surfdata_map/NEON/16PFT_mixed/surfdata_1x1_NEON_${NEONSITE}_hist_16pfts_Irrig_CMIP6_simyr2000_c230120.nc'" >> user_nl_clm
    echo "hist_fincl2 = 'LIST','OF','FATES','HISTORY','FILES' " >> user_nl_clm 
fi
ekluzek commented 1 year ago

@wwieder I like that suggestion for duplication. This issue here is pretty specific to how testmods work for us. And I probably need to file a specific issue with cime, but I need to investigate more.

wwieder commented 1 year ago

Should this come in with #2044?

billsacks commented 1 year ago

@ekluzek - I see the value in what you're saying: having a full list of all shell_commands for a test. This is more of a concern now that (1) we allow multiple testmods and (2) the shell_commands can be pretty complex (if they were simply a list of xmlchange commands, you could see what was done in the CaseStatus file).

I think you're going to run into trouble if you try to append to the shell_commands without removing it at the start of the application of each testmod (https://github.com/ESMCI/cime/blob/95e29f9736d2ef41b690f082afe8509caadecb08/CIME/user_mod_support.py#L29-L31), because you'll end up re-executing commands multiple times (which may often be okay, but in some cases may be problematic). My suggestion would be to introduce a new file like shell_commands_all_testmods into which you copy the contents of shell_commands after executing them (e.g., after this block: https://github.com/ESMCI/cime/blob/95e29f9736d2ef41b690f082afe8509caadecb08/CIME/user_mod_support.py#L118-L121)

samsrabin commented 1 year ago

Just a note: As part of #1863, I had bumped the CIME version from cime6.0.108 to cime6.0.118. It turns out that the 5 tests like this—with multiple testmods listed—crash in the XML generation step under the latter. I'm not sure whether this is resolved in one of the subsequent CIME tags, but on a quick scan none of their commit messages suggest they'd have an effect.

samsrabin commented 1 year ago

Hmm. I just rolled back to cime6.0.108 and I'm still getting the crashes. This happens with my code as well as with ctsm5.1.dev130.

Here's the crash log, from /scratch/cluster/samrabin/tests_0712-100716iz/SMS_L10d_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Bgc.izumi_nag.clm-default--clm-NEON-NIWO.C.0712-100716iz on Izumi:

2023-07-12 10:07:39: Test 'SMS_L10d_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Bgc.izumi_nag.clm-default--clm-NEON-NIWO' failed in phase 'XML' with exception ''NoneType' object has no attribute 'group''
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/test_scheduler.py", line 1093, in _run_catch_exceptions
    return run(test)
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/test_scheduler.py", line 885, in _xml_phase
    opt = match.group(1)

 ---------------------------------------------------
2023-07-12 10:07:41: Test 'SMS_L10d_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Bgc.izumi_nag.clm-default--clm-NEON-NIWO' failed in phase 'XML' with exception 'ERROR: locked: cannot make child group in file /scratch/cluster/samrabin/tests_0712-100716iz/SMS_L10d_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Bgc.izumi_nag.clm-default--clm-NEON-NIWO.C.0712-100716iz/env_test.xml'
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/test_scheduler.py", line 1093, in _run_catch_exceptions
    return run(test)
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/test_scheduler.py", line 790, in _xml_phase
    envtest.add_elements_by_group(files, {}, "env_test.xml")
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/XML/entry_id.py", line 448, in add_elements_by_group
    newgroup = self.make_child(name="group", attributes={"id": gname})
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/XML/generic_xml.py", line 311, in make_child
    "read_only" if self.read_only else "locked", name, self.filename
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/utils.py", line 166, in expect
    raise exc_type(msg)

 ---------------------------------------------------
2023-07-12 10:07:43: Test 'SMS_L10d_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Bgc.izumi_nag.clm-default--clm-NEON-NIWO' failed in phase 'XML' with exception 'ERROR: locked: cannot make child group in file /scratch/cluster/samrabin/tests_0712-100716iz/SMS_L10d_D_Mmpi-serial.CLM_USRDAT.I1PtClm51Bgc.izumi_nag.clm-default--clm-NEON-NIWO.C.0712-100716iz/env_test.xml'
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/test_scheduler.py", line 1093, in _run_catch_exceptions
    return run(test)
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/test_scheduler.py", line 790, in _xml_phase
    envtest.add_elements_by_group(files, {}, "env_test.xml")
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/XML/entry_id.py", line 448, in add_elements_by_group
    newgroup = self.make_child(name="group", attributes={"id": gname})
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/XML/generic_xml.py", line 311, in make_child
    "read_only" if self.read_only else "locked", name, self.filename
  File "/home/samrabin/ctsm_rx_crop_calendars4/cime/CIME/utils.py", line 166, in expect
    raise exc_type(msg)
samsrabin commented 1 year ago

Never mind! Apparently @glemieux fixed this already.

ekluzek commented 2 months ago

Closing this as a wontfix. This would need to be addressed in cime.

samsrabin commented 2 months ago

@ekluzek This is an important issue, though. Could you post it in the CIME repo?