NCAR / DART

Data Assimilation Research Testbed
https://dart.ucar.edu/
Apache License 2.0
196 stars 145 forks source link

bug: When the setup scripts for cam-fv are run it is possible to attempt modulo of a string into the assimilate.csh script #435

Closed johnsonbk closed 1 year ago

johnsonbk commented 1 year ago

Describe the bug

The setup scripts for cam-fv create an assimilate.csh script that is invoked by CESM. The assimilate.csh script is created in three steps: variables set in the setup script are sed inserted into DART_config.template, DART_config.template is renamed DART_config, and DART_config is run to configure assimilate.csh.template and rename it assimilate.csh.

  1. List the steps someone needs to take to reproduce the bug. In the cam-fv setup scripts, an environmental variable, save_every_Mth_day_restart, is set. Typically this variable is set to an integer corresponding to the number of days between restart file saves. However, it is possible to set this variable to a string corresponding the day of the week that restart files are saved such as: setenv save_every_Mth_day_restart Monday.
  2. What was the expected outcome? When the setup script runs and an integer is set, setenv save_every_Mth_day_restart 7 the user expects assimilate.csh to have an if statement resembling something such as:
    369 if ( $sec_o_day !~ '00000' || \
    370    ($sec_o_day =~ '00000' && $day_o_month % 7 != 0) ) then
  3. What actually happened? However, when the setup script runs and a string day of the week is set, setenv save_every_Mth_day_restart Monday the resulting assimilate.csh that gets created will attempt to modulo a string:
    369 if ( $sec_o_day !~ '00000' || \
    370    ($sec_o_day =~ '00000' && $day_o_month % Monday != 0) ) then

Error Message

If the condition described in answer 3 above occurs, an error message gets thrown in assimilate.csh on a subsequent invocation of the script:

327 if ($#log_list >= 3) then
328 
329    # List of potential restart sets to remove. The coupler restart files
330    # may or may not have an 'instance' string in them, depending on whether
331    # or not you are using the multi-driver or not, so we must check for both.
332 
333    set re_list = `${LIST} -t *cpl.r.*`
334    if ($#re_list == 0) set re_list = `${LIST} -t *cpl_0001.r.*`
335 
336    if ($#re_list < 3) then
337       echo "ERROR: Too many cesm.log files ($#log_list) for the $#re_list restart sets."
338       echo "       Clean out the cesm.log files from failed cycles."
339       exit 50
340    endif

The errors in lines 337 and 338 get thrown. Note that these error statements occur earlier in the script than the bug that causes them to get thrown.

Which model(s) are you working with?

cam-fv

Screenshots

See code snippets above.

Version of DART

Which version of DART are you using? v10.5.5

Have you modified the DART code?

No

Build information

Please describe:

  1. Machine I am using: Shaheen and Cheyenne
  2. Compiler: This issue is compiler independent issue but in this case I'm using intel.

Proposed solution

Replace the error prone lines above using alternative logic from the assimilate.csh.template that is used for the CAM6+DART reanalysis which avoids this problem:

368   # Decide whether to purge restart files at this date and time.
369   set save_rest_freq = BOGUS_save_every_Mth
370   
371   set purge = 'true'
372   # Learn whether save_rest_freq a string or a number.
373   # Character strings must be tested outside of the 'if' statement.
374   echo $save_rest_freq | grep '[a-z]'
375   if ($status == 0) then
376      set purge_date = $RM_DATE_PARTS[1]-$RM_DATE_PARTS[2]-$RM_DATE_PARTS[3]
377      set weekday = `date --date="$purge_date" +%A`
378      if ($weekday == $save_rest_freq) set purge = 'false'
379
380   # Numbers can be tested inside the 'if' statement.
381   else if (`echo $save_rest_freq | grep '[0-9]'`) then
382      if (${day_o_month} % ${save_rest_freq} == 0) set purge = 'false'
383
384   endif
[ ... ]
397   if ( $sec_o_day !~ '00000' || \
398    ($sec_o_day =~ '00000' && $purge == 'true') ) then
hkershaw-brown commented 1 year ago

I'm asking this so it is a bit clearer what to test for this change to go into the main branch.

Is the assimilate.csh.template used for the reanalysis from the DART_CASES repo: https://github.com/NCAR/DART_CASES/blob/f.e21.FHIST_BGC.f09_025.CAM6assim.011/assimilate.csh.template (Latest commit 6761a76 on Feb 11, 2020)

or the assimilate.csh.template from the reanalysis branch: https://github.com/NCAR/DART/blob/reanalysis/models/cam-fv/shell_scripts/cesm2_1/assimilate.csh.template (Latest commit ab14aed on May 27 2022)

These scripts are different.

Followup question, do we need to be concerned about what is or is not used for the reanalysis? Can we just fix and test the proposed solution on the main branch assimilate.csh.template? Which I am assuming is this one: https://github.com/NCAR/DART/blob/main/models/cam-fv/shell_scripts/cesm2_1/assimilate.csh.template (Latest commit 1a7e1b9 on Mar 26, 2019)

johnsonbk commented 1 year ago

I created PR #437 to illustrate how the script can be updated to fix the issue.

I don't know which assimilate.csh.template is used for the reanalysis. The DART_CASES repository was created to archive the configurations of important experiments such as the reanalysis. Perhaps @kdraeder can explain further.

For the sake of simplifying the discussion, lines 370-400 in the assimilate.csh.template of the reanalysis branch of the DART repository match lines 368-398 in the assimilate.csh.template of the f.e21.FHIST_BGC.f09_025.CAM6assim.011 branch of the DART_CASES repository.

For your follow up question: I don't know how closely the scripts in the DART repository need to match the scripts used in the reanalysis. For the purpose of this pull request, I think we should just fix and test the assimilate.csh.template of the DART main branch. Perhaps others will have differing opinions though.

kdraeder commented 1 year ago

The version in the DART_CASES repo is a record of what was used in the Reanalysis in /glade/work/raeder/Exp/f.e21.FHIST_BGC.f09_025.CAM6assim.011 The DART_CASES sometimes lags for trivial changes, but important changes are archived. That case does not use the main or reanalysis branch version of assimilate.csh.template; that was only done years ago when .../Exp/setup_advanced_Rean.original was run.

The version in the NCAR/DART reanalysis branch may lag behind both of those versions, depending on how diligent I've been about updating it. The philosophy was that it should be updated only when permanent changes are needed, while the versions in Exp and DART_CASES may have more frequent changes (and reversions) to deal with transient problems that arise during the Reanalysis assimilation.

This fix should be committed to the main branch.