Closed ekluzek closed 1 year ago
@ekluzek please provide an example
This must be for cesm only, because we use this workflow (-o) all the time in E3SM.
I used to use "-o" all the time, but no longer do because of concerns over issues like this.
The example I have now is that I updated the clm4_5 namelist after the original test run of /glade/work/erik/ctsm_firefix/. After rerunning it didn't recopy the lnd_in file over...
So, for example, after being rerun, the test SMS_D_Ld1.f09_g17.I1850Clm45BgcCruGs.cheyenne_intel.clm-default didn't update the lnd_in file in the generated baseline. It DOES look like it updated the history files, so that's good.
-rw-rw-r-- 1 erik cgdtss 8630 Sep 4 13:43 /glade/p/cgd/tss/ctsm_baselines/ctsm1.0.dev063/SMS_D_Ld1.f09_g17.I1850Clm45BgcCruGs.cheyenne_intel.clm-default/CaseDocs/lnd_in
-rw-r--r-- 1 erik cgdtss 8630 Sep 5 12:13 /glade/scratch/erik/tests_ctsm1d63a/SMS_D_Ld1.f09_g17.I1850Clm45BgcCruGs.cheyenne_intel.clm-default.GC.ctsm1d63a_int/run/lnd_in
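Both copies of lnd_in in the listing above are 8630 bytes, so size alone can't show whether the baseline is out of date. A minimal sketch of a content check follows, assuming you just want to compare a baseline file against the copy in the run directory. The helper name `baseline_is_stale` is hypothetical and not part of CIME.

```python
import filecmp
from pathlib import Path


def baseline_is_stale(baseline_file, run_file):
    """Return True if the baseline copy differs in content from the run copy.

    Hypothetical helper for illustration; not part of CIME.
    A missing baseline file is treated as stale.
    """
    baseline_file, run_file = Path(baseline_file), Path(run_file)
    if not baseline_file.exists():
        return True
    # shallow=False always compares file contents instead of accepting
    # matching os.stat() signatures (type, size, modification time).
    return not filecmp.cmp(baseline_file, run_file, shallow=False)
```

Called with the two lnd_in paths listed above, this would report whether the baseline copy really differs from the rerun's namelist.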
I'm going to redo this test with master and I'll see if I get the same problem. I'm also going to copy the files for the case by hand, so that it's properly updated.
OK, I set up a clean test case to demonstrate the problem.
/glade/scratch/erik/ctsm1.0.dev062_cimemaster/cime/scripts/SMS_D_Ld1.f09_g17.I1850Clm45BgcCruGs.cheyenne_intel.clm-default.GC.20190905_133218_5jc28
I ran it, then modified the user_nl_clm file and submitted it again. The namelist files aren't updated in the baseline directory CaseDocs. The history files are being updated.
Note also that the user_nl_* files aren't being updated either. This is less of a problem, but is a concern as well.
Oh, and my last test case used cime master, so cime5.8.9-7-gd2f7157b8.
I just overwrote the Clm45 lnd_in files in /glade/p/cgd/tss/ctsm_baselines/ctsm1.0.dev063. So now only the last case shows the issue.
Based on your description, @ekluzek , this does seem like a problem to me (and I think I have been tripped up by this before myself).
However, I don't think this has anything to do with the -o / --allow-baseline-overwrite option: my understanding is that that option lets you run a new set of tests (i.e., a new create_test invocation) with --generate pointing to an existing baseline directory, and it will overwrite the baselines in that directory. What you're describing seems different: you have an existing test and rerun it. I do feel that namelists should be re-copied to the baseline directory in that case, but I don't think this is at all tied in with whether you specify -o (namelists and history files should be recopied in this case regardless of whether you specify -o).
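The distinction drawn above, between overwriting an existing baseline directory from a new test invocation and recopying files when an existing test is rerun, can be sketched as follows. This is only an illustration of the described behavior, not CIME's implementation. The function `generate_baseline` and its `allow_overwrite` parameter (standing in for -o) are hypothetical names.

```python
import shutil
from pathlib import Path


def generate_baseline(case_files, baseline_dir, allow_overwrite=False):
    """Copy case files into baseline_dir.

    Hypothetical sketch: refuses to reuse an existing baseline directory
    unless allow_overwrite is set (standing in for the -o option).
    """
    baseline_dir = Path(baseline_dir)
    if baseline_dir.exists() and not allow_overwrite:
        raise FileExistsError(f"baseline already exists: {baseline_dir}")
    baseline_dir.mkdir(parents=True, exist_ok=True)
    for src in case_files:
        # Unconditional copy: any existing baseline file is replaced.
        shutil.copy2(src, baseline_dir / Path(src).name)
```

In this sketch the overwrite flag only guards the directory check. Whether individual files are recopied is a separate decision, which matches the point that the rerun problem is not tied to -o.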
Hi Bill. Yes, I don't mean to say that this has to do with the "-o" option itself. In my example case I didn't use "-o".
But when I did use "-o", I assumed that history files and namelists would be updated when an existing baseline directory exists. Since it doesn't update namelists even for a case that's just rerun, I assume it won't update namelists when "-o" is used either (I'll check on that). What this means to me is that the "-o" option should NEVER be used, because it won't update the baseline namelists it generates. The only safe way is to delete the baseline directory and rerun the whole thing. My suggestion is that we either get this to work correctly and update the namelists when tests are rerun, or we at least remove the "-o" option, because without it you have to delete the previous baseline when you rerun.
As I just discussed with @ekluzek, I really don't think this is related to -o, but instead is related to the timing of when namelists are generated in the course of running a test. See also #2002 which I think is related in some ways.
The "-o" is good! I checked by running a case and then running the same case again with changes (and the -o). It properly copies the namelist files, user_nl_* files, and history files from the second case. So I now have confidence in the "-o" option again.
The disconcerting thing is that if you rerun a test case, it won't update the namelist and user_nl_* files even if they were changed.
@ekluzek I'm thinking #4154 may not fix this because the issue here is apparently not related to the -o flag after all, but rather arises when rerunning tests, right?
Regardless, the changes in #4154 seem very good to have – thank you @jedwards4b and @jgfouca for that!
@billsacks oh good point, #4154 is just when creating a new test.
From discussion with @jedwards4b @mvertens @briandobbins - although we see value in fixing this, we don't see it as high priority. So unless someone (@ekluzek ?) wants to take it on, we'll probably close it as a wontfix.
This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.
This issue was closed because it has been stalled for 5 days with no activity.
I'm seeing this in ctsm with branch_tags/cime5.8.3_chint17-05. But, I'm guessing it applies to all versions of cime. I thought that system tests would always update the namelists and history files when a test is rerun. But, it looks like it doesn't do that. It'll copy it the first time, but then won't overwrite after that. I'm assuming this means that copyifnewer is being used rather than a regular copy.
Because of this, the option to create_test "-o" is not only useless -- it's actually bad to use, because it won't update the contents. I'd say either remove the "-o" option, or make sure that files are copied and overwritten every time a test is rerun.
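The guess above, that a copy-if-newer style helper is used instead of a regular copy, can be illustrated with a small sketch. This is an assumption for illustration only: `copy_if_newer` is a hypothetical function and is not claimed to match what CIME actually does. It shows how a timestamp-gated copy can leave a destination file unchanged even though the source contents differ.

```python
import os
import shutil


def copy_if_newer(src, dst):
    """Copy src to dst only when dst is missing or older than src.

    Hypothetical helper for illustration. Returns True if a copy happened.
    """
    if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
        # Destination is at least as new as the source, so the copy is
        # skipped even if the two files have different contents.
        return False
    shutil.copy2(src, dst)  # copy2 also preserves the modification time
    return True
```

A regular copy (calling `shutil.copy2` unconditionally) would always overwrite the destination, which is the behavior requested for reruns.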