ESMCI / cime

Common Infrastructure for Modeling the Earth
http://esmci.github.io/cime

System tests don't seem to recopy namelists to baselines (when a test is rerun with namelist changes) #3229

Closed ekluzek closed 1 year ago

ekluzek commented 5 years ago

I'm seeing this in ctsm with branch_tags/cime5.8.3_chint17-05, but I'm guessing it applies to all versions of cime. I thought that system tests would always update the namelists and history files in the baseline when a test is rerun, but it looks like they don't. The files are copied the first time, but they aren't overwritten after that. I'm assuming this means that copyifnewer is being used rather than a regular copy.

Because of this, the option to create_test "-o" is not only useless -- it's actually bad to use, because it won't update the contents. I'd say either remove the "-o" option, or make sure that files are copied and overwritten every time a test is rerun.
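The copyifnewer guess above can be illustrated with a minimal sketch. This is not CIME's code, and nothing in this thread confirms that CIME copies baselines this way; the function names copy_if_newer and copy_always are hypothetical. It only shows how a timestamp-guarded copy can leave a stale file in place where an unconditional copy would not.

```python
import os
import shutil


def copy_if_newer(src, dst):
    """Copy src to dst only when dst is missing or older than src.

    Hypothetical helper for illustration; returns True if a copy happened.
    """
    if os.path.exists(dst) and os.path.getmtime(dst) >= os.path.getmtime(src):
        return False  # dst looks up to date by timestamp, so it is left alone
    shutil.copy2(src, dst)
    return True


def copy_always(src, dst):
    """Overwrite dst with src regardless of timestamps."""
    shutil.copy2(src, dst)
    return True
```

With the guarded version, a destination whose modification time is not older than the source is never refreshed, even if the contents differ.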

jedwards4b commented 5 years ago

@ekluzek please provide an example

jgfouca commented 5 years ago

This must be for cesm only, because we use this workflow (-o) all the time in E3SM.

ekluzek commented 5 years ago

I used to use "-o" all the time, but no longer do because of concern over issues like this.

The example I have now is that I updated the clm4_5 namelist after the original test run of /glade/work/erik/ctsm_firefix/. After rerunning, it didn't recopy the lnd_in file over...

So, for example, the test SMS_D_Ld1.f09_g17.I1850Clm45BgcCruGs.cheyenne_intel.clm-default didn't update the lnd_in file in the generated baseline after being rerun. It DOES look like it updated the history files, so that's good.

-rw-rw-r-- 1 erik cgdtss 8630 Sep 4 13:43 /glade/p/cgd/tss/ctsm_baselines/ctsm1.0.dev063/SMS_D_Ld1.f09_g17.I1850Clm45BgcCruGs.cheyenne_intel.clm-default/CaseDocs/lnd_in
-rw-r--r-- 1 erik cgdtss 8630 Sep 5 12:13 /glade/scratch/erik/tests_ctsm1d63a/SMS_D_Ld1.f09_g17.I1850Clm45BgcCruGs.cheyenne_intel.clm-default.GC.ctsm1d63a_int/run/lnd_in
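Both files in the listing above have the same size (8630 bytes), so size alone cannot show whether the baseline copy is stale. A minimal sketch of a byte-for-byte check follows; the helper name baseline_is_stale is hypothetical and is not part of CIME.

```python
import filecmp


def baseline_is_stale(baseline_file, run_file):
    """Return True when the baseline copy differs byte for byte from the run copy.

    shallow=False forces a content comparison instead of trusting the
    size and modification time reported by os.stat().
    """
    return not filecmp.cmp(baseline_file, run_file, shallow=False)
```

It would be called with the two lnd_in paths from the listing, baseline first and run directory second.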

I'm going to redo this test with master and I'll see if I get the same problem. I'm also going to copy the files for the case by hand, so that it's properly updated.
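Copying the files by hand, as described above, could look roughly like the sketch below. It assumes the namelists to refresh sit in one directory and follow the *_in naming seen for lnd_in; the function name, its parameters, and that pattern are assumptions made for illustration and do not come from CIME.

```python
import shutil
from pathlib import Path


def refresh_baseline_namelists(case_docs, baseline_docs, pattern="*_in"):
    """Unconditionally copy namelist files into a baseline CaseDocs directory.

    case_docs and baseline_docs are directory paths; pattern is an assumed
    glob for namelist file names. Returns the sorted names that were copied.
    """
    case_docs = Path(case_docs)
    baseline_docs = Path(baseline_docs)
    baseline_docs.mkdir(parents=True, exist_ok=True)
    copied = []
    for src in sorted(case_docs.glob(pattern)):
        if src.is_file():
            # Overwrite any existing baseline copy, whatever its timestamp.
            shutil.copy2(src, baseline_docs / src.name)
            copied.append(src.name)
    return copied
```

Because the copy is unconditional, an existing baseline file is replaced even when its timestamp is newer than the source.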

ekluzek commented 5 years ago

OK, I set up a clean test case to demonstrate the problem.

/glade/scratch/erik/ctsm1.0.dev062_cimemaster/cime/scripts/SMS_D_Ld1.f09_g17.I1850Clm45BgcCruGs.cheyenne_intel.clm-default.GC.20190905_133218_5jc28

I ran it, then modified the user_nl_clm file and submitted it again. The namelist files aren't updated in the baseline directory's CaseDocs. The history files are being updated.

Note also that the user_nl_* files aren't being updated either. This is less of a problem, but is a concern as well.

ekluzek commented 5 years ago

Oh, and my last test case used cime master, so cime5.8.9-7-gd2f7157b8.

ekluzek commented 5 years ago

I just overwrote the Clm45 lnd_in files in /glade/p/cgd/tss/ctsm_baselines/ctsm1.0.dev063. So now only the last case shows the issue.

billsacks commented 5 years ago

Based on your description, @ekluzek , this does seem like a problem to me (and I think I have been tripped up by this before myself).

However, I don't think this has anything to do with the -o / --allow-baseline-overwrite option: my understanding is that that option lets you run a new set of tests (i.e., a new create_test invocation) with --generate pointing to an existing baseline directory, and it will overwrite the baselines in that directory. What you're describing seems different: you have an existing test and rerun it. I do feel that namelists should be re-copied to the baseline directory in that case, but I don't think this is at all tied in with whether you specify -o (namelists and history files should be recopied in this case regardless of whether you specify -o).

ekluzek commented 5 years ago

Hi Bill. Yes, I don't mean to say that this has to do with the "-o" option itself. In my example case I didn't use "-o".

But when I did use "-o", I assumed that history files and namelists would be updated when an existing baseline directory exists. Since it doesn't update namelists even for a case that's just rerun, I assume it won't update namelists when "-o" is used either (I'll check on that). What this means to me is that the "-o" option should NEVER be used, because it won't update the baseline namelists it generates. The only safe way is to delete the baseline directory and rerun the whole thing. My suggestion is that we either get this to work correctly and update the namelists when tests are rerun, or at least remove the "-o" option, because without it you have to delete the previous baseline when you rerun.

billsacks commented 5 years ago

As I just discussed with @ekluzek , I really don't think this is related to -o, but instead is related to the timing of when namelists are generated in the course of running a test. See also #2002 which I think is related in some ways.

ekluzek commented 5 years ago

The "-o" is good! I checked by running a case and then running the same case again with changes (and the -o). It properly copies the namelist files, the user_nl_* files, and the history files from the second case. So I now have confidence in the "-o" option again.

The disconcerting thing is that if you rerun a test case, it won't update the namelist and user_nl_* files even if they were changed.

ekluzek commented 2 years ago

#4154 should fix this issue.

billsacks commented 2 years ago

@ekluzek I'm thinking #4154 may not fix this because the issue here is apparently not related to the -o flag after all, but rather arises when rerunning tests, right?

Regardless, the changes in #4154 seem very good to have – thank you @jedwards4b and @jgfouca for that!

ekluzek commented 2 years ago

@billsacks oh good point, #4154 is just when creating a new test.

billsacks commented 2 years ago

From discussion with @jedwards4b @mvertens @briandobbins - although we see value in fixing this, we don't see it as high priority. So unless someone (@ekluzek ?) wants to take it on, we'll probably close it as a wontfix.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 90 days with no activity. Remove stale label or comment or this will be closed in 5 days.

github-actions[bot] commented 1 year ago

This issue was closed because it has been stalled for 5 days with no activity.