brhillman opened this issue 1 year ago
My suggestions:
Attn also: @jgfouca
Would these tests run at the AT cadence (once per PR; possibly many times a day)?
@rljacob and others, I recommend not doing this in the AT. A nightly is the appropriate place for this.
Anvil is somewhat underused right now, so maybe run them there.
@brhillman, @ambrad, I am happy to set this up, but we will need to clarify a few things.
Compare scream master <-> scream baselines
This should be trivial, since it is essentially what we already do everywhere, just with a different test suite. We would just need a protocol for what to do when DIFFs come up. With SCREAM tests, we usually just bless and move on. Here, blessing would mean knowingly diverging from E3SM in terms of BFB-ness. Does this mean we never approve a PR that causes a DIFF, and instead revert the responsible PR? Or do we just make sure we understand DIFFs as they come up, so that we can be more comfortable approving non-BFB SCREAM downstream merges into E3SM?
Compare scream master <-> e3sm baselines
This is much trickier, since it involves two different repos and therefore two moving targets. If E3SM gets a non-BFB PR, SCREAM will DIFF against E3SM baselines until we do an upstream merge. Do we just expedite an upstream merge when this happens? And what happens if we want a non-BFB change in SCREAM? Do we then do a downstream merge to E3SM?
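One way to make the two-moving-targets problem concrete is a small decision table: given the result of comparing SCREAM master against SCREAM baselines and against E3SM baselines, it returns which of the questions above that result raises. This is a hypothetical sketch (`triage` is not part of any existing tooling), and it only holds under a stated assumption: both baseline sets were BFB with each other at the last merge, and SCREAM baselines have not been re-blessed since. A bless breaks the inference, which is the same concern raised about blessing DIFFs.

```python
def triage(vs_scream: str, vs_e3sm: str) -> str:
    """Map a pair of baseline results to the open question it raises.

    Hypothetical helper. Assumes SCREAM and E3SM baselines were BFB with
    each other at the last merge and that SCREAM baselines have not been
    re-blessed since.
    """
    for status in (vs_scream, vs_e3sm):
        if status not in ("PASS", "DIFF"):
            raise ValueError(f"unexpected status {status!r}")
    if vs_scream == "PASS" and vs_e3sm == "PASS":
        return "BFB against both baselines"
    if vs_scream == "PASS":
        # SCREAM output is unchanged, so under the assumption above the E3SM
        # baselines moved: an E3SM non-BFB PR not yet merged upstream.
        return "E3SM-side non-BFB change: expedite upstream merge?"
    if vs_e3sm == "DIFF":
        # SCREAM output changed (possibly alongside an E3SM-side change)
        # and no longer matches E3SM either.
        return "SCREAM-side non-BFB change: downstream merge to E3SM?"
    # SCREAM changed but still matches E3SM: outside the assumptions above.
    return "unexpected combination: check which baselines were regenerated"
```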
@jgfouca Can we avoid the complication of E3SM baselines by renaming the tests with a meaningless test mod? Or by making sure the number of time steps is slightly different for the tests run from the SCREAM repo?
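The second idea could be sketched as deriving a SCREAM-only test name whose run length differs by one step, so its baseline directory cannot collide with an E3SM baseline of the same name. This minimal sketch assumes CIME-style test names in which the first dot-separated field carries an `_Ln<N>` modifier (run for N model steps); `bump_ln` and the example test names are hypothetical.

```python
import re


def bump_ln(test_name: str, delta: int = 1) -> str:
    """Return test_name with its _Ln<N> run-length modifier changed by delta.

    Assumes a CIME-style name TESTTYPE[_MODS].GRID.COMPSET.MACHINE_COMPILER[...],
    where only the first dot-separated field holds test-type modifiers.
    """
    fields = test_name.split(".")
    head = fields[0]
    # Match _Ln<digits> followed by another modifier or the end of the field.
    match = re.search(r"_Ln(\d+)(?=_|$)", head)
    if match is None:
        raise ValueError(f"no _Ln<N> modifier in {test_name!r}")
    new_n = int(match.group(1)) + delta
    fields[0] = head[: match.start()] + f"_Ln{new_n}" + head[match.end():]
    return ".".join(fields)
```

For example, `bump_ln("SMS_Ln5.ne4pg2_ne4pg2.FEXAMPLE.mach_gnu")` gives `"SMS_Ln6.ne4pg2_ne4pg2.FEXAMPLE.mach_gnu"`. As noted in the reply below, this amounts to maintaining a separate set of baselines.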
@whannah1, what you are describing is basically approach 1 with extra steps (maintaining our own baselines for this effort). If I understood @brhillman correctly, what we want is to run E3SM test cases using the SCREAM repo and make sure we stay BFB (or at least notice when we are not). The question is whether we want to use baselines created and maintained by E3SM, or maintain our own baselines for this. If we want to share baselines, we will really have to stay on top of the upstream/downstream merges.
@brhillman, can you look at my last two comments and chime in? I need to understand exactly what the team wants to do, and we should have protocols for when things DIFF.
Our merge of SCREAM upstream to E3SM has revealed some DIFFs in integration tests between the SCREAM and E3SM repos. We have previously run into DIFFs in atmosphere tests, but this most recent merge has also revealed DIFFs in coupled tests, even where the F-cases are BFB. I think this suggests that, if we are going to continue developing downstream in our SCREAM fork (which I think we want to, in order to remain agile), we should consider expanding our regression tests to include some E3SM integration tests (perhaps just a few, chosen to give the best bang-for-buck coverage). I'm thinking this would mean an additional baseline compare:
Compare scream master <-> scream baselines
Compare scream master <-> e3sm baselines
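The two compares above can be sketched as one pass over a run's history files, checked against two labelled baseline sets. This is a minimal, hypothetical sketch: the helper names, the flat directory of `*.nc` files, and the `scream`/`e3sm` labels are assumptions. A whole-file hash is also stricter than the field-by-field comparison the E3SM/CIME test system performs on history files (via the cprnc tool), since file metadata may differ even when every field is BFB.

```python
import hashlib
from pathlib import Path


def hash_history_files(directory):
    """Map each *.nc file name in directory to the SHA-256 of its bytes."""
    return {
        p.name: hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(Path(directory).glob("*.nc"))
    }


def compare_hashes(run, baseline):
    """Return 'NO_FILES', 'MISSING', 'DIFF', or 'PASS' for one baseline set."""
    if not run:
        return "NO_FILES"
    if any(name not in baseline for name in run):
        return "MISSING"
    if any(run[name] != baseline[name] for name in run):
        return "DIFF"
    return "PASS"


def compare_against_baselines(run, baselines):
    """Apply compare_hashes to every labelled baseline set."""
    return {label: compare_hashes(run, base) for label, base in baselines.items()}
```

For example, a run whose hashes match the SCREAM set but not the E3SM set would give `{'scream': 'PASS', 'e3sm': 'DIFF'}`.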
Help and input wanted!