SCREAM repo should run some subset of E3SM tests to keep BFB compatibility

brhillman commented 1 year ago

Our merge of SCREAM upstream to E3SM has revealed some DIFFs in integration tests between the SCREAM and E3SM repos. We have previously run into DIFFs in atmosphere tests, but this most recent merge has revealed some DIFFs in coupled tests as well, even when F-cases are BFB. I think this suggests that, if we are going to continue to develop downstream in our SCREAM fork (which I think we should, in order to remain agile), then we might think about expanding our regression tests to include some E3SM integration tests (maybe just a few to give us our best bang-for-buck test coverage). I'm thinking maybe this would mean an additional baseline compare:

Compare scream master <-> scream baselines
Compare scream master <-> e3sm baselines

Help and input wanted!

ambrad commented 1 year ago

My suggestions:

Run on Chrysalis so we have E3SM baselines, if it's OK with @rljacob.
Run homme_integration plus a judicious subset of e3sm_integration.

Attn also: @jgfouca

rljacob commented 1 year ago

Would these tests run at the AT cadence (once per PR; possibly many times a day)?

ambrad commented 1 year ago

@rljacob and others, I recommend not doing this in the AT. A nightly is the appropriate place for this.

rljacob commented 1 year ago

Anvil is kind of underused now, so maybe run them there.

jgfouca commented 1 year ago

@brhillman, @ambrad, I am happy to set this up, but we will need to clarify a few things.

Compare scream master <-> scream baselines

This should be trivial since it's basically the same thing that we are doing everywhere, just with a different test suite. We would just need some protocol for what to do when DIFFs come up. With SCREAM tests, we usually just bless and move on. In this case, a bless would imply knowingly diverging from E3SM in terms of BFBness. Does this mean we never approve a PR that causes a DIFF, and so would revert the responsible PR? Or do we just make sure to understand DIFFs as they come up, so that we can be more comfortable approving non-BFB SCREAM downstream merges into E3SM?

Compare scream master <-> e3sm baselines

This is much trickier since it involves two different repos and therefore two moving targets. If E3SM gets a non-BFB PR, SCREAM will DIFF against E3SM baselines until we do an upstream merge. Do we just expedite an upstream merge when this happens? What happens if we want a non-BFB change in SCREAM? Do we then do a downstream merge to E3SM?
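To make the protocol question concrete, here is a minimal sketch of how the two comparisons could be combined into a suggested follow-up for a single test. Everything in it is hypothetical: the `Outcome` enum, the `triage` function, and the suggested actions are illustrations of the questions above, not an agreed policy and not part of CIME or any existing tooling.

```python
from enum import Enum


class Outcome(Enum):
    """Result of one baseline comparison for one test (hypothetical type)."""
    PASS = "PASS"
    DIFF = "DIFF"


def triage(vs_scream: Outcome, vs_e3sm: Outcome) -> str:
    """Suggest a follow-up given both baseline comparisons for one test.

    The returned actions are assumptions for discussion, not team policy.
    """
    if vs_scream is Outcome.PASS and vs_e3sm is Outcome.PASS:
        return "ok: BFB with both SCREAM and E3SM baselines"
    if vs_scream is Outcome.PASS and vs_e3sm is Outcome.DIFF:
        # SCREAM answers did not change, so the E3SM baselines likely moved.
        return "upstream-merge: E3SM likely took a non-BFB change"
    if vs_scream is Outcome.DIFF and vs_e3sm is Outcome.PASS:
        # SCREAM answers changed and now match E3SM, as might happen
        # right after an upstream merge brings in a non-BFB E3SM change.
        return "bless-scream: SCREAM baselines look stale relative to E3SM"
    # SCREAM answers changed and do not match E3SM.
    return "investigate: understand the DIFF, then revert or plan a downstream merge"


if __name__ == "__main__":
    for s in Outcome:
        for e in Outcome:
            print(f"scream={s.value} e3sm={e.value} -> {triage(s, e)}")
```

The PASS/DIFF case is the "two moving targets" situation: nothing changed in SCREAM, yet the test DIFFs against E3SM baselines until an upstream merge happens.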

whannah1 commented 1 year ago

@jgfouca Can we avoid this complication of E3SM baselines by renaming the tests with a meaningless test mod? Or by making sure that the number of time steps is slightly different for the tests run from the SCREAM repo?

jgfouca commented 1 year ago

@whannah1, what you are describing is basically approach 1 with extra steps (maintaining our own baselines for this effort). If I understood @brhillman, what we want to do is run E3SM test cases using the SCREAM repo and make sure we stay BFB (or at least notice when we are not BFB). The question is whether we want to use baselines created and maintained by E3SM or maintain our own baselines for this. If we want to share baselines, we will really have to stay on top of the upstream/downstream merges.

jgfouca commented 1 year ago

@brhillman, can you look at my last two comments and chime in? I need to understand what exactly the team wants to do, and we should have protocols for when things DIFF.

E3SM-Project / scream

SCREAM repo should run some subset of E3SM tests to keep BFB compatibility #2161