ESCOMP / CAM-SIMA

Community Atmosphere Model - System for Integrated Modeling of the Atmosphere

Check-list for making CAMDEN public #204

Open nusbaume opened 1 year ago

nusbaume commented 1 year ago

This issue was created to list everything that should be done before and after CAMDEN is made public. I view this issue as a discussion, so feel free to add any concerns or suggestions you have related to this topic.

My current list so far:

Before being made public:

  1. CAMDEN should be able to run at least one out-of-box compset (e.g. FKESSLER), which requires:
  2. CAMDEN should have its first tag, which requires:
  3. Decide new public location for code:

After being made public:

Any other things we should make sure of before or immediately after going public?

@cacraigucar @peverwhee @PeterHjortLauritzen @briandobbins

peverwhee commented 1 year ago

@nusbaume one other consideration that has come up a couple times in recent days:

Deciding what to call the repo (CAM? CAMDEN? SIMA?)

cacraigucar commented 1 year ago

Along with @peverwhee's comment about the repo name, I believe the repo location (NCAR or ESCOMP) needs to be decided before it is live. Both of these decisions should be made before the repo is made public.

I also tried to figure out if we can add the "CAM Development" project to issues/PRs and it wasn't immediately apparent if we can do that or not. So this probably needs to be a checklist item (and maybe even before we go live)

Documentation needs to be added to the list (especially an explanation of the difference between ESCOMP/CAM and this new repo). Website mods may also need to be made.

briandobbins commented 1 year ago

I think we should have a conversation about this, and pull in Jim & Bill as well. I think there are a good number of concerns, including the very reasonable (and historical) belief that if there are two repos -CAM, and CAMDEN- ... we'll end up supporting both. We can't just end CAM, and we can't replace it either, so the only real solution seems to find a way to merge the developments in CAMDEN back into CAM. But I think you all have a better sense of the challenges there.

The one thing I can say is that the repo should be in ESCOMP. It's not an NCAR model, it's a community one.

I'll look for a slot for everyone to meet. I know this will be a challenge, and will likely take more than one meeting, but let's start, have some open discussion, then think about things a bit, and maybe have another then.

gold2718 commented 1 year ago

The only real solution seems to find a way to merge the developments in CAMDEN back into CAM.

I do not see any way to even attempt this. Everything about how CAMDEN is built and configured is fundamentally different from CAM. This was done as the only way I could see to get out from under CAM's huge technical debt. Even disregarding the differences in how the models run, they no longer resemble each other.

I think the only way out is to create and execute a plan to include all required functionality from CAM into CAMDEN and push it to CAM as a CAM tag so that it replaces CAM. The modular nature of CAMDEN should enable the community to step up and implement anything that NCAR chooses to leave out of the new version of CAM.

jedwards4b commented 1 year ago

But we can do this by putting camden in a branch of the cam repository - when camden is ready we make that branch the new development branch.

gold2718 commented 1 year ago

But we can do this by putting camden in a branch of the cam repository - when camden is ready we make that branch the new development branch.

Sure, but that has nothing to do with merging (which @briandobbins mentioned).

briandobbins commented 1 year ago

I feel there are two key issues - the first is the 'two repos' problem, and that can be resolved by moving (perhaps 'merging' was imprecise) the CAMDEN code into a branch of CAM.

The second, however, is my growing concern about increased divergence between these versions of CAM, and the length of time before any reconciliation seems possible. To use some very rough numbers from a discussion that Peter, Jesse and I had, it seems like we're looking at ~8.5 person-years of effort before we have it 'ready'.

Even with 4 people working on these full-time, that's over 2 more years of effort & divergence. And I fear that a major change all at once is also problematic for the external development community. Getting things public helps a little bit, but it'd be much better if we can find portions of the changes that can be merged sooner vs waiting for the 'one merge to rule them all' to happen.

Maybe that's truly not possible, but I'm concerned about a total overhaul all at once, and at least two years out, it seems.
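As a rough check of the numbers above, here is a minimal sketch of the arithmetic. It assumes the ~8.5 person-years divides evenly across full-time people with no coordination overhead, which is optimistic. The function name `calendar_years` and the alternative team sizes are illustrative only and do not come from the discussion.

```python
def calendar_years(person_years: float, full_time_people: float) -> float:
    """Calendar time needed if the effort divides evenly across the team.

    Assumption: no ramp-up or coordination overhead, so this is a lower bound.
    """
    if full_time_people <= 0:
        raise ValueError("full_time_people must be positive")
    return person_years / full_time_people


# Figures quoted above: ~8.5 person-years of effort, 4 people full-time.
# The team sizes 2 and 6 are hypothetical, added only for comparison.
for team in (2, 4, 6):
    print(f"{team} people: {calendar_years(8.5, team):.2f} years")
```

With 4 people this gives 8.5 / 4 = 2.125 years, which matches the "over 2 more years" estimate.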

jedwards4b commented 1 year ago

Would it be possible to add any new developments to both cam and camden at the same time to avoid further divergence? If camden isn't far enough along to do this what effort would it take to get to that point?

cacraigucar commented 1 year ago

The development process is that as we convert a physics package to CCPP, we move the CCPP'ized code into the GitHub NCAR/atmospheric_physics repo. This would include both the metadata and the Fortran code. Then CAM would be directed to get its source code for this package from the NCAR/atmospheric_physics repo and that physics package would be coming in via the externals. CAMDEN also would be accessing this same code. Steve had me prototype this approach with both Kessler and Held-Suarez. Both of those packages now are only being accessed from NCAR/atmospheric_physics. We've yet to explore this with a physics package where the interface code is complicated.

To answer why this particular repo is under "NCAR" and not "ESCOMP", there were extensive discussions early on with NOAA about a single repo for CCPP physics packages and the conclusion was that global CCPP should have multiple repos available. The "NCAR/atmospheric_physics" was set up as a location where all of NCAR would put their CCPP'ized code. Dave Gill was told about this repo, but he had not utilized it before he left NCAR.
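To make the externals idea above concrete, here is a minimal sketch of what a shared external entry might look like and how a script could read it. Everything in it is an assumption for illustration: the section name `atmos_phys`, the key names (modeled on an INI-style externals description), the local path, and the placeholder tag are not taken from the actual CAM or CAMDEN configuration. The point is only that both models could pin the same repository entry.

```python
import configparser

# Hypothetical externals entry. The section name, key names, local path, and
# tag are placeholders, not the real CAM/CAMDEN configuration.
EXTERNALS_CFG = """
[atmos_phys]
tag = HYPOTHETICAL_TAG
protocol = git
repo_url = https://github.com/NCAR/atmospheric_physics
local_path = src/atmos_phys
required = True
"""


def load_externals(text: str) -> dict:
    """Parse an INI-style externals description into {name: {key: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


externals = load_externals(EXTERNALS_CFG)
for name, entry in externals.items():
    print(f"{name}: {entry['repo_url']} @ {entry['tag']} -> {entry['local_path']}")
```

If both CAM and CAMDEN carried an entry like this pointing at the same tag, a physics package such as Kessler would come from a single source in both models.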

briandobbins commented 1 year ago

I'll add that the last I spoke with Michael Duda about this, my impression was that they aren't implementing CCPP in MPAS-A, but will have 'compatible' physics in an independent repo. This is somewhat independent of CAMDEN / CAM issues, but likely needs more work to understand how that'll happen / work, and what the logistics are in terms of physics repos.

nusbaume commented 1 year ago

Thanks all for the feedback so far! Below is my response to some of the questions that have been raised:

I also tried to figure out if we can add the "CAM Development" project

@cacraigucar I think this is because CAMDEN is a private repo, and so the CAM project page can’t “see” the CAMDEN issues and PRs. However, I agree that once CAMDEN is made public (in whatever form) then we should add the CAMDEN-related issues to the project page, and likely create a new “view” that is CAMDEN specific. Thus I have added it as a new checklist item for the “after public” section. I also added a new checklist item to remind us that we’ll need to move over all of the open CAMDEN issues as well.

The one thing I can say is that the repo should be in ESCOMP

I think I agree with @briandobbins at this point, so I have removed the “which org” question above and instead replaced it with basically a “move to somewhere in ESCOMP” task.

Would it be possible to add any new developments to both cam and camden

Beyond what @cacraigucar pointed out for the physics, the chemistry code should also start to become unified as MICM (the model independent chemistry module ACOM is working on) comes online, at which point both CAM and CAMDEN should be using the same chemistry code base. Also the MPAS dycore is an external (so should be shared between both models). The same thing would theoretically also be true if we moved the ionosphere routines to be their own external/model component.

So in the end we should be left with basically the SE and FV3 dycores, and the model infrastructure. As @gold2718 pointed out the model infrastructure is basically incompatible, but the dycores do share a fair amount of code. Originally the plan was to just have the dycore updates be one-way (CAM -> CAMDEN), but we could see about making it a two-way transfer, although then again that will likely be somewhat limited due to the differences in the infrastructure.

But we can do this by putting camden in a branch of the cam repository

The one advantage of making a branch I can see is that if we do allow for a “two-way” update of the dycore code then this would make that update simpler (we would just have to cherry-pick between the branches each time). However, the disadvantage is that currently CAM and CAMDEN don’t share any actual git history, as CAMDEN was created from scratch. I of course can just brute-force it into a CAM branch, but it might require keeping NCAR/CAMDEN around for a while so that we have the relevant git history. Also having two repos can provide some flexibility if we want certain aspects of the repo to be different between CAM and CAMDEN (for example, CAMDEN has significantly more GitHub Actions workflows than CAM). That’s probably not much of a deal breaker though, so I could likely be convinced either way.

Finally, I have added @billsacks to the repo in case he wants to contribute any thoughts to this thread.

nusbaume commented 1 year ago

@peverwhee Also that is a good point about the names, so I also added a checklist task to decide on the repo or branch name as well.

billsacks commented 1 year ago

I of course can just brute-force it into a CAM branch, but it might require keeping NCAR/CAMDEN around for a while so that we have the relevant git history. Also having two repos can provide some flexibility if we want certain aspects of the repo to be different between CAM and CAMDEN (for example, CAMDEN has significantly more GitHub Actions workflows than CAM). That’s probably not much of a deal breaker though, so I could likely be convinced either way.

I think you can move an entire branch - with history - into a different repo even if it doesn't share any history with any existing branch in that repo. It feels weird to have a branch that doesn't share any history in the same repo, but I see why that could have some practical advantages – but I also see why it could have some practical disadvantages... so I don't really have feelings one way or another on that point... the main point of this comment is to note that (I think) this wouldn't be a history-losing move.
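For reference, here is a minimal sketch of how such a history-preserving move could be scripted. It relies on `git fetch` with an explicit refspec, which copies a branch and all of its commits into a new local branch even when the two repositories share no history. The URL and branch names in the usage note are placeholders, and the helper names `import_branch_commands` and `run_all` are hypothetical.

```python
import subprocess
from typing import List


def import_branch_commands(source_url: str, source_branch: str,
                           new_branch: str) -> List[List[str]]:
    """Build the git commands that copy a branch, with history, into this repo."""
    return [
        # Fetch the source branch and store it as a new local branch.
        # This works even if the histories are unrelated.
        ["git", "fetch", source_url,
         f"refs/heads/{source_branch}:refs/heads/{new_branch}"],
        # Show the most recent imported commits to confirm history came along.
        ["git", "log", "--oneline", "-n", "5", new_branch],
    ]


def run_all(commands: List[List[str]], cwd: str) -> None:
    """Run each command inside the target clone, stopping on the first failure."""
    for argv in commands:
        subprocess.run(argv, cwd=cwd, check=True)
```

A call such as `run_all(import_branch_commands("<CAMDEN repo URL>", "main", "camden"), cwd="<path to CAM clone>")` would create a `camden` branch in the CAM clone. Afterwards, `git merge-base camden <CAM branch>` should exit non-zero, confirming that the two branches share no common ancestor.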

gold2718 commented 1 year ago

One thing to think about is that if you move CAMDEN to CAM, what happens to all its issues and PRs? You will have both projects sharing issues and may have to back-label CAM issues to make it easy to keep them separate. Moving code between CAM and CAMDEN can be done in any clone. I do not see any particular technical advantage to moving CAMDEN to the CAM repo.