ESCOMP / MOSART

Model for Scale Adaptive River Transport, Mosart, part of the Community Earth System Model
http://www.cesm.ucar.edu/
Other
8 stars 27 forks source link

remove all refences of mct #74

Closed mvertens closed 8 months ago

mvertens commented 9 months ago

This PR does the following:

- **RtmMod:** 
  -  now have  two new init phases for mosart. The first init phase is now called MOSART_init1 and replaces Rtmini. This has mostly what was there before but moves the creation of all routehandles to the second init phase - MOSART_init2 which must be called after the mesh has been read in. Also - moved the section of code for MOSART_init2 to be right below the section for MOSART_init1.
  - removed the mapping for Smatp_dnstrm  since it was not used and there is no reason to create a map that is not needed. The associated code that was commented out for this has also been removed.
  - renamed RtmRun to MOSART_run
  - new indentation
-  **MOSART_physics.F90**
   - now using the computed routehandle rh_eroutUp
   - new indentation
- Removed namelist variable `do_rtmflood` and xml variable `MOSART_FLOOD_MODE`. Also removed  subroutine `MOSART_FloodInit` in `RtmMod.F90` which was never activated and in fact the model aborted if you tried to invoke it.
Verified that this was no longer needed in consult with @swensosc.
- masterproc -> mainproc
- updated the MOSART testlist for derecho and betzy (betzy is a NorESM platform) and added a PFS test

Issues resolved:
   Resolves #65
   Resolves #75
   Resolves #73
   Resolved #85

Some work on: #86 and #87

Testing:
Ran the following tests on derecho for both this PR and master (these are now aux_mosart tests with this PR)

ERP_D.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.mosart-qgrwlOpts ERS_Ld5.f10_f10_mg37.I2000Clm50BgcCru.derecho_intel.mosart-default ERS_Ld5.f10_f10_mg37.I2000Clm50BgcCru.derecho_intel.mosart-iceOff PEM_D.f10_f10_mg37.I1850Clm50Sp.derecho_intel.mosart-inplacethreshold PFS.f10_f10_mg37.I1850Clm50Sp.derecho_intel.mosart-default SMS_D.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.mosart-decompOpts SMS_D.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.mosart-passChannelDepths SMS.f10_f10_mg37.I1850Clm50Sp.derecho_intel.mosart-nobypass SMS_Ld5.f10_f10_mg37.I2000Clm50BgcCrop.derecho_intel.mosart-mosartCold SMS_Lh3.f10_f10_mg37.I2000Clm50BgcCru.derecho_intel.mosart-mosartOff SMS_Lh3.f10_f10_mg37.I2000Clm51Sp.derecho_intel.mosart-clmAccelSpinupIgnoreWarn

All tests passed and were bfb with master EXCEPT for 
1. SMS.f10_f10_mg37.I1850Clm50Sp.derecho_intel.mosart-nobypass - this is an expected fail for both this PR and master
2. SMS_D.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.mosart-decompOpts produced a handful of roundofflevel 

RMS lndExp_Flrr_volr 1.7951E-21 NORMALIZED 2.0078E-18 RMS lndExp_Flrr_volrmch 9.9165E-24 NORMALIZED 1.8001E-20 RMS rofImp_Flrr_volr 1.0219E-19 NORMALIZED 1.0561E-16 RMS rofImp_Flrr_volrmch 1.5927E-19 NORMALIZED 2.6501E-16 RMS rofImp_Forr_rofi 1.4041E-21 NORMALIZED 4.6582E-15 RMS rofImp_Forr_rofl 1.4094E-20 NORMALIZED 7.1437E-15

ekluzek commented 9 months ago

@mvertens thanks so much for putting these changes together! This is so helpful. I should be able to move these changes over to RTM as well.

And @jedwards4b thanks for a nice review of this, appreciate that as well. And I agree with him about removing comments. I defer to him about the PIO specific things.

ekluzek commented 9 months ago

@mvertens it sounds like you found this to be bit-for-bit right? I'm surprised because I would think answers would change going from MCT to ESMF? I suppose maybe the reason is that the sparse-matrix multiply is done the same between MCT and ESMF, just because it's a standard math problem?

jedwards4b commented 9 months ago

@erik it's bfb because the matrix multiply is all 1 or 0 and so there is no issue with respect to order of operations.

mvertens commented 9 months ago

@ekluzek - The reason that answers are bfb is that the mapping is only multiplying by 1. You are simply moving the water from one gridcell to another. Also - MCT is only used in RTM in the cpl/mct code. I'm doing another PR back to ESCOMP where I have removed this directory. So now MCT will not be in either RTM or MOSART.

slevis-lmwg commented 9 months ago

@mvertens I browsed through this PR. I would be happy to meet with you for a tour of the changes if you still need feedback from me. My slevis@ucar calendar is up to date.

mvertens commented 9 months ago

@slevis-lmwg - thanks so much. I no longer have access to your slevis@ucar calendar - since I am no longer part of UCAR. I can meet later tonight or tomorrow night my time.

slevis-lmwg commented 9 months ago

@slevis-lmwg - thanks so much. I no longer have access to your slevis@ucar calendar

@mvertens I shared my calendar with you (let me know if it doesn't work).

mvertens commented 9 months ago

@slevis-lmwg - I have carried out the testing on derecho that we talked about and it all looks good. I've summarized this in the PR.

slevis-lmwg commented 9 months ago

@mvertens thank you. I think this means that this PR is ready to merge. Please correct me if I'm wrong.

On Monday afternoon @ekluzek will show me how we merge MOSART and RTM PRs. This is first in line to get merged.

jedwards4b commented 9 months ago

@slevis-lmwg I think that it's safe to bring in #74 before the older outstanding PR's but you should consider handling the others, either merging or closing them as won't merge, prior to bringing in #76

ekluzek commented 8 months ago

The issue with #79 will be an expected fail here.

wwieder commented 8 months ago

Hello,

We're still trying to balance competing interests on CTSM SE time. In the short term (before Feb. working group meetings) this doesn't seem as critical as other PRs (e.g., Dust work, CTSM5.2 datasets, and supporting PPE neeeds for CLM6 calibration). For now, I'm suggesting that @ekluzek and @slevis-lmwg focus their energy on these more critical LMWG PRs before focusing on the MOSART PRs, @mvertens, @jedwards4b, & @briandobbins.

ekluzek commented 8 months ago

Brian and @wwieder and I met and hashed some plans for this. The plan we have now is that @slevis-lmwg and I will do this tag (and the RTM equivalent one) together next Wednesday. We'll hold off on the other tags until after LMWG, but Sam will take care of them afterwards. We were going to do the tags today, but with @wwieder direction above we decided to hold off. From what we could tell that should be acceptable to everyone, but Brian and @wwieder can negotiate something different if needed. Doing the set of tags isn't going to be that much time, but it's also distracting to our main priorities. And it seems that doing most of them after LMWG should be OK. @mvertens @jedwards4b @briandobbins

jedwards4b commented 8 months ago

@ekluzek Please prioritize merging this PR as is. Do not continue to add code and features. You have added changes to a collaborators PR without consulting or discussing with that collaborator. This is, at best, a poor practice which can only serve to alienate contributors.

wwieder commented 8 months ago

Hi everyone, This is a bigger than expected PR for just removing the MCT cap, so I'm having a bit of trouble understanding all that's going on. I do think it's worth addressing the past few commits and comments, however.

If this last point is accurate, the tone of your last comment seems counter productive, @jedwards4b. I have to believe that @ekluzek and @slevis-lmwg are working to bring this PR to main efficiently and effectively and I think it's important that our virtual conversations on this platform are constructive in tone and content.

mvertens commented 8 months ago

@wwieder - thanks for your note. I'd like to clarify a few things.

  1. The code to remove MCT was not a cleanup project. It was to replace MCT functionality with ESMF.
  2. The flood code never worked in this PR. If you asked for it the model aborted. I checked with @swensosc before removing it - and he was in agreement that this should be done.
  3. It would have been helpful if changes to my PR had come in as a PR to my fork with the changes that @ekluzek wanted to have put in. I would have welcomed that approach to incorporating modifications.
  4. I did extensive testing was done on this PR and @slevis-lmwg approved these.
jedwards4b commented 8 months ago

@wwieder I do not think that it is a common practice for anyone to alter another persons PR without explicit permission of the original author. If you don't want to ask, an alternative would be to post a new PR to the existing PR for discussion.

ekluzek commented 8 months ago

@mvertens @jedwards4b @wwieder

Reply to specific points above...

It would have been helpful if changes to my PR had come in as a PR to my fork with the changes that @ekluzek wanted to have put in. I would have welcomed that approach to incorporating modifications.

I have done PR's to PR's before. And it does make sense when I'm suggesting changes that I need the author to review or might be controversial, or it's in a repository I'm not the maintainer for example. Then it's useful, it does slow down the process though. In this case as maintainer I didn't see the point in having you approve the changes. The impression I got Sunday was that you wanted this in quickly so I did something to expedite it and then was chastised (by Jim) for doing just that on Monday. So I'm in a no win situation...

Here's a suggestion that @slevis-lmwg and I came up with yesterday...

In CTSM we make "merge PR's" that merge several PR's in together and then possibly add changes on top of that. So in the future we can do that for these cases. I mean if I for example -- brought your PR in as is -- and then added changes in the next tag -- I don't see how that's objectionable? That next tag isn't necessarily something you have to approve...

The flood code never worked in this PR. If you asked for it the model aborted. I checked with @swensosc before removing it - and he was in agreement that this should be done.

@mvertens you are right that you checked in with @swensosc and he agreed. And I appreciate that you checked in with him, he is the one that needs to approve such a thing. However, @slevis-lmwg and I as code maintainers were not informed about the change -- other than by code inspection. I was concerned and asked at the meeting, where you told me you were NOT removing flooding -- but only removing a part that wasn't functional. In our followup with @swensosc he told me he did use flooding, but in a hacky way -- so it was important to get it back to the same state for his work.

The code to remove MCT was not a cleanup project. It was to replace MCT functionality with ESMF.

The thing is that there was a TON of cleanup that came in here that is unrelated to removing MCT. @slevis-lmwg and I were neither informed nor asked permission about the inclusion of that cleanup. Nor was that cleanup provided in a separate PR, so we could assess it separately than the MCT removal.

I think I'm good with most of it -- and above you see that I documented in issues about a lot of it. But, there was a lot of this done with no discussion with us.

To list some of it: -- Arbitrary change of the test list name -- Replace PROTEC commenting directives -- The change from masterproc to mainproc -- Removal of extra implicit none statements -- Replace instances of RTM for MOSART -- Update I/O unit handling -- Fix indentation -- Remove commented out code

Again, I agree with most of it, but it's better if we were informed about it.

If you need more communication from us on what we are doing -- in order to have a good collaboration we need more communication from you.

I did extensive testing was done on this PR and @slevis-lmwg approved these.

I know you did, and we appreciate that. And I'm glad we can rely on your doing testing. Something that would help though is for you to document what testing you did (in the PR for example). I ran into problems with the baselines running the full test suite, and since you didn't mention these issues -- I'm assuming you didn't run the full test list. But, knowing what testing was done is helpful for @slevis-lmwg and I as code maintainers.

TLDR: I propose the following...

does the above list sound acceptable to everyone?

wwieder commented 8 months ago

I think @ekluzek feels this PR is ready to merge. He's proposing to do this as a merge tag like so Erik's changes are separated from the original PR.

mvertens commented 8 months ago

@wwieder - yes the changes are fine and I appreciate @ekluzek putting them in. @ekluzek - thank you for your response to my points. In looking at the code some more - I now agree that there was more cleanup than I remembered. I also agree in general with your proposed suggestions.

slevis-lmwg commented 8 months ago

I wanted to publicly acknowledge that each one of us brings different strengths to the group. I apologize if this is the wrong platform for this comment. I will remain brief, and I hope I don't get in trouble for it :-)

jedwards4b commented 8 months ago

@ekluzek I see a software engineer who is overwhelmed with tasks and what I thought I was doing was to try to help reduce the burden while recognizing and supporting the work of a very important outside collaborator. I hope that we are working toward a solution where you allow others to take on some of these tasks - they may not always be done exactly the way that you would do them, but I think that we all care about developing quality software that can stand the test of time. I apologize for my blunt comments sometimes, perhaps I could have been more effective had I not exposed my personal frustrations.

With respect to the suggestions above:

Future tags will be done with a merge tag that includes several PR's and changes on top of that

I'm not sure if I understand the necessity for this, it seems like it puts an unnecessary burden on you and or @slevis-lmwg - why not allow developers to test their own PR's (with a suitable test suite) and merge and create a tag for each one?

ekluzek commented 8 months ago

@jedwards4b thanks so much for your last comment that calms the rhetoric down. I do appreciate the help. And I accept your apology it helps set a much better tone to work together in. As humans we do need to express our frustrations, but when working together we need to have positive ways of doing that. I know Brian Dobbins will be working on that within CSEG.

And actually I wasn't seeing that anyone of the CTSM SE group was going to be able to do this piece that @mvertens did before CESM3. Not only that, but it would've taken any of us a ton more time than she was able to do it in. I would've needed to consult with the both of you to take it on, and it would've still taken me a lot longer even if I could work on it now.

I also see the collaboration with NiO and other external groups to be very important for CESM. CESM should lean heavily into the "C" in the name, as a way to distinguish from E3SM and also to help prevent a split like that from happening. My hope is that we can bring in their work and take advantage of their advances as well as not having to have either of us have much difference in our repositories between us. So I'm glad to have NiO machines in the test list and hope that betzy will be added to ccs_config for example. As well as other additions that will bring in advancements for everyone.

I also agree with @slevis-lmwg comments about @mvertens as well. She is able to do stuff like this amazingly quickly which is awesome and her superpower. And @slevis-lmwg I tend to follow the maxim "Public praise, private reproof", it's almost always good to give praise in a public forum, so I think you are right to do that in a forum like this in my view.

On this question we will be discussing this with CTSM SE's this morning and get back to you...

Future tags will be done with a merge tag that includes several PR's and changes on top of that

I'm not sure if I understand the necessity for this, it seems like it puts an unnecessary burden on you and or @slevis-lmwg - why not allow developers to test their own PR's (with a suitable test suite) and merge and create a tag for each one?

ekluzek commented 8 months ago

@mvertens @jedwards4b and all. We discussed this in the CTSM SE group this morning. We have some proposals based on the discussion. And I'll take some time to also add some clarification on roles and responsibilities here. And by the way it's both OK and actually good to fight about these proposals. I'd like to have the standards set by the group now, to help avoid future friction. If we agree on group standards we can more easily hold each other accountable.

Roles:

My thought is that we have a few tags that @slevis-lmwg and I need to finish off. But, once we get to @mvertens refactoring tag is the time that @mvertens could make the tag if she would like to take on that responsibility. @mvertens Jim volunteered you, but would you like the responsibility to make MOSART tags? We haven't made tags in MOSART for about 2 years, so obviously we aren't doing a ton of work here. I imagine @mvertens will be doing work at that point that we just need to know about, but could let her do tags for. The other development work is @swensosc work on the GW branch -- but as it's on a branch it's disconnected from the main-development.

A sticky thing is being informed about timelines. Unfortunately we can't promise dates, but we can be responsible to inform about plans and expected timelines. And we could be expected to give updates to PR authors if those plans change. This is something that we could negotiate on to improve it for everyone, and we are open to ideas on what that would look like. And we are open to additions or requests for subtractions of some things above if someone thinks it's important.

If the above sounds good we'll make some changes in github to implement the rules. And I'll find a place to make them visible within MOSART github pages...

mvertens commented 8 months ago

@ekluzek - thanks for putting this list together. These points all seem very reasonable to me. It would be helpful if this could be moved to a discussion rather than be associated with this PR. In terms of my tagging - I think you are referring only to my major refactorization of mosart - right? Could you please clarify that.

ekluzek commented 8 months ago

Good idea @mvertens! I like that! I'll open up discussions and that's a great place for this to live longterm. I'm glad you find the list reasonable. Again happy to negotiate on specifics...

Yes, in terms of tagging. I'm wondering if you'd like to be able to do your own tag/baselines for the refactoring PR? And you'll obviously have PR's after that point as well. So we could have you do your own tags afterwards as well. As I say above we would just need to let us know about it, approve it, and then give you the green light to make it.

jedwards4b commented 8 months ago

My understanding is that we need this PR to be merged before we can move forward on #76 - perhaps I misunderstood but I thought that this would be done by now. Please clarify the timeline for merging this and the other outstanding mosart PR's.

ekluzek commented 8 months ago

@jedwards4b thanks for the question. I was doing some verification with three tests that were showing changes to answers.

ERP_D.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.mosart-qgrwlOpts PEM_D.f10_f10_mg37.I1850Clm50Sp.derecho_intel.mosart-inplacethreshold SMS_D.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.mosart-decompOpts

The first two had issues with history files in the baselines not output at the same frequency as the new updated tag. I reran them and was able to show that they are fine.

The last one is an issue that @mvertens discusses in the introduction to this PR (and thank for including that discussion!). She points out they are roundoff level changes -- and I verified that as well even for a month long simulation (with only a few CTSM fields diverging by greater than roundoff after the month). It's not clear why using decomp_option="1d" shows a roundoff level change -- but I don't think we care enough to track it down. The code changes are large enough that it's not clear what the cause might be.

I also did another test with the last decomp_option "basin" to verify it was OK, and it shows identical results to the baseline. So it is perfect. The other tests are roundrobin and they show identical results so they of course are fine as well.

grep RMS /glade/derecho/scratch/erik/SMS_Lm1_D.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.mosart-decompOpts.GC.mosart1__49_ctsm51d161delist/run/SMS_Lm1_D.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.mosart-decompOpts.GC.mosart1__49_ctsm51d161delist.cpl.hi.0001-02-01-00000.nc.cprnc.out
 RMS lndExp_Flrr_volr                 4.9583E-19            NORMALIZED  1.7272E-16
 RMS lndExp_Flrr_volrmch              6.8211E-19            NORMALIZED  2.7513E-16
 RMS rofImp_Flrr_volr                 1.9708E-18            NORMALIZED  6.4934E-16
 RMS rofImp_Flrr_volrmch              2.0494E-18            NORMALIZED  7.8161E-16
 RMS rofImp_Forr_rofi                 9.3811E-20            NORMALIZED  6.3271E-14
 RMS rofImp_Forr_rofl                 3.4147E-21            NORMALIZED  1.2038E-15
derecho5 cases/SMS_Lm1_D.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.mosart-decompOpts.GC.mosart1__49_ctsm51d161delist> grep RMS /glade/derecho/scratch/erik/SMS_Lm1_D.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.mosart-decompOpts.GC.mosart1__49_ctsm51d161delist/run/SMS_Lm1_D.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.mosart-decompOpts.GC.mosart1__49_ctsm51d161delist.mosart.h0.0001-01.nc.cprnc.out
 RMS areatotal2                       3.9280E-06            NORMALIZED  3.5060E-16
 RMS DIRECT_DISCHARGE_TO_OCEAN_ICE    3.8158E-15            NORMALIZED  1.1627E-14
 RMS DIRECT_DISCHARGE_TO_OCEAN_LIQ    1.4055E-15            NORMALIZED  5.5442E-16
 RMS RIVER_DISCHARGE_OVER_LAND_LIQ    2.1871E-13            NORMALIZED  8.7639E-16
 RMS TOTAL_DISCHARGE_TO_OCEAN_ICE     3.8158E-15            NORMALIZED  1.1627E-14
 RMS TOTAL_DISCHARGE_TO_OCEAN_LIQ     5.8561E-15            NORMALIZED  1.0563E-15
derecho5 cases/SMS_Lm1_D.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.mosart-decompOpts.GC.mosart1__49_ctsm51d161delist> grep RMS /glade/derecho/scratch/erik/SMS_Lm1_D.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.mosart-decompOpts.GC.mosart1__49_ctsm51d161delist/run/SMS_Lm1_D.f10_f10_mg37.I1850Clm50Bgc.derecho_intel.mosart-decompOpts.GC.mosart1__49_ctsm51d161delist.clm2.h0.0001-01.nc.cprnc.out
 RMS CH4_SURF_AERE_SAT                1.1154E-12            NORMALIZED  3.0834E-05
 RMS CH4_SURF_DIFF_SAT                9.5995E-19            NORMALIZED  5.3827E-09
 RMS FCH4                             2.9203E-17            NORMALIZED  1.4428E-06
 RMS FCH4TOCO2                        2.9183E-14            NORMALIZED  9.2769E-07
 RMS NEM                              2.9224E-14            NORMALIZED  2.1793E-06
 RMS CONC_O2_SAT                      1.0804E-05            NORMALIZED  1.5833E-04

So @slevis-lmwg and I will schedule to complete this tag as soon as we can.

I've also started the discussions, and have a few discussions under

https://github.com/ESCOMP/MOSART/discussions

Please comment and suggest changes as needed. I plan to have it open for awhile to get feedback, and will update in response. Once, we've settled on it -- I'll lock the discussion.

wwieder commented 8 months ago

Thanks for this update, @ekluzek .