E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM

performance impact of ocean_tight_coupling #655

Closed worleyph closed 8 years ago

worleyph commented 8 years ago

I just discovered the following empirically (and then searched for and found this explanation):

 It is possible to reduce the ocean lag in the system.  There is a namelist
 variable, <ulink url="../../cesm/doc/modelnl/nl_drv.html">ocean_tight_coupling</ulink>,
 that moves the step where ocean data is rearranged from the ocean pes to
 the coupler pes from the end of the loop to before the atmosphere/ocean
 flux computation.  If ocean_tight_coupling is set to true, then the ocean
 lag is reduced by one atmosphere coupling period, but the ability of the
 ocean model to run concurrently with the atmosphere model is also reduced
 or eliminated.  This flag is most useful
 when the ocean coupling frequency matches the other components.

Notice in particular "the ability of the ocean model to run concurrently with the atmosphere model is also reduced or eliminated." This seems to be the current default, and if it represents what we will be using for production, then this will impact performance significantly (as much as doubling runtime). I need this clarified, so that I know what to suggest that we do (and perhaps start looking to see whether this is in fact necessary).

rljacob commented 8 years ago

Doug had suggested this so assigned to him.

douglasjacobsen commented 8 years ago

It seemed like the most scientifically correct coupling mode for us to be using. Though if runtime is doubled, we could always change it and then later explore the impact of the different coupling modes on the climate.

worleyph commented 8 years ago

Understanding the performance impact is then important. I've seen runtimes much higher than even running just the atmosphere and ocean sequentially, and I don't know why yet. Perhaps that is fixable, but I need to understand the coupling logic and implementation. @rljacob, is this something that you are familiar with?

Also, @douglasjacobsen, can I just change the coupling type (and to what) to verify that the old style continues to work as expected?

mt5555 commented 8 years ago

yikes! it's good Pat found this. All our performance estimates are based on the assumption we can run ocean concurrently. I hope since it was good enough for POP, it will be good enough for MPAS :-)

rljacob commented 8 years ago

@apcraig knows the most about these options and how they're implemented.

douglasjacobsen commented 8 years ago

@worleyph yes you can.

I have flow charts that describe the different coupling options that could help if you want.

worleyph commented 8 years ago

@douglasjacobsen - thanks. Right now I just want to do a sanity check. What is the old default (CESM1_ORIG?), or what should I use instead of CESM1_MOD_TIGHT (CESM1_MOD?).

Please also send me a pointer to the flow charts, or send them to me directly, whatever is most convenient for you.

apcraig commented 8 years ago

Last year, several new sequencing options were added to the driver. These are controlled by the CPL_SEQ_MOD option if you are using a version of cime that includes those changes. You can significantly reduce the lags with little performance penalty by using RASM_OPTION1. RASM_OPTION2 would be the next step up, which sequences the ice and ocean. RASM_OPTION2 should only be considered if the ocean is coupled at the atm/ice coupling frequency. The _TIGHT options were introduced a long time ago, mostly for coupling the ocean with a data atmosphere. They will probably not help much with science in fully active systems. If you feel you need to run the atm and ocean sequentially for science reasons, then we should add a new option off the RASM_OPTION1 mode. It sounds like you have all the flow charts, but if you have questions, let me know.

apcraig commented 8 years ago

env_run.xml should describe the options generally. I include that documentation below. CESM1_ORIG was the old default up to CESM1.1. CESM1_MOD is the current default in CESM2.

<desc>Coupler sequencing option. This is used to set the driver namelist variable cpl_seq_option.
  CESM1_ORIG is the cesm1.1 implementation.  
  CESM1_MOD includes a cesm1.3 mod that swaps ocean merging and atm/ocn flux 
  computation.  
  RASM_OPTION1 runs prep ocean before the ocean coupling reducing 
  most of the lags and field inconsistency but still allowing the ocean to run 
  concurrently with the ice and atmosphere.  
  RASM_OPTION2 is similar to RASM_OPTION1 
  but sequences the ice model, prep ocean and ocean model in that order.  The 
  ocean model loses some of the concurrency with the ice model.  
  CESM1_ORIG_TIGHT and CESM1_MOD_TIGHT are consistent with the old variables 
  ocean_tight_coupling = true in the driver.  That namelist is gone and the 
  cpl_seq_option flags take its place.
  TIGHT coupling makes no sense with the OPTION5 and OPTION6 flags.</desc>

philipwjones commented 8 years ago

@mt5555 : Actually, there was a desire to do this with POP too - just inhibited by some algorithmic choices so a diurnal cycle had to be kludged. It's good to have more frequent coupling for the diurnal cycle and mixed layer. And at high resolution, it would be great to have more frequent coupling for inertial interactions between storms and eddies (though even there, probably don't need that frequent a coupling). As in most things, compromises are possible, so I agree with @douglasjacobsen that we probably just need to evaluate the performance implications and decide what we can/can't do.

worleyph commented 8 years ago

Thanks @apcraig. Sounds like my next experiment should be RASM_OPTION1. Decision is still up to others, but I'll try to generate some performance data quickly.

apcraig commented 8 years ago

The science behind this work was led by Andrew Roberts at NPS. Some of the motivation is described in this paper (http://www.igsoc.org/annals/56/69/a69a760.pdf). As Phil says, there are all kinds of tradeoffs, but for RASM, this comes down to resolving the inertial interactions in their high-resolution model; to do that, they use RASM_OPTION2 and run the atm/ocean/ice at the same coupling period. Having said that, as I suggested above, RASM_OPTION1 fixes a lot of problems that were introduced decades ago when the system was purely concurrent with multiple executables. At that time, efforts were made to overlap as much work as possible, running "coarse" resolutions with relatively long coupling periods (hours to days) where lags had little impact on the solutions. The latest generation of the driver/coupler allows for much greater flexibility in sequencing, and the science does better when the lags are reduced with RASM_OPTION1. If you turn on RASM_OPTION1, make sure you look at the solutions and budgets carefully. There has only been limited testing with this option in global models as far as I know.

mt5555 commented 8 years ago

@apcraig : you mention the _TIGHT options were put in for data atmosphere. Has there been any testing of _TIGHT in global coupled models? Or more to the point, what is the lowest risk place to start with the initial experiments?

douglasjacobsen commented 8 years ago

Lowest risk would likely be the CESM1_MOD case, as that's been the most tested.

apcraig commented 8 years ago

Agreed, CESM1_MOD is the lowest risk. I have not run the _TIGHT options in fully active systems, but they should work. Again, I'd start with CESM1_MOD and then explore RASM_OPTION1.

apcraig commented 8 years ago

And just to reiterate, there are two issues here: what is required for science (i.e., coupling stability, boundary-layer interaction, inertial periods, etc.) and what the performance tradeoffs are. CESM1_ORIG, CESM1_MOD, and RASM_OPTION1 all perform basically the same and support the same atm/ice/ocean concurrency, but change some of the sequencing of the coupler computations to reduce lags as you switch between those options. For those options, the ice and atm run sequentially, and the ocean can run concurrently with ice+atm. RASM_OPTION2 imposes a constraint on the coupling such that the ocean will run sequentially with the ice model but can still run concurrently with the atm model. Again, it only makes sense to do this if the ice/atm/ocean are coupling at the same period. The _TIGHT options impose a constraint on the ocean to run sequentially with the atm but concurrently with the ice. Again, this makes the most sense if atm/ocean/ice are on the same coupling frequency, and it was introduced mostly for datm/ocean configurations.

In the standard RASM configuration, the atm is very expensive, so the ice and ocean models can keep up (more or less) even in RASM_OPTION2. In that case, we run the atm and ocean concurrently and then the ice model runs on basically all the pes of the atm+ocean first. The time is limited by ice+atm performance and the atm/ocean/ice are all running at the same coupling frequency. This configuration is regional, focused on the Arctic, and has 50km atm and 9km ocean/ice resolution. Because of the relative costs of the models, RASM_OPTION2 is not a huge performance hit compared to running RASM_OPTION1 in this configuration. The pe layout is different, but the throughput is similar. But in some ways, that's just because the atm is so expensive in this configuration.
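
As a rough illustration of the concurrency constraints described above, the following Python sketch estimates the wall-clock cost of one coupling period under each sequencing option. It is a toy model, not the driver's logic: the function name, the per-component costs, and the formulas are assumptions derived only from the description in this comment. It ignores coupler overhead, differences in pe layout, and components coupling at different frequencies.

```python
# Toy cost model of the sequencing options discussed in this thread.
# All names and formulas are hypothetical; this is not driver code.

def period_cost(option, atm, ice, ocn):
    """Estimate wall-clock time for one coupling period.

    atm, ice, ocn are the per-period run times of each component
    (same units), assuming all three couple at the same frequency.
    """
    if option in ("CESM1_ORIG", "CESM1_MOD", "RASM_OPTION1"):
        # ice and atm run sequentially; ocean overlaps with ice+atm
        return max(ice + atm, ocn)
    if option == "RASM_OPTION2":
        # ocean is sequenced with ice but still overlaps with atm
        return ice + max(atm, ocn)
    if option in ("CESM1_ORIG_TIGHT", "CESM1_MOD_TIGHT"):
        # ocean is sequenced with atm but still overlaps with ice
        return atm + max(ice, ocn)
    raise ValueError(f"unknown option: {option}")


if __name__ == "__main__":
    # Made-up costs: atm and ocn equally expensive, ice cheap.
    for opt in ("CESM1_MOD", "RASM_OPTION1", "RASM_OPTION2", "CESM1_MOD_TIGHT"):
        print(opt, period_cost(opt, atm=10.0, ice=2.0, ocn=10.0))
```

With these made-up costs, the model gives 12 for CESM1_MOD, RASM_OPTION1, and RASM_OPTION2, and 20 for CESM1_MOD_TIGHT. Under this model, the _TIGHT penalty approaches a factor of two when the atm and ocean cost the same and the ice cost is negligible.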

worleyph commented 8 years ago

Perhaps this should go on the PR that @douglasjacobsen submitted to change the default back to CESM1_MOD, but the default coupling frequency for ocean in the A_B2000ATMMOD compset and the ne30_oEC grid seems very high - 24 times a day? (Largest I've seen in the past is 4 times a day.)

(in env_run.xml):

 <entry id="NCPL_BASE_PERIOD"   value="day"  />
 <entry id="ATM_NCPL"   value="48"  />
 <entry id="OCN_NCPL"   value="24"  />

(and in drv_in):

  atm_cpl_dt = 1800
  ocn_cpl_dt = 3600

Is this really what we want? (This also has a performance impact, for what it is worth.)
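
For reference, the drv_in values follow from the env_run.xml settings above: the coupling interval is the base period divided by the number of couplings per period. Here is a minimal Python sketch of that arithmetic, assuming a base period of "day" means 86400 seconds. The helper name and the `<config>` wrapper element are hypothetical, added only so the fragment parses as XML.

```python
import xml.etree.ElementTree as ET

# Entries copied from the env_run.xml excerpt above; the <config> wrapper
# is an assumption added so the fragment is well-formed XML.
ENV_RUN = """
<config>
  <entry id="NCPL_BASE_PERIOD" value="day" />
  <entry id="ATM_NCPL" value="48" />
  <entry id="OCN_NCPL" value="24" />
</config>
"""

# Assumed lengths in seconds; only the cases needed for this example.
SECONDS_PER_PERIOD = {"day": 86400, "hour": 3600}


def coupling_dt(xml_text):
    """Return {"<comp>_cpl_dt": seconds} for every <COMP>_NCPL entry."""
    root = ET.fromstring(xml_text)
    entries = {e.get("id"): e.get("value") for e in root.iter("entry")}
    base = SECONDS_PER_PERIOD[entries["NCPL_BASE_PERIOD"]]
    return {
        key[: -len("_NCPL")].lower() + "_cpl_dt": base // int(value)
        for key, value in entries.items()
        if key.endswith("_NCPL")
    }


if __name__ == "__main__":
    print(coupling_dt(ENV_RUN))
```

By the same arithmetic, coupling the ocean 4 times a day corresponds to a 21600-second (6-hour) interval.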

douglasjacobsen commented 8 years ago

@worleyph I'm pretty sure for the water cycle simulations we'll either want 3600 or 7200 (1hr or 2hr) for the ocean coupling intervals.

@maltrud might have other ideas though.

I think the v0 runs made it seem like a coupling interval larger than that (i.e. 6 hours like you suggested) is unstable for the ocean (though maybe I have it backwards).

mt5555 commented 8 years ago

All the v0 runs have used 6h coupling.

A safer choice to start with might be something closer to 6h than 1h.