NGEET / fates

repository for the Functionally Assembled Terrestrial Ecosystem Simulator (FATES)

ed-clm fails bit for bit restart for f19 and f09 grids #14

Closed bandre-ucar closed 8 years ago

bandre-ucar commented 8 years ago

Notes from Stef Muszala as of 2016-02-10

Current state of restarts – notes that I compiled while working on this.

From what I've been able to track down, the f19 and f09 restart errors I'm seeing have to do with the code in ed_clm_link. First, to recreate the problem, use the branch: https://svn-ccsm-models.cgd.ucar.edu/clm2/branches/ed4x5fix/

I've been using this to debug and therefore haven't made a branch_tag to go with my latest fix attempts. The bulk of the mods are simply there to help in debugging (when compared to https://svn-ccsm-models.cgd.ucar.edu/clm2/branch_tags/ed4x5fix_tags/ed4x5fix_n10_r120).

You can recreate the problem by running an f19 case for 5 days and comparing it with a case that restarts from the 4th day of that run and runs for one more day. You should see differences at time-step 145.

See: /glade/u/home/muszala/NCAR_rfisher_NGEE/svn/edHighRes/cime/scripts/chkf19

and

/glade/scratch/muszala/chkf19/run

4+1 is the restart run I've been testing...baseline is in the 'save' directory.

When ED is restarted this roughly happens:

call EDRest
call ed_clm_link

Start CLM for current timestep
call surfRad with a use_ed branch ( where one can see the first differences in surface radiation in elai_profile around line 528 )
call EDPhotoSynthesis
call EDAccumulateFlux
end CLM for current timestep

start ED for current timestep
call EDPhysiology which calls EDCohortDynamics
call ed_clm_link (Note 1)
end ED for current timestep

Note 1: we now see problems in currentCohort%bstore and ED_bstore, and then in elai_profile, on ranks 10, 35, 36, 37, 68 and 69 for one or two cohorts depending on the MPI rank. At this point everything should be more in sync with the rest of the run, and we also see that some of the data set up by the call to ed_clm_link in EDRest has been overwritten during this time-step, indicating that unnecessary code is being called.

There are differences by the end of time-step 145, and by the end of the run we can see errors on the order of:

>>cprnc chkf19.clm2.h0.0001-01-05-00000.nc ../save/chkf19.clm2.h0.0001-01-05-00000.nc | grep RMS
RMS ED_bstore                        3.3816E-12            NORMALIZED  2.7871E-07
RMS GPP                              1.7497E-09            NORMALIZED  9.3335E-05
RMS H2OSOI                           2.5714E-08            NORMALIZED  2.4811E-07
RMS NPP                              1.2314E-09            NORMALIZED  1.3515E-04
RMS PFTbiomass                       6.1993E-13            NORMALIZED  2.1156E-07

and in the clm2.r.0001-01-05 file (ed fields only):

>>cprnc -m chkf19.clm2.r.0001-01-05-00000.nc ../save/chkf19.clm2.r.0001-01-05-00000.nc | grep RMS | grep ed_
RMS ed_bstore                        1.8381E-10            NORMALIZED  8.3040E-06
RMS ed_gpp_acc                       9.8746E-13            NORMALIZED  1.5586E-05
RMS ed_npp_acc                       5.0423E-13            NORMALIZED  1.6529E-05
RMS ed_resp_clm                      7.5660E-14            NORMALIZED  1.0302E-04
RMS ed_cwd_ag                        4.6912E-20            NORMALIZED  2.0821E-11
RMS ed_cwd_bg                        3.1274E-20            NORMALIZED  2.0821E-11
RMS ed_leaf_litter                   8.3800E-17            NORMALIZED  1.2463E-08
RMS ed_root_litter                   5.2713E-15            NORMALIZED  2.4462E-07
RMS ed_f_sun                         2.3372E-10            NORMALIZED  9.7288E-08
RMS ed_fabd_sun_z                    5.1857E-13            NORMALIZED  5.9807E-09
RMS ed_fabi_sun_z                    2.4104E-12            NORMALIZED  8.0762E-08
RMS ed_fabd_sha_z                    2.5355E-09            NORMALIZED  3.1893E-04
RMS ed_fabi_sha_z                    1.2666E-08            NORMALIZED  1.9667E-04
RMS ed_water_memory                  4.3388E-09            NORMALIZED  1.0575E-06
RMS ed_old_stock                     1.8204E-07            NORMALIZED  3.2317E-06
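
To see at a glance which fields diverge most, the `RMS ... NORMALIZED ...` lines can be sorted by their normalized value. The following is a minimal sketch, assuming the line layout shown in the output above; `parse_rms` and `rank_by_normalized` are hypothetical helper names, not part of cprnc, and the sample text is a subset of the lines pasted above.

```python
import re

# A few lines copied from the restart-file comparison above.
CPRNC_OUTPUT = """\
RMS ed_bstore                        1.8381E-10            NORMALIZED  8.3040E-06
RMS ed_resp_clm                      7.5660E-14            NORMALIZED  1.0302E-04
RMS ed_fabd_sha_z                    2.5355E-09            NORMALIZED  3.1893E-04
RMS ed_fabi_sha_z                    1.2666E-08            NORMALIZED  1.9667E-04
RMS ed_old_stock                     1.8204E-07            NORMALIZED  3.2317E-06
"""

# Assumed layout: "RMS <field> <rms> NORMALIZED <normalized rms>"
RMS_LINE = re.compile(r"^\s*RMS\s+(\S+)\s+(\S+)\s+NORMALIZED\s+(\S+)\s*$")


def parse_rms(text):
    """Return (field, rms, normalized) tuples for every matching line."""
    rows = []
    for line in text.splitlines():
        match = RMS_LINE.match(line)
        if match:
            rows.append((match.group(1), float(match.group(2)), float(match.group(3))))
    return rows


def rank_by_normalized(rows):
    """Sort fields so the largest normalized difference comes first."""
    return sorted(rows, key=lambda row: row[2], reverse=True)


ranked = rank_by_normalized(parse_rms(CPRNC_OUTPUT))
for name, rms, normalized in ranked:
    print(f"{name:<16s} rms={rms:.4e} normalized={normalized:.4e}")
```

In this subset the shaded-leaf absorption fields (ed_fabd_sha_z, ed_fabi_sha_z) rank highest, which fits the radiation profile being where the first differences show up.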

This is one call that can be separated out for example:

      if (.not. calledFromEdRest) then

         call this%ed_update_history_variables( bounds, ed_allsites_inst(begg:endg), &
            firstsoilpatch, ed_Phenology_inst, canopystate_inst)

      endif

In my own tests, starting with:

1+1 == 2 ?  BFB ... clm2.r. file differs in ED_GDD0
2+1 == 3 ?  BFB ... clm2.r. file differs in ED_GDD0
1+2 == 3 ?  BFB ... clm2.r. file differs in ED_GDD0
3+1 == 4 ?  BFB ... clm2.r. file differs in ED_GDD0

4+2 == 6 ? !BFB ts5 and ts6
5+1 == 6 ? !BFB ts6 is incorrect
3+3 == 6 ?  BFB ts4, ts5 and ts6 all check out
4+1 == 5 ? !BFB ts5 incorrect in clm2.r. and clm2.h0
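
The "a+b == c" matrix above can be expressed as a small harness: run a+b units continuously, then run a units, write a restart, read it back, run b more units, and compare the two end states bit for bit. The sketch below uses a toy two-variable model as a stand-in for CLM/ED; `step`, `write_restart`, `read_restart` and `check` are hypothetical names for illustration only. Because the toy model saves its complete state, every case comes out BFB, which is the behaviour the real model should show.

```python
import struct


def step(state):
    """Advance a toy two-variable model by one unit of time."""
    x, y = state
    x_new = x + 0.1 * (y - x) + 0.01
    y_new = 0.9 * y + 0.05 * x_new * x_new
    return (x_new, y_new)


def run(state, nsteps):
    """Apply step() nsteps times."""
    for _ in range(nsteps):
        state = step(state)
    return state


def write_restart(state):
    """Serialize the full state as raw 8-byte doubles (no rounding)."""
    return struct.pack("<2d", *state)


def read_restart(blob):
    """Inverse of write_restart()."""
    return struct.unpack("<2d", blob)


def check(first, second, initial=(1.0, 2.0)):
    """True if 'first' steps + restart + 'second' steps matches a continuous run."""
    baseline = run(initial, first + second)
    restarted = run(read_restart(write_restart(run(initial, first))), second)
    # Compare the byte patterns, not approximate values.
    return write_restart(baseline) == write_restart(restarted)


# Same (first, second) pairs as the matrix above.
CASES = [(1, 1), (2, 1), (1, 2), (3, 1), (4, 2), (5, 1), (3, 3), (4, 1)]
results = {case: check(*case) for case in CASES}
for (a, b), ok in results.items():
    print(f"{a}+{b} == {a + b} ? {'BFB' if ok else '!BFB'}")
```

Comparing packed bytes rather than using a tolerance is deliberate: a bit-for-bit restart test should fail on any difference, however small.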

bandre-ucar commented 8 years ago

Note from Bill Sacks 2016-02-23

Hi Ben & Rosie,

I noticed a bug in ed code in clm_driver on the trunk... not sure if this has been fixed on the ed branch:

   if (use_ed) then
      call ed_phenology_inst%accumulateAndExtract(bounds_proc, &
           temperature_inst%t_ref2m_patch(bounds_proc%begp:bounds_proc%endp), &
           patch%gridcell(bounds_proc%begp:bounds_proc%endp), &
           grc%latdeg(bounds_proc%begg:bounds_proc%endg), &
           mon, day, sec)
   endif

But mon, day and sec are never set.

Rosie commented:

Oh. That's interesting, & might explain why Stef kept seeing strange things with GDD0 (the phenology growing degree days 'counter').

bandre-ucar commented 8 years ago

Note from Sean Swenson regarding #74 but probably also relevant for this general restart issue:

I think that the ed restart value has to do with the fact that ed_clm_link is called during the restart from EDRestVectorMod. This seems to occur prior to the running of the biogeophysics. In a normal timestep, the biogeophysics occurs prior to the ED calls. But during a restart, the ed_clm_link routine is called first, so it calculates various things based on what's on the restart file, not on what the values are after biogeophysics. Not sure what the proper fix is though.
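
One way such a mismatch can arise is sketched below with a toy model. It is an illustration under stated assumptions, not the CLM driver: it assumes snow changes every step while the ED update and the link run only on the first step of each day, and every name here (`link`, `step`, `restart`, `gpp_acc`, the burial formula) is hypothetical. Re-running the link at restart recomputes the derived `elai` from end-of-day snow, whereas the continuous run is still carrying the `elai` computed at the start of the day.

```python
STEPS_PER_DAY = 4  # assumed; the real model uses many more steps per day


def link(lai, snow_depth, canopy_height=1.0):
    """Toy derived quantity: leaf area not buried by snow."""
    exposed = max(0.0, min(1.0, 1.0 - snow_depth / canopy_height))
    return lai * exposed


def step(state, n, snowfall=0.05):
    """One toy step: biogeophysics every step, ED + link once per day."""
    s = dict(state)
    s["snow"] += snowfall                 # biogeophysics updates snow
    s["gpp_acc"] += 0.01 * s["elai"]      # radiation uses the current elai
    if n % STEPS_PER_DAY == 0:            # first step of the day: ED, then link
        s["lai"] += s["gpp_acc"]
        s["gpp_acc"] = 0.0
        s["elai"] = link(s["lai"], s["snow"])
    return s


def run(state, start, nsteps):
    """Run steps start .. start + nsteps - 1."""
    for n in range(start, start + nsteps):
        state = step(state, n)
    return state


def restart(saved, relink):
    """Restore saved state; optionally recompute elai as a restart-time link would."""
    s = dict(saved)
    if relink:
        s["elai"] = link(s["lai"], s["snow"])  # uses end-of-day snow
    return s


initial = {"lai": 2.0, "snow": 0.0, "elai": 2.0, "gpp_acc": 0.0}
continuous = run(initial, 0, 2 * STEPS_PER_DAY)
saved = run(initial, 0, STEPS_PER_DAY)

good = run(restart(saved, relink=False), STEPS_PER_DAY, STEPS_PER_DAY)
bad = run(restart(saved, relink=True), STEPS_PER_DAY, STEPS_PER_DAY)

print("restore elai from file:", "BFB" if good == continuous else "!BFB")
print("re-link at restart:    ", "BFB" if bad == continuous else "!BFB")
```

In this toy the first difference appears in the radiation term of the first step after restart and then propagates into `lai`, which mirrors the pattern reported earlier in the thread. Either restoring the derived value from the restart file or computing it where it is consumed, from current snow, removes the dependence on when the link was last called.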

rosiealice commented 8 years ago

OK - I have something that I think fixes the snow issue. It involves moving the elai_profile and esai_profile calculations to SurfaceRadiation, such that they use the updated snow variables. This needs a new variable (layer_height_profile) to track the height of the 'iv' layers to compare against snow depth in surface radiation.
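
As a rough illustration of comparing layer heights against snow depth, the sketch below computes an exposed leaf area profile from a per-layer height profile. This is not the code from the commit: the linear burial rule, the (bottom, top) layout assumed for `layer_height_profile`, and the function names are all assumptions made for the example.

```python
def exposed_fraction(layer_bottom, layer_top, snow_depth):
    """Fraction of a leaf layer above the snow surface (assumed linear burial)."""
    thickness = layer_top - layer_bottom
    if thickness <= 0.0:
        raise ValueError("layer_top must exceed layer_bottom")
    # Depth of snow inside this layer, clipped to the layer thickness.
    buried = min(max(snow_depth - layer_bottom, 0.0), thickness)
    return 1.0 - buried / thickness


def elai_profile(tlai_profile, layer_height_profile, snow_depth):
    """Exposed LAI per layer, given total LAI and (bottom, top) heights per layer."""
    if len(tlai_profile) != len(layer_height_profile):
        raise ValueError("profiles must have the same number of layers")
    return [
        tlai * exposed_fraction(bottom, top, snow_depth)
        for tlai, (bottom, top) in zip(tlai_profile, layer_height_profile)
    ]


# Three 0.5 m layers with 1.0 LAI each, and 0.75 m of snow on the ground.
heights = [(0.0, 0.5), (0.5, 1.0), (1.0, 1.5)]
profile = elai_profile([1.0, 1.0, 1.0], heights, snow_depth=0.75)
print(profile)
```

Because the profile is computed from the snow depth passed in at call time, it always reflects the current snow state rather than whatever it was when the link routine last ran.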

There is a -lot- of code in surface radiation that I just made redundant, and some that already was (the smooth_leaf_distribution=1 option that isn't used, primarily). I kept that out of this commit though.

There are two updates. One for the main fix, and a second to clean up a couple of minor issues.

Ben, can you let me know if this works for you?

cross fingers, -Rosie


Dr Rosie A. Fisher

Terrestrial Sciences Section Climate and Global Dynamics National Center for Atmospheric Research 1850 Table Mesa Drive Boulder, Colorado, 80305 USA. +1 303-497-1706

http://www.cgd.ucar.edu/staff/rfisher/

rosiealice commented 8 years ago

There's also a good chance my changes will mess up the history output of ELAI for the canopy. That should be logged as an issue to fix somewhere.
