NGEET / fates

repository for the Functionally Assembled Terrestrial Ecosystem Simulator (FATES)

FATES timing #859

Open dlawrenncar opened 2 years ago

dlawrenncar commented 2 years ago

@wwieder ran some short test runs with CTSM5.1(SP) and CTSM5.1(FATES-SP) to look at differences in timing/performance, which some previous examination had indicated could be significant. The case directories on Cheyenne are here:

/glade/u/home/wwieder/clm_tutorial_cases/I2000_CTSM51_sp

/glade/u/home/wwieder/clm_tutorial_cases/I2000_CTSM_FATESsp

The timing files are here:

CTSM(SP): /glade/u/home/wwieder/clm_tutorial_cases/I2000_CTSM51_sp/timing/cesm_timing.I2000_CTSM51_sp.3799628.chadmin1.ib0.cheyenne.ucar.edu.220417-062346

CTSM(FATES-SP): /glade/u/home/wwieder/clm_tutorial_cases/I2000_CTSM_FATESsp/timing/cesm_timing.I2000_CTSM_FATESsp.3799626.chadmin1.ib0.cheyenne.ucar.edu.220417-062320

It looks like the land model part is about 2x more expensive for CTSM(FATES-SP) vs CTSM(SP). I see some possible reasons.

  1. Quite a bit of time (almost 40% of the total difference) is spent in fates_wrap_update_hifrq_hist
  2. There are some BGC-type routines that probably shouldn't be running (?): ch4, BGCZero, SoilBiogeochemLittVertTransp
  3. BGPFluxes takes about 30% longer
  4. Surfalbedo, which includes the FATES radiation call, takes quite a bit longer (386s vs 88s)
  5. hbuf takes quite a bit longer (462s vs 158s)
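For the two timers where both values are quoted in the list above, the slowdown can be put on a common footing with a short Python sketch. The dictionary and function names are illustrative only and are not part of CTSM or FATES; the four timings are the only inputs taken from this comment.

```python
# Timings (seconds) quoted in the list above: (CTSM(SP), CTSM(FATES-SP)).
# Only these two timers have both values stated; nothing else is assumed.
quoted_timings = {
    "Surfalbedo": (88.0, 386.0),
    "hbuf": (158.0, 462.0),
}


def compare(sp_seconds, fates_sp_seconds):
    """Return (slowdown ratio, extra seconds) for FATES-SP relative to SP."""
    return fates_sp_seconds / sp_seconds, fates_sp_seconds - sp_seconds


for name, (sp, fates_sp) in quoted_timings.items():
    ratio, extra_seconds = compare(sp, fates_sp)
    print(f"{name}: {ratio:.1f}x slower, {extra_seconds:.0f}s extra")
```

Both timers add roughly 300s each, even though their ratios differ (about 4.4x for Surfalbedo and 2.9x for hbuf), which is why looking at absolute differences as well as ratios is useful here.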

dlawrenncar commented 2 years ago

The better files to look at are:

/glade/u/home/wwieder/clm_tutorial_cases/I2000_CTSM51_sp/timing/cesm.ESMF_Profile.summary.3799628.chadmin1.ib0.cheyenne.ucar.edu.220417-062346

/glade/u/home/wwieder/clm_tutorial_cases/I2000_CTSM_FATESsp/timing/cesm.ESMF_Profile.summary.3799626.chadmin1.ib0.cheyenne.ucar.edu.220417-062320

dlawrenncar commented 2 years ago

At the FATES SE meeting, we decided to do three things.

  1. Turn off calls to BGC-related routines (done in the host land model; will file an issue in the CTSM repo).
  2. Examine the fates_wrap_update_hifrq_hist routine, reduce the number of calculations done by default, and look for other cost savings (@rgknox will file an issue on this).
  3. Run a FATES prescribed biogeography nocomp simulation to examine timing in that configuration compared to CLMBGC. @dlawrenncar will do this.

ekluzek commented 2 years ago

The CTSM Soil BGC issue is here: https://github.com/ESCOMP/CTSM/issues/1720

glemieux commented 2 years ago

On a related note, after the discussion during the FATES software meeting today, I ran a simple one-year dynamic FATES case on Cheyenne to make sure that we could call the perf_mod timing calls. I added sub-calls inside the update_history_hifrq subroutine around the patch, cohort and pft calculation loops: https://github.com/glemieux/fates/commit/e1e12d3255d9ec3b5456e44686a2084228bd4647

Here's a partial view of the output, for relative comparison of the sub-calls to the fates_wrap_update_hifrq_hist call:

Region                                          PETs   Count    Mean (s)    Min (s)     Min PET Max (s)     Max PET
fates_wrap_update_hifrq_hist                    144    17521    37.4300     18.8110     95      61.2650     51
  update_history_hifrq_patchloop                144    MULTIPLE 8.3190      4.3779      153     11.1495     45
    update_history_hifrq_cohortloop             144    MULTIPLE 6.0209      2.7544      153     8.2941      45
    update_history_hifrq_pftloop                144    MULTIPLE 1.6703      1.0647      117     2.1641      78

Output:

/glade/scratch/glemieux/ctsm-cases/perfmodcheck.fates-sci.1.56.0_api.23.0.0-ctsm5.1.dev091-C82a63cc16-Fe1e12d32.intel/timing
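As a rough reading of the profile excerpt in this comment, the following Python sketch computes what share of the mean fates_wrap_update_hifrq_hist time each instrumented loop accounts for. It assumes the cohort and PFT loops are nested inside the patch loop, as the indentation of the excerpt suggests, and it treats per-PET means as if they could be subtracted directly, which is only an approximation. The variable and function names are illustrative.

```python
# Mean times (seconds) copied from the profile excerpt in this comment.
parent_mean = 37.4300  # fates_wrap_update_hifrq_hist
child_means = {
    "update_history_hifrq_patchloop": 8.3190,
    "update_history_hifrq_cohortloop": 6.0209,
    "update_history_hifrq_pftloop": 1.6703,
}


def share_of_parent(child_seconds, parent_seconds):
    """Fraction of the parent timer's mean time spent in a child region."""
    return child_seconds / parent_seconds


for name, mean in child_means.items():
    print(f"{name}: {share_of_parent(mean, parent_mean):.1%} of parent")

# Assumption: the cohort and PFT loops run inside the patch loop, so only
# the patch loop is subtracted to estimate time outside the timed loops.
untimed = parent_mean - child_means["update_history_hifrq_patchloop"]
print(f"outside patch loop: {untimed:.4f}s ({untimed / parent_mean:.1%})")
```

Under these assumptions the patch loop covers only about a fifth of the wrapper's mean time, so most of the cost would sit outside the instrumented loops.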

ekluzek commented 2 years ago

You can definitely run with the timing calls for a CTSM case. What I'm less clear on is whether E3SM has changed any of the interfaces so that you wouldn't be able to build and run both CTSM and ELM. That is very unlikely though. So, if you could run the above changes through ELM, that would be the test I'd really like to hear about. And likely all you need to check is if you can build.

ekluzek commented 2 years ago

I have a PR with a first pass at removing soil BGC. You might want to rerun the fates-sp case using that branch.

https://github.com/ESCOMP/CTSM/pull/1723

I also suggest using the fates_sp user-mod so that it will set it up right and turn off a bunch of unneeded history variables.

dlawrenncar commented 2 years ago

I've now run some timing tests for a FATES fixed biogeography nocomp simulation against a CTSM5.1(BGC) simulation. Timing files are here:

CTSM5.1(FATES-nocomp): /glade/u/home/dlawren/cases/FATES_nocomp_4x5test/timing/cesm.ESMF_Profile.summary.3997624.chadmin1.ib0.cheyenne.ucar.edu.220428-001432

CTSM5.1(BGC) no crops: /glade/u/home/dlawren/cases/ctsm51bgc_4x5test/timing/cesm.ESMF_Profile.summary.3972115.chadmin1.ib0.cheyenne.ucar.edu.220426-101853

A few things to note:

  1. The cost seemed to spin up quickly. I ran in 20-year increments, and by the 4th submission (80 years) the cost had equilibrated, increasing by about 25% from years 1 to 20 to years 61 to 80.
  2. After cost equilibration, FATES is about 2.5x slower than CTSM(BGC) (lnd run: 11000s FATES, 4500s CTSM).
  3. The sources of the difference in timing seem different from what we saw in the SP simulations.
  4. By far the biggest difference is in can_flux/can_iter, where FATES took an average of 6600s and CTSM took just 900s.
  5. fates_wrap_update_hifrq_hist, which was the source of much of the difference in SP mode, is not as significant here (600s)
  6. As in SP, surfalb is 4-5x more expensive (500s vs 100s); surfrad is also considerably more expensive (450s vs 50s)
  7. fates_dynamics_daily_driver is also an expensive part of the model (1100s), though if this equates to some degree to ecosysdyn in CTSM (???), which takes 1200s, then maybe this is not unexpected.
  8. surfrad in FATES is 450
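To see how the quoted timers relate to the overall gap, here is a back-of-the-envelope Python sketch using only the rounded numbers from the notes above. The variable and function names are illustrative. Because the inputs are rounded and other timers also differ between the two runs, the shares are indicative at best.

```python
# Approximate timings (seconds) quoted in the notes above: (FATES, CTSM(BGC)).
lnd_run = (11000.0, 4500.0)
can_flux = (6600.0, 900.0)
surfalb = (500.0, 100.0)
surfrad = (450.0, 50.0)


def extra(pair):
    """Extra seconds spent by FATES relative to CTSM(BGC) for one timer."""
    return pair[0] - pair[1]


total_gap = extra(lnd_run)
print(f"lnd run: {lnd_run[0] / lnd_run[1]:.2f}x, gap {total_gap:.0f}s")

for name, pair in (("can_flux", can_flux), ("surfalb", surfalb), ("surfrad", surfrad)):
    print(f"{name}: {extra(pair):.0f}s extra, {extra(pair) / total_gap:.1%} of the lnd run gap")
```

With these rounded inputs, can_flux/can_iter alone accounts for close to nine tenths of the lnd run gap, consistent with the note that it is by far the biggest difference.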

glemieux commented 2 years ago

You can definitely run with the timing calls for a CTSM case. What I'm less clear on is whether E3SM has changed any of the interfaces so that you wouldn't be able to build and run both CTSM and ELM. That is very unlikely though. So, if you could run the above changes through ELM, that would be the test I'd really like to hear about. And likely all you need to check is if you can build.

I can confirm that we can call perf_mod in fates for elm as well. I was able to build and run a simple one year single site case using the current e3sm master branch:

        "l:CNPsum"                                                          -   17521    -       0.436420     0.000108     0.000019         0.001135
        "l:fates_wrap_update_hifrq_hist"                                    -   17521    -       4.105862     0.001709     0.000180         0.001135
          "l:update_history_hifrq_patchloop"                                -   17521    -       0.452173     0.000354     0.000019         0.001135
            "l:update_history_hifrq_cohortloop"                             -  210252    -       0.083103     0.000023     0.000000         0.013617
            "l:update_history_hifrq_pftloop"                                -  210252    -       0.195928     0.000046     0.000001         0.013617
        "l:balchk"                                                          -   17521    -       0.078643     0.000074     0.000003         0.001135