geoschem / geos-chem

GEOS-Chem "Science Codebase" repository. Contains GEOS-Chem science routines, run directory generation scripts, and interface code. This repository is used as a submodule within the GCClassic and GCHP wrappers, as well as in other modeling contexts (external ESMs).
http://geos-chem.org
Other
168 stars 165 forks source link

[FEATURE REQUEST] Add more timers for gas-phase chemistry #240

Closed yantosca closed 4 years ago

yantosca commented 4 years ago

As per this plot in our Guide to GEOS-Chem performance, the gas-phase chemistry seems to be a computational bottleneck.

We can introduce more timers (e.g. for het chemistry) to try to better narrow down where the chemistry is spending its time. I am putting this as a feature request so that we don't forget to do this.

yantosca commented 4 years ago

This is a "quick hit" and we could do this for 12.9.0.

msulprizio commented 4 years ago

Additional timers in flexchem_mod.F90 are now included in the dev/12.8.1 branch. This update included adding the capability to start and end timers within parallel loops. The timer output at the end of the log file now looks like this:

  Timer name                       DD-hh:mm:ss.SSS     Total Seconds
-------------------------------------------------------------------------------
  GEOS-Chem                     :  00-00:01:41.312           101.312
  Initialization                :  00-00:00:43.000            43.000
  HEMCO                         :  00-00:00:41.640            41.641
  All chemistry                 :  00-00:00:38.546            38.547
  => FlexChem                   :  00-00:00:36.656            36.656
    -> FlexChem loop            :  00-00:00:36.453            36.453
    -> Init KPP                 :  00-00:00:00.515             0.516
    -> Het chem rates           :  00-00:00:10.562            10.562
    -> Photolysis rates         :  00-00:00:00.515             0.516
    -> KPP                      :  00-00:04:15.859           255.859
       RCONST                   :  00-00:00:07.812             7.812
       Integrate 1              :  00-00:04:05.484           245.484
       Integrate 2              :  >>>>> THE TIMER DID NOT RUN <<<<<
    -> Prod/loss diags          :  00-00:00:00.500             0.500
    -> OH reactivity diag       :  >>>>> THE TIMER DID NOT RUN <<<<<
  => FAST-JX photolysis         :  00-00:00:00.640             0.641
  => Aerosol chem               :  00-00:00:01.015             1.016
  => Linearized strat chem      :  00-00:00:00.140             0.141
  Transport                     :  00-00:00:01.671             1.672
  Convection                    :  00-00:00:01.140             1.141
  Boundary layer mixing         :  00-00:00:01.000             1.000
  Dry deposition                :  00-00:00:00.125             0.125
  Wet deposition                :  00-00:00:00.781             0.781
  All diagnostics               :  00-00:00:03.375             3.375
  => HEMCO diagnostics          :  >>>>> THE TIMER DID NOT RUN <<<<<
  => ObsPack diagnostics        :  >>>>> THE TIMER DID NOT RUN <<<<<
  => History (netCDF diags)     :  00-00:00:03.375             3.375
  Unit conversions              :  00-00:00:00.671             0.672
  Input                         :  >>>>> THE TIMER DID NOT RUN <<<<<
  Output                        :  00-00:00:03.875             3.875

There seems to be an issue in the KPP timer output, which exceeds the total FlexChem, chemistry, and GEOS-Chem timer totals. This may be an issue related to the OMP parallelization. We are still investigating.

yantosca commented 4 years ago

With 1 processor I get this result:

  => FlexChem                   :  00-00:03:53.250           233.250
    -> FlexChem loop            :  00-00:03:52.750           232.750
    -> Init KPP                 :  >>>>> THE TIMER DID NOT RUN <<<<<
    -> Het chem rates           :  00-00:00:13.500            13.500
    -> Photolysis rates         :  00-00:00:00.250             0.250
    -> KPP                      :  00-00:03:39.000           219.000
       RCONST                   :  00-00:00:04.000             4.000
       Integrate 1              :  00-00:03:34.250           214.250
       Integrate 2              :  >>>>> THE TIMER DID NOT RUN <<<<<
    -> Prod/loss diags          :  >>>>> THE TIMER DID NOT RUN <<<<<
    -> OH reactivity diag       :  >>>>> THE TIMER DID NOT RUN <<<<<

which should be the true timing. Note that the timer results more or less add up. The timer is good to .25 sec I think.

If the chemistry takes about T seconds on each thread, and is being solved on N threads, then the total time reported by the timer in the parallel loop will be about N*T seconds. This is N times too high when compared to the "FlexChem" timer (which is outside of the parallel loop.

I think a quick fix is to report the average time instead of the summed time across all threads. In other words, in routine Timer_Sum_Loop, we should divide the total time across all threads by N threads. This will give an average amount of time per core, which is more in line with the other timers.

For example, on 2 threads I get:

  => FlexChem                   :  00-00:01:52.000           112.000
    -> FlexChem loop            :  00-00:01:51.750           111.750
    -> Init KPP                 :  >>>>> THE TIMER DID NOT RUN <<<<<
    -> Het chem rates           :  00-00:00:06.750             6.750
    -> Photolysis rates         :  00-00:00:00.375             0.375
    -> KPP                      :  00-00:01:43.625           103.625
       RCONST                   :  00-00:00:01.375             1.375
       Integrate 1              :  00-00:01:41.125           101.125

and on 6 threads I get:

  => FlexChem                   :  00-00:00:40.750            40.750
    -> FlexChem loop            :  00-00:00:40.500            40.500
    -> Init KPP                 :  >>>>> THE TIMER DID NOT RUN <<<<<
    -> Het chem rates           :  00-00:00:02.083             2.083
    -> Photolysis rates         :  >>>>> THE TIMER DID NOT RUN <<<<<
    -> KPP                      :  00-00:00:37.458            37.458
       RCONST                   :  00-00:00:01.041             1.042
       Integrate 1              :  00-00:00:36.333            36.333

This won't be perfect (the times don't add up exactly) but it should be good enough for comparison. We can look at the % of time spent in each operation too. The system clock is somewhat coarse, as I said above.

We can also note the timers that we treat in this way by putting an asterisk in the timer name.

yantosca commented 4 years ago

This is now completed in commit https://github.com/geoschem/geos-chem/commit/0cfd663f98f9ecba1334b476d27009587d7f795c.