Closed yantosca closed 4 years ago
This is a "quick hit" and we could do this for 12.9.0.
Additional timers in flexchem_mod.F90 are now included in the dev/12.8.1
branch. This update included adding the capability to start and end timers within parallel loops. The timer output at the end of the log file now looks like this:
Timer name DD-hh:mm:ss.SSS Total Seconds
-------------------------------------------------------------------------------
GEOS-Chem : 00-00:01:41.312 101.312
Initialization : 00-00:00:43.000 43.000
HEMCO : 00-00:00:41.640 41.641
All chemistry : 00-00:00:38.546 38.547
=> FlexChem : 00-00:00:36.656 36.656
-> FlexChem loop : 00-00:00:36.453 36.453
-> Init KPP : 00-00:00:00.515 0.516
-> Het chem rates : 00-00:00:10.562 10.562
-> Photolysis rates : 00-00:00:00.515 0.516
-> KPP : 00-00:04:15.859 255.859
RCONST : 00-00:00:07.812 7.812
Integrate 1 : 00-00:04:05.484 245.484
Integrate 2 : >>>>> THE TIMER DID NOT RUN <<<<<
-> Prod/loss diags : 00-00:00:00.500 0.500
-> OH reactivity diag : >>>>> THE TIMER DID NOT RUN <<<<<
=> FAST-JX photolysis : 00-00:00:00.640 0.641
=> Aerosol chem : 00-00:00:01.015 1.016
=> Linearized strat chem : 00-00:00:00.140 0.141
Transport : 00-00:00:01.671 1.672
Convection : 00-00:00:01.140 1.141
Boundary layer mixing : 00-00:00:01.000 1.000
Dry deposition : 00-00:00:00.125 0.125
Wet deposition : 00-00:00:00.781 0.781
All diagnostics : 00-00:00:03.375 3.375
=> HEMCO diagnostics : >>>>> THE TIMER DID NOT RUN <<<<<
=> ObsPack diagnostics : >>>>> THE TIMER DID NOT RUN <<<<<
=> History (netCDF diags) : 00-00:00:03.375 3.375
Unit conversions : 00-00:00:00.671 0.672
Input : >>>>> THE TIMER DID NOT RUN <<<<<
Output : 00-00:00:03.875 3.875
There seems to be an issue in the KPP timer output, which exceeds the total FlexChem, chemistry, and GEOS-Chem timer totals. This may be an issue related to the OMP parallelization. We are still investigating.
With 1 processor I get this result:
=> FlexChem : 00-00:03:53.250 233.250
-> FlexChem loop : 00-00:03:52.750 232.750
-> Init KPP : >>>>> THE TIMER DID NOT RUN <<<<<
-> Het chem rates : 00-00:00:13.500 13.500
-> Photolysis rates : 00-00:00:00.250 0.250
-> KPP : 00-00:03:39.000 219.000
RCONST : 00-00:00:04.000 4.000
Integrate 1 : 00-00:03:34.250 214.250
Integrate 2 : >>>>> THE TIMER DID NOT RUN <<<<<
-> Prod/loss diags : >>>>> THE TIMER DID NOT RUN <<<<<
-> OH reactivity diag : >>>>> THE TIMER DID NOT RUN <<<<<
which should be the true timing. Note that the timer results more or less add up. The timer is good to .25 sec I think.
If the chemistry takes about T seconds on each thread, and is being solved on N threads, then the total time reported by the timer in the parallel loop will be about N*T seconds. This is N times too high when compared to the "FlexChem" timer (which is outside of the parallel loop.
I think a quick fix is to report the average time instead of the summed time across all threads. In other words, in routine Timer_Sum_Loop, we should divide the total time across all threads by N threads. This will give an average amount of time per core, which is more in line with the other timers.
For example, on 2 threads I get:
=> FlexChem : 00-00:01:52.000 112.000
-> FlexChem loop : 00-00:01:51.750 111.750
-> Init KPP : >>>>> THE TIMER DID NOT RUN <<<<<
-> Het chem rates : 00-00:00:06.750 6.750
-> Photolysis rates : 00-00:00:00.375 0.375
-> KPP : 00-00:01:43.625 103.625
RCONST : 00-00:00:01.375 1.375
Integrate 1 : 00-00:01:41.125 101.125
and on 6 threads I get:
=> FlexChem : 00-00:00:40.750 40.750
-> FlexChem loop : 00-00:00:40.500 40.500
-> Init KPP : >>>>> THE TIMER DID NOT RUN <<<<<
-> Het chem rates : 00-00:00:02.083 2.083
-> Photolysis rates : >>>>> THE TIMER DID NOT RUN <<<<<
-> KPP : 00-00:00:37.458 37.458
RCONST : 00-00:00:01.041 1.042
Integrate 1 : 00-00:00:36.333 36.333
This won't be perfect (the times don't add up exactly) but it should be good enough for comparison. We can look at the % of time spent in each operation too. The system clock is somewhat coarse, as I said above.
We can also note the timers that we treat in this way by putting an asterisk in the timer name.
This is now completed in commit https://github.com/geoschem/geos-chem/commit/0cfd663f98f9ecba1334b476d27009587d7f795c.
As per this plot in our Guide to GEOS-Chem performance, the gas-phase chemistry seems to be a computational bottleneck.
We can introduce more timers (e.g. for het chemistry) to try to better narrow down where the chemistry is spending its time. I am putting this as a feature request so that we don't forget to do this.