geoschem / geos-chem

GEOS-Chem "Science Codebase" repository. Contains GEOS-Chem science routines, run directory generation scripts, and interface code. This repository is used as a submodule within the GCClassic and GCHP wrappers, as well as in other modeling contexts (external ESMs).
http://geos-chem.org
Other
167 stars 162 forks source link

[FEATURE REQUEST] Remove useless computations from KPP rate law library functions #598

Closed yantosca closed 3 years ago

yantosca commented 3 years ago

The problem

This topic was introduced in discussion thread #567. Long story short, the existing KPP equations and rate-law libraries use REAL*4 constants (e.g. 1.0 or 1.0E+00) that are cast to double precision. Not only does this involve a loss of precision, but this involves a lot of useless CPU cycles.

For example, this equation (in fullchem.eqn) calls function GCARR (in gckpp.kpp) to compute the reaction rate according tot he Arrhenius law:

O3 + NO = NO2 + O2 :                         GCARR(3.00E-12, 0.0E+00, -1500.0); 

where function GCARR looks like this:

  REAL(kind=dp) FUNCTION GCARR( A0,B0,C0 )
      REAL A0,B0,C0 
      GCARR =  DBLE(A0) * EXP(DBLE(C0)/TEMP) * (300._dp/TEMP)**DBLE(B0)
  END FUNCTION GCARR   

Problems with this approach:

The solution

We have rewritten the rate law functions so that

So we can now write the reaction rate like this:

O3 + NO = NO2 + O2 :                         GCARR_ac(3.00d-12, -1500.0d0);

where:

  FUNCTION GCARR_ac( a0, c0 ) RESULT( rate )
    ! Arrhenius function, skipping computation of ( 300/T )**b0,
    ! which evaluates to 1 when b0=0.  This avoids excess CPU
    ! cycles (bmy, 12/18/20)
    !
    REAL(kind=dp), INTENT(IN) :: a0, c0
    REAL(kind=dp)             :: rate
    !
    rate = a0 * EXP( c0 / TEMP )
  END FUNCTION GCARR_ac

We have also renamed the function to denote which parameters are being passed: _ac means that it accept the Arrhenius A0 and C0 terms, and B0 is zero.

yantosca commented 3 years ago

For some background, here is a slide showing the relative cost of various floating point operations (normalized to +, which only takes a couple of CPU cycles). 2021-03-10 Bob Yantosca KPP

yantosca commented 3 years ago

I used the TAU Profiler to assess the performance of the KPP rate law functions with useless computations removed vs. the dev/13.0.0-final branch. Results are below.

Runs

Ref: refrun_7d_omp3 (Existing rate-law functions)

Job statistics

SLURM JobID #         : 16102823
Job Name              : geoschem.run
Submit time           : 2021-02-05 06:38:23
Start  time           : 2021-02-05 06:38:31
End    time           : 2021-02-05 07:26:57
Partition             : huce_casc+
Node                  : holy7c20409
CPUs                  : 48
Memory                : 0.0364 GB
CPU  Time             : 1-09:26:31  (     120391 s)
Wall Time             : 00:48:26    (       2906 s)
CPU  Time / Wall Time : 41.4284     ( 86.31% ideal)

GEOS-Chem timers

  Ref Timer name                   DD-hh:mm:ss.SSS     Total Seconds
-------------------------------------------------------------------------------
  GEOS-Chem                     :  00-00:46:28.078          2788.078
  Initialization                :  00-00:01:01.187            61.188
  HEMCO                         :  00-00:08:04.765           484.766
  All chemistry                 :  00-00:19:17.046          1157.047
  => FlexChem                   :  00-00:13:45.171           825.172
    -> FlexChem loop            :  00-00:13:26.093           806.094
    -> Init KPP                 :  00-00:00:03.647             3.648
    -> Het chem rates           :  00-00:00:49.864            49.865
    -> Photolysis rates         :  00-00:00:04.626             4.626
    -> KPP                      :  00-00:08:15.338           495.338
       RCONST                   :  00-00:01:07.659            67.660
       Integrate 1              :  00-00:06:49.706           409.707
       Integrate 2              :  >>>>> THE TIMER DID NOT RUN <<<<<
    -> Prod/loss diags          :  00-00:00:03.707             3.707
    -> OH reactivity diag       :  >>>>> THE TIMER DID NOT RUN <<<<<
  => FAST-JX photolysis         :  00-00:01:20.468            80.469
  => Aerosol chem               :  00-00:03:56.140           236.141
  => Linearized strat chem      :  00-00:00:04.468             4.469
  Transport                     :  00-00:04:25.656           265.656
  Convection                    :  00-00:02:43.203           163.203
  Boundary layer mixing         :  00-00:04:14.359           254.359
  Dry deposition                :  00-00:00:06.968             6.969
  Wet deposition                :  00-00:01:09.765            69.766
  All diagnostics               :  00-00:05:24.921           324.922
  => HEMCO diagnostics          :  00-00:00:00.015             0.016
  => ObsPack diagnostics        :  >>>>> THE TIMER DID NOT RUN <<<<<
  => History (netCDF diags)     :  00-00:05:24.890           324.891
  Unit conversions              :  00-00:01:37.671            97.672
  Input                         :  00-00:00:29.671            29.672
  Output                        :  00-00:05:24.734           324.734
  Finalization                  :  00-00:00:07.671             7.672

Dev: devrun_7d_omp3 (rate law functions removing useless computations)

Job statistics from SLURM

SLURM JobID #         : 16119635
Job Name              : geoschem.run
Submit time           : 2021-02-05 07:55:53
Start  time           : 2021-02-05 07:56:31
End    time           : 2021-02-05 08:55:30
Partition             : huce_casc+
Node                  : holy7c20210
CPUs                  : 48
Memory                : 0.0346 GB
CPU  Time             : 1-14:21:24  (     138084 s)
Wall Time             : 00:58:59    (       3539 s)
CPU  Time / Wall Time : 39.0178     ( 81.29% ideal)

GEOS-Chem timers

  Dev Timer name                   DD-hh:mm:ss.SSS     Total Seconds
-------------------------------------------------------------------------------
  GEOS-Chem                     :  00-00:56:28.078          3388.078
  Initialization                :  00-00:01:00.906            60.906
  HEMCO                         :  00-00:08:03.578           483.578
  All chemistry                 :  00-00:29:03.343          1743.344
  => FlexChem                   :  00-00:22:26.578          1346.578
    -> FlexChem loop            :  00-00:22:06.046          1326.047
    -> Init KPP                 :  00-00:00:06.705             6.706
    -> Het chem rates           :  00-00:00:45.867            45.868
    -> Photolysis rates         :  00-00:00:04.769             4.769
    -> KPP                      :  00-**:45:52.***    **************
       RCONST                   :  00-00:00:33.606            33.606
       Integrate 1              :  00-00:12:20.858           740.859
       Integrate 2              :  >>>>> THE TIMER DID NOT RUN <<<<<
    -> Prod/loss diags          :  00-00:00:03.694             3.695
    -> OH reactivity diag       :  >>>>> THE TIMER DID NOT RUN <<<<<
  => FAST-JX photolysis         :  00-00:01:21.781            81.781
  => Aerosol chem               :  00-00:04:58.046           298.047
  => Linearized strat chem      :  00-00:00:05.515             5.516
  Transport                     :  00-00:04:28.328           268.328
  Convection                    :  00-00:02:44.562           164.562
  Boundary layer mixing         :  00-00:04:19.375           259.375
  Dry deposition                :  00-00:00:06.531             6.531
  Wet deposition                :  00-00:01:12.437            72.438
  All diagnostics               :  00-00:05:27.953           327.953
  => HEMCO diagnostics          :  >>>>> THE TIMER DID NOT RUN <<<<<
  => ObsPack diagnostics        :  >>>>> THE TIMER DID NOT RUN <<<<<
  => History (netCDF diags)     :  00-00:05:27.953           327.953
  Unit conversions              :  00-00:01:42.906           102.906
  Input                         :  00-00:00:29.203            29.203
  Output                        :  00-00:05:27.812           327.812
  Finalization                  :  00-00:00:07.734             7.734

NOTE: Bug in placement of KPP timer call, will be fixed in next version

Diagnostic comparison

OH Metrics

Metric Ref Dev
Mass-weighted mean OH concentration 10.90705577712 x 10^5 molec cm-3 10.90701569309 x 10^5 molec cm-3
CH3CCl3 lifetime w/r/t tropospheric OH 5.7756 years 5.7757 years
CH4 lifetime w/r/t tropospheric OH 9.7264 years 9.7264 years

SpeciesConc

7-day mean species concentration, summed over all grid boxes

Variable               Ref=refrun_7d_omp3   Dev=devrun_7d_omp3   Dev-Ref
SpeciesConc_A3O2     : 1.5073348e-09      | 1.5073339e-09      | -8.881784197001252e-16 
SpeciesConc_ACET     : 0.00017001003      | 0.00017001003      | 0.0 
SpeciesConc_ACTA     : 4.8985858e-06      | 4.898587e-06       | 1.3642420526593924e-12 
SpeciesConc_AERI     : 6.854665e-08       | 6.854668e-08       | 3.552713678800501e-14 
SpeciesConc_ALD2     : 8.112423e-06       | 8.112424e-06       | 1.8189894035458565e-12 
SpeciesConc_ALK4     : 1.9099514e-05      | 1.9099509e-05      | -5.4569682106375694e-12 
SpeciesConc_ATO2     : 8.129286e-09       | 8.129294e-09       | 7.993605777301127e-15 
SpeciesConc_ATOOH    : 2.2103636e-06      | 2.2103638e-06      | 2.2737367544323206e-13 
SpeciesConc_B3O2     : 5.4175526e-09      | 5.4175473e-09      | -5.329070518200751e-15 
SpeciesConc_BCPI     : 6.386458e-06       | 6.386458e-06       | 0.0 
SpeciesConc_BCPO     : 1.6477707e-06      | 1.6477707e-06      | 0.0 
SpeciesConc_BENZ     : 7.721492e-06       | 7.721494e-06       | 1.8189894035458565e-12 
SpeciesConc_BRO2     : 1.6369257e-09      | 1.6369248e-09      | -8.881784197001252e-16 
SpeciesConc_Br       : 6.5420386e-07      | 6.5420386e-07      | 0.0 
SpeciesConc_Br2      : 4.294419e-08       | 4.2943796e-08      | -3.943512183468556e-13 
SpeciesConc_BrCl     : 2.9278328e-07      | 2.9278195e-07      | -1.3358203432289883e-12 
SpeciesConc_BrNO2    : 3.2915843e-09      | 3.291584e-09       | -4.440892098500626e-16 
SpeciesConc_BrNO3    : 4.2179417e-07      | 4.217947e-07       | 5.400124791776761e-13 
SpeciesConc_BrO      : 9.697598e-07       | 9.697593e-07       | -4.547473508864641e-13 
SpeciesConc_BrSALA   : 1.294285e-08       | 1.2943e-08         | 1.4921397450962104e-13 
SpeciesConc_BrSALC   : 1.02111954e-07     | 1.02112246e-07     | 2.913225216616411e-13 
SpeciesConc_C2H6     : 9.7722106e-05      | 9.772209e-05       | -1.4551915228366852e-11 
SpeciesConc_C3H8     : 1.24340195e-05     | 1.2434018e-05      | -1.8189894035458565e-12 
SpeciesConc_C4HVP1   : 1.344095e-11       | 1.344092e-11       | -2.949029909160572e-17 
SpeciesConc_C4HVP2   : 4.4018376e-11      | 4.401823e-11       | -1.457167719820518e-16 
SpeciesConc_CCl4     : 9.928283e-06       | 9.928283e-06       | 0.0 
SpeciesConc_CFC11    : 2.8912835e-05      | 2.8912835e-05      | 0.0 
SpeciesConc_CFC113   : 9.5496935e-06      | 9.5496935e-06      | 0.0 
SpeciesConc_CFC114   : 2.5807567e-06      | 2.5807567e-06      | 0.0 
SpeciesConc_CFC115   : 1.8629286e-06      | 1.8629286e-06      | 0.0 
SpeciesConc_CFC12    : 7.1108894e-05      | 7.1108894e-05      | 0.0 
SpeciesConc_CH2Br2   : 9.480521e-08       | 9.4805195e-08      | -1.4210854715202004e-14 
SpeciesConc_CH2Cl2   : 4.069369e-06       | 4.069369e-06       | 0.0 
SpeciesConc_CH2I2    : 1.6246651e-10      | 1.6246651e-10      | 0.0 
SpeciesConc_CH2IBr   : 3.5140446e-10      | 3.514045e-10       | 2.7755575615628914e-17 
SpeciesConc_CH2ICl   : 2.516635e-09       | 2.516635e-09       | 0.0 
SpeciesConc_CH2O     : 3.10896e-05        | 3.10896e-05        | 0.0 
SpeciesConc_CH2OO    : 2.910663e-14       | 2.9106617e-14      | -1.3552527156068805e-20 
SpeciesConc_CH3Br    : 8.0297417e-07      | 8.0297417e-07      | 0.0 
SpeciesConc_CH3CCl3  : 2.6677307e-07      | 2.6677307e-07      | 0.0 
SpeciesConc_CH3CHOO  : 2.6855886e-13      | 2.6855905e-13      | 1.8973538018496328e-19 
SpeciesConc_CH3Cl    : 7.0313225e-05      | 7.0313225e-05      | 0.0 
SpeciesConc_CH3I     : 3.921293e-08       | 3.9212928e-08      | -3.552713678800501e-15 
SpeciesConc_CH4      : 0.29982316         | 0.29982322         | 5.960464477539063e-08 
SpeciesConc_CHBr3    : 7.739167e-08       | 7.739167e-08       | 0.0 
SpeciesConc_CHCl3    : 1.1112453e-06      | 1.1112453e-06      | 0.0 
SpeciesConc_CLOCK    : 1811767900000.0    | 1811767900000.0    | 0.0 
SpeciesConc_CO       : 0.010680627        | 0.010680627        | 0.0 
SpeciesConc_CO2      : 3.4490836e-06      | 3.4490793e-06      | -4.320099833421409e-12 
SpeciesConc_Cl       : 2.5359993e-06      | 2.5359993e-06      | 0.0 
SpeciesConc_Cl2      : 1.775434e-06       | 1.775439e-06       | 5.002220859751105e-12 
SpeciesConc_Cl2O2    : 2.1621177e-06      | 2.1621074e-06      | -1.0231815394945443e-11 
SpeciesConc_ClNO2    : 7.3256534e-09      | 7.3256556e-09      | 2.220446049250313e-15 
SpeciesConc_ClNO3    : 2.739727e-05       | 2.7397316e-05      | 4.729372449219227e-11 
SpeciesConc_ClO      : 1.3318807e-05      | 1.3318772e-05      | -3.456079866737127e-11 
SpeciesConc_ClOO     : 6.488198e-11       | 6.488199e-11       | 6.938893903907228e-18 
SpeciesConc_DMS      : 6.1525543e-06      | 6.152555e-06       | 9.094947017729282e-13 
SpeciesConc_DST1     : 9.1736416e-05      | 9.1736416e-05      | 0.0 
SpeciesConc_DST2     : 9.420402e-05       | 9.420402e-05       | 0.0 
SpeciesConc_DST3     : 0.0001495286       | 0.0001495286       | 0.0 
SpeciesConc_DST4     : 0.00010904716      | 0.00010904716      | 0.0 
SpeciesConc_EOH      : 3.3119904e-06      | 3.31199e-06        | -4.547473508864641e-13 
SpeciesConc_ETHLN    : 7.680528e-09       | 7.680507e-09       | -2.1316282072803006e-14 
SpeciesConc_ETNO3    : 2.5880027e-07      | 2.5880036e-07      | 8.526512829121202e-14 
SpeciesConc_ETO2     : 6.885982e-09       | 6.8859793e-09      | -2.6645352591003757e-15 
SpeciesConc_ETP      : 1.5796717e-06      | 1.5796718e-06      | 1.1368683772161603e-13 
SpeciesConc_GLYC     : 1.0556133e-06      | 1.055613e-06       | -3.410605131648481e-13 
SpeciesConc_GLYX     : 7.744487e-08       | 7.7444824e-08      | -4.263256414560601e-14 
SpeciesConc_H        : 2.4845294e-08      | 2.4845296e-08      | 1.7763568394002505e-15 
SpeciesConc_H1211    : 4.1805941e-07      | 4.1805941e-07      | 0.0 
SpeciesConc_H1301    : 4.356534e-07       | 4.356534e-07       | 0.0 
SpeciesConc_H2       : 0.09770402         | 0.09770402         | 0.0 
SpeciesConc_H2402    : 4.9276956e-08      | 4.9276956e-08      | 0.0 
SpeciesConc_H2O      : 605.02454          | 605.02454          | 0.0 
SpeciesConc_H2O2     : 5.0814597e-05      | 5.0814626e-05      | 2.9103830456733704e-11 
SpeciesConc_HAC      : 1.9350687e-06      | 1.9350684e-06      | -2.2737367544323206e-13 
SpeciesConc_HBr      : 1.2455374e-07      | 1.2455334e-07      | -3.979039320256561e-13 
SpeciesConc_HC5A     : 3.8371372e-08      | 3.837134e-08       | -3.197442310920451e-14 
SpeciesConc_HCFC123  : 1.1198907e-26      | 1.1198907e-26      | 0.0 
SpeciesConc_HCFC141b : 3.323583e-06       | 3.323583e-06       | 0.0 
SpeciesConc_HCFC142b : 3.855489e-06       | 3.855489e-06       | 0.0 
SpeciesConc_HCFC22   : 4.079312e-05       | 4.079312e-05       | 0.0 
SpeciesConc_HCOOH    : 3.943688e-06       | 3.9436895e-06      | 1.3642420526593924e-12 
SpeciesConc_HCl      : 0.0002608105       | 0.00026081048      | -2.9103830456733704e-11 
SpeciesConc_HI       : 1.3974245e-09      | 1.3974212e-09      | -3.3306690738754696e-15 
SpeciesConc_HMHP     : 5.5293265e-07      | 5.5293276e-07      | 1.1368683772161603e-13 
SpeciesConc_HMML     : 1.2449348e-07      | 1.2449358e-07      | 9.947598300641403e-14 
SpeciesConc_HNO2     : 2.085627e-07       | 2.085624e-07       | -2.984279490192421e-13 
SpeciesConc_HNO3     : 0.00027082945      | 0.00027082904      | -4.0745362639427185e-10 
SpeciesConc_HNO4     : 5.250747e-06       | 5.2507426e-06      | -4.547473508864641e-12 
SpeciesConc_HO2      : 3.4715563e-06      | 3.4715567e-06      | 4.547473508864641e-13 
SpeciesConc_HOBr     : 1.856996e-07       | 1.8569973e-07      | 1.2789769243681803e-13 
SpeciesConc_HOCl     : 4.992537e-06       | 4.99258e-06        | 4.320099833421409e-11 
SpeciesConc_HOI      : 3.709452e-08       | 3.7094598e-08      | 7.815970093361102e-14 
SpeciesConc_HONIT    : 1.5436838e-08      | 1.5436841e-08      | 3.552713678800501e-15 
SpeciesConc_HPALD1   : 1.7814134e-08      | 1.7814141e-08      | 7.105427357601002e-15 
SpeciesConc_HPALD1OO : 2.9613336e-11      | 2.9613333e-11      | -3.469446951953614e-18 
SpeciesConc_HPALD2   : 5.7168e-08         | 5.7167984e-08      | -1.4210854715202004e-14 
SpeciesConc_HPALD2OO : 1.0042704e-10      | 1.0042699e-10      | -4.85722573273506e-17 
SpeciesConc_HPALD3   : 1.505839e-08       | 1.5058392e-08      | 1.7763568394002505e-15 
SpeciesConc_HPALD4   : 4.144201e-08       | 4.1441997e-08      | -1.4210854715202004e-14 
SpeciesConc_HPETHNL  : 4.150748e-08       | 4.150747e-08       | -1.0658141036401503e-14 
SpeciesConc_I        : 1.631796e-08       | 1.6317959e-08      | -1.7763568394002505e-15 
SpeciesConc_I2       : 1.3201323e-10      | 1.3201329e-10      | 5.551115123125783e-17 
SpeciesConc_I2O2     : 8.249814e-11       | 8.2498466e-11      | 3.2612801348363973e-16 
SpeciesConc_I2O3     : 4.186887e-10       | 4.1868667e-10      | -2.0261570199409107e-15 
SpeciesConc_I2O4     : 2.2240976e-11      | 2.2239868e-11      | -1.1084883011491797e-15 
SpeciesConc_IBr      : 1.0656108e-09      | 1.0655772e-09      | -3.3639757646142243e-14 
SpeciesConc_ICHE     : 9.5115965e-08      | 9.5115944e-08      | -2.1316282072803006e-14 
SpeciesConc_ICHOO    : 6.850254e-12       | 6.8502578e-12      | 3.903127820947816e-18 
SpeciesConc_ICN      : 5.8332816e-08      | 5.8332756e-08      | -6.039613253960852e-14 
SpeciesConc_ICNOO    : 2.206408e-10       | 2.206401e-10       | -6.938893903907228e-16 
SpeciesConc_ICPDH    : 2.1435431e-08      | 2.1435437e-08      | 5.329070518200751e-15 
SpeciesConc_ICl      : 8.292727e-09       | 8.292725e-09       | -2.6645352591003757e-15 
SpeciesConc_IDC      : 4.5451184e-08      | 4.5451163e-08      | -2.1316282072803006e-14 
SpeciesConc_IDCHP    : 1.31170514e-08     | 1.31170514e-08     | 0.0 
SpeciesConc_IDHDP    : 4.533032e-08       | 4.5330314e-08      | -7.105427357601002e-15 
SpeciesConc_IDHNBOO  : 4.5150745e-09      | 4.515092e-09       | 1.7319479184152442e-14 
SpeciesConc_IDHNDOO1 : 7.747121e-12       | 7.747132e-12       | 1.1275702593849246e-17 
SpeciesConc_IDHNDOO2 : 7.789414e-12       | 7.789416e-12       | 1.734723475976807e-18 
SpeciesConc_IDHPE    : 2.079703e-07       | 2.0797029e-07      | -1.4210854715202004e-14 
SpeciesConc_IDN      : 1.1843922e-08      | 1.1843837e-08      | -8.526512829121202e-14 
SpeciesConc_IDNOO    : 6.687698e-12       | 6.687642e-12       | -5.594483210025203e-17 
SpeciesConc_IEPOXA   : 6.938548e-07       | 6.9385493e-07      | 1.1368683772161603e-13 
SpeciesConc_IEPOXAOO : 3.6544327e-11      | 3.6544316e-11      | -1.0408340855860843e-17 
SpeciesConc_IEPOXB   : 3.9932186e-07      | 3.9932195e-07      | 8.526512829121202e-14 
SpeciesConc_IEPOXBOO : 1.4625547e-11      | 1.4625542e-11      | -5.204170427930421e-18 
SpeciesConc_IEPOXD   : 3.5711096e-08      | 3.5711086e-08      | -1.0658141036401503e-14 
SpeciesConc_IHN1     : 1.11093215e-08     | 1.1109301e-08      | -2.042810365310288e-14 
SpeciesConc_IHN2     : 1.676428e-08       | 1.6764258e-08      | -2.3092638912203256e-14 
SpeciesConc_IHN3     : 1.48638195e-08     | 1.4863826e-08      | 6.217248937900877e-15 
SpeciesConc_IHN4     : 2.7609388e-09      | 2.7609341e-09      | -4.6629367034256575e-15 
SpeciesConc_IHOO1    : 9.202143e-09       | 9.202142e-09       | -8.881784197001252e-16 
SpeciesConc_IHOO4    : 2.5301403e-09      | 2.5301414e-09      | 1.1102230246251565e-15 
SpeciesConc_IHPNBOO  : 5.6970834e-12      | 5.697081e-12       | -2.6020852139652106e-18 
SpeciesConc_IHPNDOO  : 2.2034603e-11      | 2.2034577e-11      | -2.6020852139652106e-17 
SpeciesConc_IHPOO1   : 7.9493884e-11      | 7.9493856e-11      | -2.7755575615628914e-17 
SpeciesConc_IHPOO2   : 1.936254e-11       | 1.9362548e-11      | 8.673617379884035e-18 
SpeciesConc_IHPOO3   : 1.1914021e-10      | 1.1914024e-10      | 2.7755575615628914e-17 
SpeciesConc_INA      : 1.2893193e-17      | 1.28932146e-17     | 2.150669592637872e-23 
SpeciesConc_INDIOL   : 4.6126215e-06      | 4.6126215e-06      | 0.0 
SpeciesConc_INO      : 1.0768647e-11      | 1.0768593e-11      | -5.4643789493269423e-17 
SpeciesConc_INO2B    : 5.9356746e-09      | 5.935704e-09       | 2.930988785010413e-14 
SpeciesConc_INO2D    : 4.584526e-09       | 4.5845483e-09      | 2.220446049250313e-14 
SpeciesConc_INPB     : 2.0133172e-08      | 2.0133129e-08      | -4.263256414560601e-14 
SpeciesConc_INPD     : 2.221306e-08       | 2.2212992e-08      | -6.750155989720952e-14 
SpeciesConc_IO       : 1.8674838e-08      | 1.8674836e-08      | -1.7763568394002505e-15 
SpeciesConc_IONITA   : 4.583741e-08       | 4.5837393e-08      | -1.7763568394002505e-14 
SpeciesConc_IONO     : 8.5785895e-10      | 8.578429e-10       | -1.6042722705833512e-14 
SpeciesConc_IONO2    : 1.7657648e-08      | 1.7657669e-08      | 2.1316282072803006e-14 
SpeciesConc_IPRNO3   : 2.3726938e-07      | 2.3726943e-07      | 5.684341886080802e-14 
SpeciesConc_ISALA    : 7.671415e-09       | 7.671395e-09       | -1.9539925233402755e-14 
SpeciesConc_ISALC    : 4.689541e-09       | 4.6895337e-09      | -7.549516567451064e-15 
SpeciesConc_ISOP     : 2.8535464e-06      | 2.8535478e-06      | 1.3642420526593924e-12 
SpeciesConc_ISOPNOO1 : 2.8340656e-11      | 2.8340667e-11      | 1.0408340855860843e-17 
SpeciesConc_ISOPNOO2 : 2.5500598e-11      | 2.550063e-11       | 3.122502256758253e-17 
SpeciesConc_ITCN     : 1.5890372e-08      | 1.5890363e-08      | -8.881784197001252e-15 
SpeciesConc_ITHN     : 3.7235566e-08      | 3.723552e-08       | -4.618527782440651e-14 
SpeciesConc_KO2      : 1.1812668e-08      | 1.1812665e-08      | -3.552713678800501e-15 
SpeciesConc_LBRO2H   : 1.6923642e-06      | 1.6923643e-06      | 1.1368683772161603e-13 
SpeciesConc_LBRO2N   : 1.324323e-06       | 1.3243202e-06      | -2.7284841053187847e-12 
SpeciesConc_LCH4     : 1.829087e-06       | 1.8290865e-06      | -4.547473508864641e-13 
SpeciesConc_LCO      : 3.1677687e-06      | 3.1677641e-06      | -4.547473508864641e-12 
SpeciesConc_LIMO     : 1.0944953e-08      | 1.09449605e-08     | 7.105427357601002e-15 
SpeciesConc_LIMO2    : 2.03324e-10        | 2.0332393e-10      | -6.938893903907228e-17 
SpeciesConc_LISOPNO3 : 1.6943817e-08      | 1.6943789e-08      | -2.842170943040401e-14 
SpeciesConc_LISOPOH  : 1.939704e-07       | 1.9397034e-07      | -5.684341886080802e-14 
SpeciesConc_LNRO2H   : 3.1392792e-15      | 3.1392767e-15      | -2.541098841762901e-21 
SpeciesConc_LNRO2N   : 3.9500094e-15      | 3.95001e-15        | 4.235164736271502e-22 
SpeciesConc_LOx      : 0.0025104273       | 0.0025104273       | 0.0 
SpeciesConc_LTRO2H   : 9.306493e-07       | 9.3064915e-07      | -1.7053025658242404e-13 
SpeciesConc_LTRO2N   : 1.2669665e-06      | 1.2669663e-06      | -2.2737367544323206e-13 
SpeciesConc_LVOC     : 3.6051256e-10      | 3.605121e-10       | -4.718447854656915e-16 
SpeciesConc_LVOCOA   : 9.26511e-08        | 9.265106e-08       | -4.263256414560601e-14 
SpeciesConc_LXRO2H   : 3.6937772e-07      | 3.6937774e-07      | 2.842170943040401e-14 
SpeciesConc_LXRO2N   : 9.545257e-07       | 9.545257e-07       | 0.0 
SpeciesConc_MACR     : 1.1534095e-06      | 1.1534095e-06      | 0.0 
SpeciesConc_MACR1OO  : 6.082224e-10       | 6.0822153e-10      | -8.881784197001252e-16 
SpeciesConc_MACR1OOH : 4.9530666e-08      | 4.9530676e-08      | 1.0658141036401503e-14 
SpeciesConc_MACRNO2  : 3.3421742e-12      | 3.3421783e-12      | 4.119968255444917e-18 
SpeciesConc_MAP      : 5.2784267e-06      | 5.2784294e-06      | 2.7284841053187847e-12 
SpeciesConc_MCO3     : 1.9888374e-08      | 1.988839e-08       | 1.5987211554602254e-14 
SpeciesConc_MCRDH    : 7.481803e-08       | 7.4818054e-08      | 2.1316282072803006e-14 
SpeciesConc_MCRENOL  : 1.3894064e-08      | 1.389406e-08       | -3.552713678800501e-15 
SpeciesConc_MCRHN    : 2.227734e-09       | 2.2277358e-09      | 1.7763568394002505e-15 
SpeciesConc_MCRHNB   : 7.320646e-09       | 7.320592e-09       | -5.3734794391857577e-14 
SpeciesConc_MCRHP    : 6.1903194e-08      | 6.190318e-08       | -1.4210854715202004e-14 
SpeciesConc_MCROHOO  : 4.8687106e-11      | 4.86871e-11        | -6.938893903907228e-18 
SpeciesConc_MEK      : 8.744737e-06       | 8.74474e-06        | 2.7284841053187847e-12 
SpeciesConc_MENO3    : 5.5629187e-07      | 5.56292e-07        | 1.1368683772161603e-13 
SpeciesConc_MGLY     : 5.2363265e-07      | 5.236324e-07       | -2.2737367544323206e-13 
SpeciesConc_MO2      : 5.962356e-07       | 5.9623557e-07      | -5.684341886080802e-14 
SpeciesConc_MOH      : 4.2313055e-05      | 4.2313048e-05      | -7.275957614183426e-12 
SpeciesConc_MONITA   : 6.0623293e-09      | 6.0623284e-09      | -8.881784197001252e-16 
SpeciesConc_MONITS   : 1.3743057e-08      | 1.3743056e-08      | -1.7763568394002505e-15 
SpeciesConc_MONITU   : 1.8386385e-09      | 1.8386376e-09      | -8.881784197001252e-16 
SpeciesConc_MP       : 4.537546e-05       | 4.5375455e-05      | -3.637978807091713e-12 
SpeciesConc_MPAN     : 6.746832e-08       | 6.746838e-08       | 5.684341886080802e-14 
SpeciesConc_MPN      : 5.2643595e-06      | 5.2643554e-06      | -4.092726157978177e-12 
SpeciesConc_MSA      : 6.351092e-07       | 6.351092e-07       | 0.0 
SpeciesConc_MTPA     : 2.7747421e-07      | 2.7747433e-07      | 1.1368683772161603e-13 
SpeciesConc_MTPO     : 1.0865065e-07      | 1.0865068e-07      | 2.842170943040401e-14 
SpeciesConc_MVK      : 2.0747066e-06      | 2.0747057e-06      | -9.094947017729282e-13 
SpeciesConc_MVKDH    : 3.7804625e-07      | 3.7804637e-07      | 1.1368683772161603e-13 
SpeciesConc_MVKHC    : 1.2195021e-07      | 1.2195021e-07      | 0.0 
SpeciesConc_MVKHCB   : 1.04707624e-07     | 1.04707624e-07     | 0.0 
SpeciesConc_MVKHP    : 1.2856889e-07      | 1.2856889e-07      | 0.0 
SpeciesConc_MVKN     : 1.3970917e-08      | 1.39709195e-08     | 2.6645352591003757e-15 
SpeciesConc_MVKOHOO  : 5.166217e-09       | 5.1662137e-09      | -3.1086244689504383e-15 
SpeciesConc_MVKPC    : 3.7494566e-08      | 3.7494548e-08      | -1.7763568394002505e-14 
SpeciesConc_N        : 2.7872848e-10      | 2.787285e-10       | 2.7755575615628914e-17 
SpeciesConc_N2       : 152574.56          | 152574.56          | 0.0 
SpeciesConc_N2O      : 0.04729683         | 0.04729683         | 0.0 
SpeciesConc_N2O5     : 2.9692097e-05      | 2.9692108e-05      | 1.0913936421275139e-11 
SpeciesConc_NAP      : 5.0156987e-16      | 5.0157183e-16      | 1.9587636905255695e-21 
SpeciesConc_NH3      : 1.1689421e-05      | 1.1689383e-05      | -3.728928277269006e-11 
SpeciesConc_NH4      : 1.6433529e-05      | 1.643356e-05       | 3.092281986027956e-11 
SpeciesConc_NIT      : 7.3448487e-06      | 7.345165e-06       | 3.1650415621697903e-10 
SpeciesConc_NITs     : 1.6229379e-06      | 1.6229383e-06      | 3.410605131648481e-13 
SpeciesConc_NO       : 0.00038284084      | 0.00038284084      | 0.0 
SpeciesConc_NO2      : 0.00034411164      | 0.00034411164      | 0.0 
SpeciesConc_NO3      : 2.2070844e-06      | 2.2070851e-06      | 6.821210263296962e-13 
SpeciesConc_NPRNO3   : 6.073425e-08       | 6.0734266e-08      | 1.4210854715202004e-14 
SpeciesConc_NRO2     : 2.254625e-16       | 2.2546283e-16      | 3.441071348220595e-22 
SpeciesConc_O        : 0.0005328089       | 0.0005328089       | 0.0 
SpeciesConc_O1D      : 7.073275e-11       | 7.073275e-11       | 0.0 
SpeciesConc_O2       : 40937.98           | 40937.98           | 0.0 
SpeciesConc_O3       : 0.40913835         | 0.40913835         | 0.0 
SpeciesConc_OCPI     : 4.4510627e-05      | 4.4510627e-05      | 0.0 
SpeciesConc_OCPO     : 8.271655e-06       | 8.271655e-06       | 0.0 
SpeciesConc_OCS      : 6.511356e-05       | 6.511356e-05       | 0.0 
SpeciesConc_OClO     : 8.8407427e-07      | 8.840627e-07       | -1.1596057447604835e-11 
SpeciesConc_OH       : 4.173913e-06       | 4.1739136e-06      | 4.547473508864641e-13 
SpeciesConc_OIO      : 3.4973133e-09      | 3.4972856e-09      | -2.7755575615628914e-14 
SpeciesConc_OLND     : 1.9704007e-09      | 1.9704003e-09      | -4.440892098500626e-16 
SpeciesConc_OLNN     : 2.6162356e-10      | 2.6162353e-10      | -2.7755575615628914e-17 
SpeciesConc_OTHRO2   : 9.778939e-09       | 9.778926e-09       | -1.3322676295501878e-14 
SpeciesConc_PAN      : 8.04386e-06        | 8.043861e-06       | 9.094947017729282e-13 
SpeciesConc_PCO      : 2.694404e-06       | 2.6944035e-06      | -4.547473508864641e-13 
SpeciesConc_PH2O2    : 6.542564e-07       | 6.542566e-07       | 2.2737367544323206e-13 
SpeciesConc_PIO2     : 2.4084585e-09      | 2.4084572e-09      | -1.3322676295501878e-15 
SpeciesConc_PIP      : 7.6556034e-07      | 7.655602e-07       | -1.7053025658242404e-13 
SpeciesConc_PO2      : 3.5877883e-09      | 3.5877854e-09      | -2.886579864025407e-15 
SpeciesConc_POx      : 0.002511227        | 0.002511227        | 0.0 
SpeciesConc_PP       : 3.8874563e-07      | 3.8874572e-07      | 8.526512829121202e-14 
SpeciesConc_PPN      : 1.2329156e-06      | 1.2329157e-06      | 1.1368683772161603e-13 
SpeciesConc_PRN1     : 1.4278808e-09      | 1.4278839e-09      | 3.1086244689504383e-15 
SpeciesConc_PROPNN   : 8.439241e-08       | 8.439219e-08       | -2.2026824808563106e-13 
SpeciesConc_PRPE     : 1.5127832e-06      | 1.5127843e-06      | 1.0231815394945443e-12 
SpeciesConc_PRPN     : 4.423394e-09       | 4.423396e-09       | 1.7763568394002505e-15 
SpeciesConc_PSO4     : 9.988107e-09       | 9.9881055e-09      | -1.7763568394002505e-15 
SpeciesConc_PYAC     : 2.9168117e-08      | 2.9168087e-08      | -3.019806626980426e-14 
SpeciesConc_R4N1     : 1.1661354e-10      | 1.1661365e-10      | 1.1102230246251565e-16 
SpeciesConc_R4N2     : 2.1221092e-07      | 2.1221089e-07      | -2.842170943040401e-14 
SpeciesConc_R4O2     : 1.1953593e-08      | 1.1953577e-08      | -1.5987211554602254e-14 
SpeciesConc_R4P      : 6.364866e-07       | 6.36486e-07        | -6.252776074688882e-13 
SpeciesConc_RA3P     : 2.3240496e-07      | 2.3240509e-07      | 1.2789769243681803e-13 
SpeciesConc_RB3P     : 5.536681e-07       | 5.536684e-07       | 2.8421709430404007e-13 
SpeciesConc_RCHO     : 1.0937521e-06      | 1.0937524e-06      | 3.410605131648481e-13 
SpeciesConc_RCO3     : 2.0894324e-09      | 2.0894328e-09      | 4.440892098500626e-16 
SpeciesConc_RCOOH    : 1.9540794e-15      | 1.9540794e-15      | 0.0 
SpeciesConc_RIPA     : 6.1311675e-07      | 6.131165e-07       | -2.2737367544323206e-13 
SpeciesConc_RIPB     : 1.2943111e-07      | 1.294311e-07       | -1.4210854715202004e-14 
SpeciesConc_RIPC     : 1.8536946e-08      | 1.8536932e-08      | -1.4210854715202004e-14 
SpeciesConc_RIPD     : 7.929148e-09       | 7.929144e-09       | -4.440892098500626e-15 
SpeciesConc_ROH      : 7.780372e-08       | 7.78037e-08        | -2.1316282072803006e-14 
SpeciesConc_RP       : 4.449378e-07       | 4.4493785e-07      | 5.684341886080802e-14 
SpeciesConc_SALA     : 1.1958645e-05      | 1.1958645e-05      | 0.0 
SpeciesConc_SALAAL   : 9.312363e-08       | 9.312533e-08       | 1.6981971384666394e-12 
SpeciesConc_SALACL   : 2.8998e-06         | 2.8997981e-06      | -1.8189894035458565e-12 
SpeciesConc_SALC     : 0.00013495881      | 0.00013495881      | 0.0 
SpeciesConc_SALCAL   : 2.7237016e-05      | 2.7237009e-05      | -7.275957614183426e-12 
SpeciesConc_SALCCL   : 6.350825e-05       | 6.3508254e-05      | 7.275957614183426e-12 
SpeciesConc_SO2      : 8.231267e-06       | 8.231267e-06       | 0.0 
SpeciesConc_SO4      : 2.9016528e-05      | 2.9016524e-05      | -3.637978807091713e-12 
SpeciesConc_SO4H1    : 4.3056e-26         | 4.3056e-26         | 0.0 
SpeciesConc_SO4H2    : 4.3056e-26         | 4.3056e-26         | 0.0 
SpeciesConc_SO4H3    : 4.3056e-26         | 4.3056e-26         | 0.0 
SpeciesConc_SO4H4    : 4.3056e-26         | 4.3056e-26         | 0.0 
SpeciesConc_SO4s     : 3.3033018e-08      | 3.3033075e-08      | 5.684341886080802e-14 
SpeciesConc_SOAGX    : 3.8212812e-07      | 3.8212823e-07      | 1.1368683772161603e-13 
SpeciesConc_SOAIE    : 1.7106585e-06      | 1.7106582e-06      | -2.2737367544323206e-13 
SpeciesConc_SOAP     : 1.0303932e-06      | 1.0303932e-06      | 0.0 
SpeciesConc_SOAS     : 6.4829173e-06      | 6.4829173e-06      | 0.0 
SpeciesConc_TOLU     : 1.0836219e-06      | 1.083622e-06       | 1.1368683772161603e-13 
SpeciesConc_TRO2     : 9.080887e-10       | 9.0808744e-10      | -1.27675647831893e-15 
SpeciesConc_XRO2     : 4.183484e-10       | 4.183482e-10       | -2.220446049250313e-16 
SpeciesConc_XYLE     : 1.6607714e-07      | 1.660772e-07       | 5.684341886080802e-14 
SpeciesConc_pFe      : 1.3768651e-08      | 1.3768651e-08      | 0.0 

Differences indicate numerical noise (1e-6 or 1e-7 times lower than values) caused by the forcing of REAL*8 precision in KPP rate law functions. This is to be expected

KppDiags

7-day mean KPP statistics, summed over all grid boxes

Variable        Ref=refrun_7d_omp3    Dev=devrun_7d_omp3    Dev-Ref
AREA          : 510065600000000.0   | 510065600000000.0   | 0.0 
KppAccSteps   : 502781.97           | 502790.22           | 8.25 
KppIntCounts  : 1723891.1           | 1723937.6           | 46.5 
KppJacCounts  : 502781.97           | 502790.22           | 8.25 
KppLuDecomps  : 610554.44           | 610573.8            | 19.375 
KppRejSteps   : 728.83923           | 730.14294           | 1.3037109375 
KppSubsts     : 2442217.8           | 2442295.2           | 77.5 
KppTotSteps   : 610554.44           | 610573.8            | 19.375 

Plots

  1. KPP diagnostics, surface level
  2. Selected species, surface level
  3. Selected species, zonal mean

Summary

  1. Dev and Ref have very similar results in species concentrations (differing by numerical noise).
  2. The KPP diagnostics are very similar in Dev & Ref. This indicates that the KPP integrations are not being called more often in Dev than in Ref. It would also suggest that the load balancing is similar in both runs.
  3. KPP Integrate in Dev consistently takes about twice long as in Ref.
  4. KPP update rate constants (RCONST) is about half as long in Dev as in Ref.
  5. Why does chemistry take longer in Dev, even though the KPP diagnostics show the equivalent number of integration calls?
  6. The nodes I used have 24 CPUs with hyperthreading (48 logical cores but 24 physical cores). Could this be causing the differences that we are seeing?
  7. I also obtained similar results on the AWS cloud, where Dev took longer than Ref (but for simulations shorter than 7 days).

Thoughts? @msulprizio, @WilliamDowns, @sdeastham, @kilcolmu, @wporter, @davenhenze, @kelvinhb, @vshah, @LiamBindle

I think going forward I should restore the original rate law functions, as incurring a 9 minute/day penalty is not sustainable. Then using that we can start bringing other reactions from sulfate_mod etc. into KPP.

msulprizio commented 3 years ago

Tagging @kilicomu again for any thoughts (the username tagged above is incorrect)

sdeastham commented 3 years ago

Wow! This was a real surprise. I guess though that RCONST was not taking the majority of the integration time, so fixing the rate law calculations (and it does look like this substantially improved that part) was only going to be able to yield a minor improvement at best. Is it possible that the slowdown in integration is due to the forcing to double precision? Presumably there is potentially some knock-on effect for situations where all of the inputs, internal calculations, or even outputs are REAL*4, even if the calculation at the end is nominally REAL*8.

yantosca commented 3 years ago

I also have a ticket into FASRC to see if it is something to do with hyperthreading on the huce_cascade cluster. If you notice in Dev the RCONST timer is almost exactly a factor of 2 smaller than in Ref. That might have something to do with the hyperthreading.

@sdeastham: Even in the old code the output of the rate law functions was double precision (i.e.REAL(kind=dp) FUNCTION GCARR( A0,B0,C0 ). So I don't think that's it.

yantosca commented 3 years ago

For reference here are some articles about hyperthreading and OpenMP that I found online:

  1. https://stackoverflow.com/questions/36958661/openmp-dont-use-hyperthreading-cores-half-num-threads-w-hyperthreading
  2. https://stackoverflow.com/questions/35355944/how-does-hyperthreading-affect-parallelization
  3. https://stackoverflow.com/questions/4817789/hyper-threading-made-my-renderer-10-times-slower
  4. https://comp.programming.threads.narkive.com/bh6bxopg/openmp-on-hyper-threading

Also from Reference 4 above:

OpenMP and hyperthreading really don't go together. Your OpenMP compiler and runtime scheduler is highly unlikely to know enough about the sharing characteristics to do a good job of managing affinity and sharing. It's almost certainly treating your hyperthreaded CPU as a dual processor SMP; and that's a losing proposition because it's simply not.

wporter commented 3 years ago

This is interesting, and the hyperthreading does appear to be the most likely culprit. Could you rerun both cases with HT turned off?

yantosca commented 3 years ago

Plamen Krastev suggests:

Hi Bob,

I have your request and I will answer your previous questions separately. Regarding the OpenMP hyperthreading questions, you can try exporting the OMP_CPU_AFFINITY environment variable, e.g.,

export OMP_CPU_AFFINITY="0-23"

This should restrict the OMP execution to the available physical cores only. You also need to include in the job submission script:

srun -c 24 ...

and also export the OMP_NUM_THREADS environment variable, i.e.,

export OMP_NUM_THREADS=24

This should work, but please let me know if you continue to have problems with this. Have a nice weekend.

Best, Plamen

sdeastham commented 3 years ago

Interesting! If all else fails, compiling without OpenMP and doing serial tests may shine some light too.

yantosca commented 3 years ago

Rerunning with the new commands now. I wonder if this affects other studies that were done on huce_cascade, like the recent benchmark output (not for results but for timings).

yantosca commented 3 years ago

Also more info from Plamen at FAS Research computing:

You may also try

export OMP_PLACES=cores export OMP_NUM_THREADS=24

instead of

export OMP_CPU_AFFINITY="0-23" export OMP_NUM_THREADS=24>

to pin the threads to physical cores only, as this is a change in the OpenMP 5.0 spec.

yantosca commented 3 years ago

Also tagging @viral211 on this thread.

yantosca commented 3 years ago

I ran a couple more profiling runs with hyperthreading turned off on the huce_cascade partition on slurm (see discussion thread #614). These are the results from the GEOS-Chem timers, which simply measure the time elapsed across certain blocks of code:

Ref (dev/13.0.0-final development branch)
---------------------------------------------------------------------
  All chemistry                 :  00-00:32:04.015          1924.016
  => FlexChem                   :  00-00:23:13.593          1393.594
    -> FlexChem loop            :  00-00:22:42.015          1362.016
    -> Init KPP                 :  00-00:00:07.282             7.283
    -> Het chem rates           :  00-00:01:40.944           100.944
    -> Photolysis rates         :  00-00:00:10.691            10.691
    -> KPP                      :  00-00:16:31.617           991.618
       RCONST                   :  00-00:02:15.709           135.710   <-- update reaction rates
       Integrate 1              :  00-00:13:39.219           819.219   <-- call Rosenbrock solver
    -> Prod/loss diags          :  00-00:00:07.559             7.559

Dev (useless computations removed)
----------------------------------------------------------------------
  All chemistry                 :  00-00:44:23.209          2663.210
  => FlexChem                   :  00-00:35:51.850          2151.850
    -> FlexChem loop            :  00-00:35:22.292          2122.293
    -> Init KPP                 :  00-00:00:14.033            14.034
    -> Het chem rates           :  00-00:01:38.072            98.072
    -> Photolysis rates         :  00-00:00:11.080            11.080
    -> KPP                      :  00-00:27:24.559          1644.560
       RCONST                   :  00-00:01:10.356            70.357   <-- update reaction rates
       Integrate 1              :  00-00:25:39.289          1539.290   <-- call Rosenbrock solver
    -> Prod/loss diags          :  00-00:00:07.639             7.640

As you can see, we drastically reduce the time spent in computing reaction rates but basically double the time spent in the Rosenbrock solver. This is a consistent result that has happened across many of my tests. I will investigate what happens if I restore the original rate law functions.

Perhaps it is an issue in parallelization (in the Dev code there are more THREADPRIVATE variables included in the gckpp_Global code. That might be causing this issue.

yantosca commented 3 years ago

I was able to run simulations on our private node (Intel Haswell cores, no hyperthreading). In both cases the jobs were the only one running on the node.

Ref, dev/13.0.0-final branch, 24 cores

===============================================================================
G E O S - C H E M   T I M E R S

  Timer name                       DD-hh:mm:ss.SSS     Total Seconds
-------------------------------------------------------------------------------
  GEOS-Chem                     :  00-01:19:50.632          4790.633
  Initialization                :  00-00:00:55.480            55.480
  HEMCO                         :  00-00:09:15.957           555.957
  All chemistry                 :  00-00:40:35.640          2435.641
  => FlexChem                   :  00-00:29:02.304          1742.305
    -> FlexChem loop            :  00-00:28:22.804          1702.805
    -> Init KPP                 :  00-00:00:07.275             7.275
    -> Het chem rates           :  00-00:02:07.709           127.710
    -> Photolysis rates         :  00-00:00:08.967             8.967
    -> KPP                      :  00-00:20:55.183          1255.184
       RCONST                   :  00-00:02:41.072           161.072    <--- compute rxn rates
       Integrate 1              :  00-00:17:24.631          1044.631    <--- Rosenbrock solver
       Integrate 2              :  >>>>> THE TIMER DID NOT RUN <<<<<
    -> Prod/loss diags          :  00-00:00:07.357             7.357
    -> OH reactivity diag       :  >>>>> THE TIMER DID NOT RUN <<<<<
  => FAST-JX photolysis         :  00-00:02:34.269           154.270
  => Aerosol chem               :  00-00:08:26.617           506.617
  => Linearized strat chem      :  00-00:00:09.480             9.480
  Transport                     :  00-00:07:22.527           442.527
  Convection                    :  00-00:04:15.332           255.332
  Boundary layer mixing         :  00-00:07:39.187           459.188
  Dry deposition                :  00-00:00:28.855            28.855
  Wet deposition                :  00-00:02:27.917           147.918
  All diagnostics               :  00-00:06:03.878           363.879
  => HEMCO diagnostics          :  00-00:00:00.023             0.023
  => ObsPack diagnostics        :  >>>>> THE TIMER DID NOT RUN <<<<<
  => History (netCDF diags)     :  00-00:06:03.859           363.859
  Unit conversions              :  00-00:03:24.289           204.289
  Input                         :  00-00:00:35.640            35.641
  Output                        :  00-00:06:03.535           363.535
  Finalization                  :  00-00:00:09.796             9.797

Dev code (feature/bmy/StreamlineKpp branch), 24 cores

===============================================================================
G E O S - C H E M   T I M E R S

  Timer name                       DD-hh:mm:ss.SSS     Total Seconds
-------------------------------------------------------------------------------
  GEOS-Chem                     :  00-01:36:55.019          5815.020
  Initialization                :  00-00:00:59.238            59.238
  HEMCO                         :  00-00:09:20.628           560.629
  All chemistry                 :  00-00:57:25.511          3445.512
  => FlexChem                   :  00-00:45:54.480          2754.480
    -> FlexChem loop            :  00-00:45:14.343          2714.344
    -> Init KPP                 :  00-00:00:16.211            16.211
    -> Het chem rates           :  00-00:02:04.679           124.680
    -> Photolysis rates         :  00-00:00:09.288             9.288
    -> KPP                      :  00-00:35:26.214          2126.214
       RCONST                   :  00-00:01:33.282            93.282    <--- compute rxn rates  
       Integrate 1              :  00-00:33:07.628          1987.629    <--- Rosenbrock solver
       Integrate 2              :  >>>>> THE TIMER DID NOT RUN <<<<<
    -> Prod/loss diags          :  00-00:00:07.625             7.626
    -> OH reactivity diag       :  >>>>> THE TIMER DID NOT RUN <<<<<
  => FAST-JX photolysis         :  00-00:02:34.835           154.836
  => Aerosol chem               :  00-00:08:22.753           502.754
  => Linearized strat chem      :  00-00:00:10.128            10.129
  Transport                     :  00-00:07:19.382           439.383
  Convection                    :  00-00:04:16.531           256.531
  Boundary layer mixing         :  00-00:07:43.855           463.855
  Dry deposition                :  00-00:00:28.511            28.512
  Wet deposition                :  00-00:02:28.191           148.191
  All diagnostics               :  00-00:06:12.210           372.211
  => HEMCO diagnostics          :  00-00:00:00.007             0.008
  => ObsPack diagnostics        :  >>>>> THE TIMER DID NOT RUN <<<<<
  => History (netCDF diags)     :  00-00:06:12.218           372.219
  Unit conversions              :  00-00:03:30.085           210.086
  Input                         :  00-00:00:37.035            37.035
  Output                        :  00-00:06:11.894           371.895
  Finalization                  :  00-00:00:08.039             8.039

The Dev code is faster in computing the reaction rates, but this somehow causes the integration to take almost 2x as long. I am still investigating.

yantosca commented 3 years ago

I just discovered that the KPP code that I had built had an excess call to UPDATE_RCONST() within the gckpp_Integrator.F90 module. This should not have been there. We call UPDATE_RCONST() outside of the internal integration timestep loop... this is done from within the main loop in DO_FLEXCHEM before the Integrator is called. These excess calls to UPDATE_RCONST (one per internal integration iteration were slowing down the code and masking the gains obtained by streamlining the rate law functions.

I am doing a new profiling run with corrected code that should show the real impact of streamlining the rate law functions. I'll post the results here.

yantosca commented 3 years ago

Here are the latest timing results (7 model-day simulations) The Dev run has the excess calls to UPDATE_RCONST() removed:

Ref: (dev/13.0.0-final branch) huce_cascade, no hyperthreading, 24 cores

SLURM JobID #         : 16918083
Job Name              : geoschem.profile.run
Submit time           : 2021-02-09 17:53:37
Start  time           : 2021-02-09 17:55:01
End    time           : 2021-02-09 19:03:43
Partition             : huce_casc+
Node                  : holy7c20202
CPUs                  : 24
Memory                : 0.0357 GB
CPU  Time             : 1-00:53:51  (      89631 s)
Wall Time             : 01:08:42    (       4122 s)
CPU  Time / Wall Time : 21.7445     ( 90.60% ideal)

===============================================================================
G E O S - C H E M   T I M E R S

  Timer name                       DD-hh:mm:ss.SSS     Total Seconds
-------------------------------------------------------------------------------
  GEOS-Chem                     :  00-01:06:59.000          4019.000
  Initialization                :  00-00:00:48.953            48.953
  HEMCO                         :  00-00:08:04.515           484.516
  All chemistry                 :  00-00:32:04.015          1924.016
  => FlexChem                   :  00-00:23:13.593          1393.594
    -> FlexChem loop            :  00-00:22:42.015          1362.016
    -> Init KPP                 :  00-00:00:07.282             7.283
    -> Het chem rates           :  00-00:01:40.944           100.944
    -> Photolysis rates         :  00-00:00:10.691            10.691
    -> KPP                      :  00-00:16:31.617           991.618
       RCONST                   :  00-00:02:15.709           135.710   <-- Update rxn rates
       Integrate 1              :  00-00:13:39.219           819.219   <-- Call Rosenbrock solver
       Integrate 2              :  >>>>> THE TIMER DID NOT RUN <<<<<
    -> Prod/loss diags          :  00-00:00:07.559             7.559
    -> OH reactivity diag       :  >>>>> THE TIMER DID NOT RUN <<<<<
  => FAST-JX photolysis         :  00-00:01:59.093           119.094
  => Aerosol chem               :  00-00:06:20.843           380.844
  => Linearized strat chem      :  00-00:00:05.609             5.609
  Transport                     :  00-00:06:23.078           383.078
  Convection                    :  00-00:03:59.750           239.750
  Boundary layer mixing         :  00-00:06:21.828           381.828
  Dry deposition                :  00-00:00:11.671            11.672
  Wet deposition                :  00-00:02:09.437           129.438
  All diagnostics               :  00-00:06:07.656           367.656
  => HEMCO diagnostics          :  00-00:00:00.031             0.031
  => ObsPack diagnostics        :  >>>>> THE TIMER DID NOT RUN <<<<<
  => History (netCDF diags)     :  00-00:06:07.625           367.625
  Unit conversions              :  00-00:03:54.375           234.375
  Input                         :  00-00:00:31.187            31.188
  Output                        :  00-00:06:07.484           367.484
  Finalization                  :  00-00:00:07.703             7.703

Dev (feature/bmy/StreamlineKPP branch, huce_cascade, no hyperthreading, 24 cores

SLURM JobID #         : 17043612
Job Name              : geoschem.profile.run
Submit time           : 2021-02-11 17:22:55
Start  time           : 2021-02-11 17:23:43
End    time           : 2021-02-11 18:27:53
Partition             : huce_casc+
Node                  : holy7c20612
CPUs                  : 24
Memory                : 0.0346 GB
CPU  Time             : 23:16:13    (      83773 s)
Wall Time             : 01:04:10    (       3850 s)
CPU  Time / Wall Time : 21.7592     ( 90.66% ideal)

===============================================================================
G E O S - C H E M   T I M E R S

  Timer name                       DD-hh:mm:ss.SSS     Total Seconds
-------------------------------------------------------------------------------
  GEOS-Chem                     :  00-01:02:23.320          3743.320
  Initialization                :  00-00:00:58.148            58.148
  HEMCO                         :  00-00:07:55.570           475.570
  All chemistry                 :  00-00:30:04.664          1804.664
  => FlexChem                   :  00-00:22:08.570          1328.570
    -> FlexChem loop            :  00-00:21:40.867          1300.867
    -> Init KPP                 :  00-00:00:12.893            12.893
    -> Het chem rates           :  00-00:01:34.672            94.673
    -> Photolysis rates         :  00-00:00:09.426             9.426
    -> KPP                      :  00-00:15:46.220           946.220
       RCONST                   :  00-00:01:15.690            75.691   <--- Update rxn rates
       Integrate 1              :  00-00:13:57.386           837.387   <--- Call Rosenbrock solver
       Integrate 2              :  >>>>> THE TIMER DID NOT RUN <<<<<
    -> Prod/loss diags          :  00-00:00:07.652             7.652
    -> OH reactivity diag       :  >>>>> THE TIMER DID NOT RUN <<<<<
  => FAST-JX photolysis         :  00-00:01:57.351           117.352
  => Aerosol chem               :  00-00:05:31.062           331.062
  => Linearized strat chem      :  00-00:00:05.460             5.461
  Transport                     :  00-00:06:03.039           363.039
  Convection                    :  00-00:03:40.093           220.094
  Boundary layer mixing         :  00-00:05:23.539           323.539
  Dry deposition                :  00-00:00:11.656            11.656
  Wet deposition                :  00-00:02:02.140           122.141
  All diagnostics               :  00-00:05:34.148           334.148
  => HEMCO diagnostics          :  >>>>> THE TIMER DID NOT RUN <<<<<
  => ObsPack diagnostics        :  >>>>> THE TIMER DID NOT RUN <<<<<
  => History (netCDF diags)     :  00-00:05:34.132           334.133
  Unit conversions              :  00-00:03:22.804           202.805
  Input                         :  00-00:00:28.593            28.594
  Output                        :  00-00:05:34.054           334.055

Analysis

The updates that were made to remove the useless computations in the KPP rate law library take about half as long as before (75 seconds vs. 135 seconds). But the overall time spent in the Rosenbrock solver does not change very much. This results in ~4.5 minutes of wall clock time savings for a 7-day run. This would be almost 20 minutes of time savings for a 1-month (31-day) benchmark simulation.

msulprizio commented 3 years ago

This feature request was addressed by https://github.com/geoschem/geos-chem/pull/663, which has been merged into the dev branch. This update will be included in 13.1.0.