
GEOS-Chem "Science Codebase" repository. Contains GEOS-Chem science routines, run directory generation scripts, and interface code. This repository is used as a submodule within the GCClassic and GCHP wrappers, as well as in other modeling contexts (external ESMs).
http://geos-chem.org

[FEATURE REQUEST] Only read necessary met fields to speed up simulations #91

Closed JiaweiZhuang closed 3 years ago

JiaweiZhuang commented 4 years ago

I notice that CH4 simulations spend ~50% of their time on HEMCO I/O, for both global and nested settings.

Here's 1-month global 4x5 timing:

  Timer name                       DD-hh:mm:ss.SSS     Total Seconds
-------------------------------------------------------------------------------
  GEOS-Chem                     :  00-00:16:30.940           990.941
  Initialization                :  00-00:00:12.158            12.159
  Timesteps                     :  00-00:16:18.717           978.717
  HEMCO                         :  00-00:10:16.362           616.363
  All chemistry                 :  00-00:00:09.649             9.649
  => Gas-phase chem             :  00-00:00:08.747             8.747
  => FAST-JX photolysis         :  >>>>> THE TIMER DID NOT RUN <<<<<
  => All aerosol chem           :  00-00:00:00.004             0.004
  => Strat chem                 :  >>>>> THE TIMER DID NOT RUN <<<<<
  => Unit conversions           :  00-00:00:06.406             6.407
  Transport                     :  00-00:03:07.912           187.912
  Convection                    :  00-00:00:05.086             5.086
  Boundary layer mixing         :  00-00:01:51.981           111.981
  Dry deposition                :  >>>>> THE TIMER DID NOT RUN <<<<<
  Wet deposition                :  >>>>> THE TIMER DID NOT RUN <<<<<
  All diagnostics               :  00-00:00:05.363             5.363
  => HEMCO diagnostics          :  00-00:00:00.014             0.015
  => Binary punch diagnostics   :  00-00:00:00.003             0.004
  => ObsPack diagnostics        :  >>>>> THE TIMER DID NOT RUN <<<<<
  => History (netCDF diags)     :  00-00:00:05.379             5.380
  Input                         :  00-00:07:26.930           446.930
  Output                        :  00-00:00:05.371             5.372
  Finalization                  :  00-00:00:00.063             0.064

Here's 1-month nested NA 0.25x0.3125 timing:

  Timer name                       DD-hh:mm:ss.SSS     Total Seconds
-------------------------------------------------------------------------------
  GEOS-Chem                     :  00-06:13:29.189         22409.189
  Initialization                :  00-00:00:24.189            24.190
  Timesteps                     :  00-06:13:04.380         22384.380
  HEMCO                         :  00-03:09:20.634         11360.634
  All chemistry                 :  00-00:04:44.697           284.698
  => Gas-phase chem             :  00-00:04:11.651           251.651
  => FAST-JX photolysis         :  >>>>> THE TIMER DID NOT RUN <<<<<
  => All aerosol chem           :  00-00:00:00.002             0.003
  => Strat chem                 :  >>>>> THE TIMER DID NOT RUN <<<<<
  => Unit conversions           :  00-00:03:24.439           204.439
  Transport                     :  00-01:32:07.885          5527.886
  Convection                    :  00-00:02:30.030           150.031
  Boundary layer mixing         :  00-00:58:20.541          3500.542
  Dry deposition                :  >>>>> THE TIMER DID NOT RUN <<<<<
  Wet deposition                :  >>>>> THE TIMER DID NOT RUN <<<<<
  All diagnostics               :  00-00:01:14.885            74.885
  => HEMCO diagnostics          :  00-00:00:00.022             0.023
  => Binary punch diagnostics   :  00-00:00:00.010             0.011
  => ObsPack diagnostics        :  >>>>> THE TIMER DID NOT RUN <<<<<
  => History (netCDF diags)     :  00-00:01:15.064            75.064
  Input                         :  00-00:55:22.954          3322.954
  Output                        :  00-00:01:14.901            74.902
  Finalization                  :  00-00:00:00.597             0.598

The transport and PBL mixing calculations are fast, so most of the time is spent waiting on slow I/O. From the list of GEOS-FP met fields, it seems to me that most met fields are not needed by a CH4-only simulation. We only need to keep the met variables associated with "CH4 simulation", "Advection", and "PBL mixing".

Is it possible to skip reading the unused met variables via FlexGrid? How much speed-up can we expect? If the real bottleneck is opening and closing files, I could also merge the actually-used variables into a single file.
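The "merge the actually-used variables into a single file" idea could be prototyped offline with xarray; a minimal sketch, where the variable list and file-handling details are my assumptions and would need to be checked against the GEOS-FP met-field documentation:

```python
import xarray as xr

# Hypothetical short list of met variables a CH4-only run actually needs
# (advection + PBL mixing); the real list must come from the GEOS-FP
# met-field documentation, not from this sketch.
USED_VARS = ["PS", "U", "V", "OMEGA", "PBLH", "TS"]

def select_used(ds: xr.Dataset, used_vars) -> xr.Dataset:
    """Keep only the variables the simulation actually reads."""
    return ds[[v for v in used_vars if v in ds.data_vars]]

def merge_used_fields(met_files, out_file):
    """Combine several met-field files, drop unused variables, and write
    a single netCDF file, so the model opens one file instead of many."""
    with xr.open_mfdataset(met_files, combine="by_coords") as ds:
        select_used(ds, USED_VARS).to_netcdf(out_file)
```

This would shrink both the number of file opens and the volume of data read, though it does not by itself tell HEMCO to skip the remaining fields.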

JiaweiZhuang commented 4 years ago

Another potential bottleneck in CH4 simulations is that some OpenMP loops parallelize over tracers, for example in the advection solver:

https://github.com/geoschem/geos-chem/blob/55f61408dd9922f23b1566058de7b0c9627caf53/GeosCore/tpcore_fvdas_mod.F90#L833-L841

but there is only one tracer in a CH4 simulation, so this loop cannot be parallelized.

If the use case is to run ensembles of CH4 simulations, it would be useful to define tens or hundreds of "tagged CH4" tracers, each behaving independently but sharing the same I/O time and allowing more parallelization. From this CH4 species table, it looks like such a multi-tracer capability may already be available?
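The amortization argument behind tagged tracers can be sketched with a toy example: build the met-derived operator once per time step and apply it to every ensemble member, so the I/O cost is shared across all tracers. This is an illustrative NumPy sketch, not GEOS-Chem code; the linear-operator form of transport is an assumption:

```python
import numpy as np

def step_ensemble(tracers: np.ndarray, transport: np.ndarray) -> np.ndarray:
    """Advance N independent tagged tracers with one shared transport
    operator. `tracers` has shape (n_tracers, n_cells); `transport` is
    built once per time step from the met fields, so its read cost is
    amortized over the whole ensemble."""
    # Each tagged tracer evolves independently: q_i <- T @ q_i.
    return tracers @ transport.T
```

With many rows in `tracers`, the matrix product (or, in the real model, the tracer loop) also exposes the parallelism that a single-tracer run lacks.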

yantosca commented 4 years ago

We used to have a tagged-CH4 simulation capability. In that case you would be looping over multiple CH4 tracers. I am not sure if this tagCH4 simulation is still supported (or if it has been broken by recent updates). Worst case, we could remove the OMP commands there.

msulprizio commented 4 years ago

> We used to have a tagged-CH4 simulation capability. In that case you would be looping over multiple CH4 tracers. I am not sure if this tagCH4 simulation is still supported (or if it has been broken by recent updates). Worst case, we could remove the OMP commands there.

The tagCH4 capability is still working in GEOS-Chem 12. We include a 4x5_tagCH4 run directory in the unit tester.

msulprizio commented 4 years ago

In https://github.com/geoschem/geos-chem/issues/279, Lee Murray wrote:

> There are quite a few met fields that are never used by GEOS-Chem, e.g., the SEAICE{01..90} fields are only used by the Hg simulation, and some are just read in to be re-written as diagnostic output. It would be more generic if GEOS-Chem had the feature to ignore any met field specified in HEMCO_Config.rc with /dev/null, as I believe GCHP allows? Since the speciality simulations that do not use lightning NOx don't call the lightning NOx extension, it doesn't matter what is in the lightning flash State_Met field. This would facilitate coupling with met fields from other external GCMs/CCMs as well.
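The /dev/null convention suggested above amounts to a simple filter when deciding which met-field entries to read. A minimal sketch, where the (name, path) entry structure is hypothetical and not actual HEMCO parsing code:

```python
DEV_NULL = "/dev/null"

def fields_to_read(entries):
    """Given (name, path) met-field entries, skip any whose source file
    is /dev/null, mirroring the GCHP-style 'ignore this field'
    convention proposed in the thread."""
    return [name for name, path in entries if path != DEV_NULL]
```

The corresponding State_Met arrays for skipped fields would then be left unallocated or zero-filled, which is where the memory savings would come from.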

lizziel commented 3 years ago

I am closing this issue. See the new feature request https://github.com/geoschem/geos-chem/issues/499 for a more generic request to reduce memory use and increase model speed by restricting met-field reads and State_Met allocation for both GEOS-Chem Classic and GCHP.