[DISCUSSION] Restructuring HEMCO I/O components

geoschem / HEMCO

The Harmonized Emissions Component (HEMCO), developed by the GEOS-Chem Support Team.


The current setup of the HEMCO I/O methods (ESMF and non-ESMF) is essentially hard-coded into two sets of files:

Non-ESMF (GEOS-Chem Classic, HEMCO Standalone with NcdfUtil)

src/Core/hcoio_read_std_mod.F90
src/Core/hcoio_write_std_mod.F90

ESMF (GCHP, HEMCO_GridComp in ESMF)

src/Core/hcoio_read_esmf_mod.F90
src/Core/hcoio_write_esmf_mod.F90

Which set gets built is fixed at compile time because of the dependencies on NcdfUtil and ESMF, respectively, and the code is walled off with the pre-processor switch #if defined( ESMF_ ). This is OK for walling off code that requires different sets of libraries. However, the same approach is also used in the rest of the HEMCO core to choose which set of I/O modules to use:

In src/Core/hcoio_dataread_mod.F90

#if defined(ESMF_)
    USE HCOIO_READ_ESMF_MOD,  ONLY : HCOIO_READ_ESMF
#else
    USE HCOIO_READ_STD_MOD,   ONLY : HCOIO_READ_STD
#endif

This approach is fine for now with two IO modules, but the situation may not stay binary. We might have other HCOIO_ modules (CESM, WRF, ...) in the future:

We might want to introduce a new set of switches (preferably compatible with GEOS-Chem, i.e. MODEL_, MODEL_WRF) to permit other IO modules.
Additionally, for the dry-run feature (https://github.com/geoschem/geos-chem-cloud/issues/25), we also want to introduce a new run-time option that skips I/O, essentially turning hcoio_*_std_mod into dummy modules that print error messages.
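To make the dry-run idea concrete, here is a minimal sketch of a run-time switch inside a read routine. Everything in it is hypothetical: the module name, the Read_Data routine, the DryRun flag, and the choice to report whether the file exists are assumptions for illustration, not existing HEMCO code.

```fortran
! Hypothetical sketch of a run-time dry-run switch (not actual HEMCO code).
MODULE hcoio_dryrun_sketch
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: Read_Data

CONTAINS

  SUBROUTINE Read_Data( FileName, DryRun, RC )
    CHARACTER(LEN=*), INTENT(IN)  :: FileName   ! file that would be read
    LOGICAL,          INTENT(IN)  :: DryRun     ! run-time option (assumed)
    INTEGER,          INTENT(OUT) :: RC         ! 0 = success

    LOGICAL :: Found

    RC = 0

    IF ( DryRun ) THEN
       ! Skip all I/O: only report whether the file is present.
       INQUIRE( FILE=TRIM( FileName ), EXIST=Found )
       IF ( Found ) THEN
          WRITE(*,'(a,a)') 'HEMCO (dry-run): found   ', TRIM( FileName )
       ELSE
          WRITE(*,'(a,a)') 'HEMCO (dry-run): MISSING ', TRIM( FileName )
       ENDIF
       RETURN
    ENDIF

    ! The real read (NcdfUtil, ESMF, ...) would go here.
    WRITE(*,'(a,a)') 'HEMCO: reading ', TRIM( FileName )

  END SUBROUTINE Read_Data

END MODULE hcoio_dryrun_sketch

PROGRAM dryrun_sketch
  USE hcoio_dryrun_sketch, ONLY : Read_Data
  IMPLICIT NONE
  INTEGER :: RC

  ! With DryRun = .TRUE. nothing is opened; the file is only checked.
  CALL Read_Data( 'emissions.nc', .TRUE., RC )
END PROGRAM dryrun_sketch
```

Because the switch is evaluated at run time, the same executable could serve both normal and dry-run simulations, which a pre-processor switch cannot do.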

I am opening this issue to discuss which method we should use to wall off external dependencies, how to manage the IO modules (and where to put them, since Core/ is getting big), and how to add options for the dry-run feature. Any feedback is welcome 😃

(Just writing up some ideas)

A possible way of structuring the IO components is to move the hcoio_* modules into a separate folder. However, the Makefile dependencies are quite convoluted, and we might run into many issues once we actually change this.

A quick compromise that could work is to use the GEOS-Chem-style switches. This could be a little confusing down the line if we have HEMCO integrated into CESM2 without GEOS-Chem, but it would be a matter of other models adopting the MODEL_ nomenclature. We would then have e.g.

#if defined(ESMF_)
    USE HCOIO_READ_ESMF_MOD,  ONLY : HCOIO_READ_ESMF
#elif defined(MODEL_CESM2)
    USE HCOIO_READ_CESM2_MOD,   ONLY : HCOIO_READ_CESM2
#else
    USE HCOIO_READ_STD_MOD,   ONLY : HCOIO_READ_STD
#endif

And we wall off the modules with the same pre-processor variables as we see fit. Each IO module would adhere to the same set of specs, and HEMCO does not care how things are read, as long as they are.
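One way to express "the same set of specs" is to give every backend the same argument list and re-export whichever one was compiled under a single name. The sketch below assumes this; the *_sketch module names and the simplified (FieldName, RC) argument list are hypothetical and do not match the real HEMCO routines. Note that the C pre-processor spells the else-if branch #elif.

```fortran
! Sketch only: save as e.g. io_sketch.F90 so that the pre-processor runs.
! All *_sketch module names and the (FieldName, RC) arguments are hypothetical.
#if defined( ESMF_ )
MODULE hcoio_read_esmf_sketch
  IMPLICIT NONE
CONTAINS
  SUBROUTINE HCOIO_READ_ESMF( FieldName, RC )
    CHARACTER(LEN=*), INTENT(IN)  :: FieldName
    INTEGER,          INTENT(OUT) :: RC
    WRITE(*,'(a,a)') 'ESMF backend reads ', TRIM( FieldName )
    RC = 0
  END SUBROUTINE HCOIO_READ_ESMF
END MODULE hcoio_read_esmf_sketch
#elif defined( MODEL_CESM2 )
MODULE hcoio_read_cesm2_sketch
  IMPLICIT NONE
CONTAINS
  SUBROUTINE HCOIO_READ_CESM2( FieldName, RC )
    CHARACTER(LEN=*), INTENT(IN)  :: FieldName
    INTEGER,          INTENT(OUT) :: RC
    WRITE(*,'(a,a)') 'CESM2 backend reads ', TRIM( FieldName )
    RC = 0
  END SUBROUTINE HCOIO_READ_CESM2
END MODULE hcoio_read_cesm2_sketch
#else
MODULE hcoio_read_std_sketch
  IMPLICIT NONE
CONTAINS
  SUBROUTINE HCOIO_READ_STD( FieldName, RC )
    CHARACTER(LEN=*), INTENT(IN)  :: FieldName
    INTEGER,          INTENT(OUT) :: RC
    WRITE(*,'(a,a)') 'Standard backend reads ', TRIM( FieldName )
    RC = 0
  END SUBROUTINE HCOIO_READ_STD
END MODULE hcoio_read_std_sketch
#endif

MODULE hcoio_dataread_sketch
  ! The compiled backend is renamed to HCOIO_READ and re-exported,
  ! so this is the only place that needs pre-processor switches.
#if defined( ESMF_ )
  USE hcoio_read_esmf_sketch,  ONLY : HCOIO_READ => HCOIO_READ_ESMF
#elif defined( MODEL_CESM2 )
  USE hcoio_read_cesm2_sketch, ONLY : HCOIO_READ => HCOIO_READ_CESM2
#else
  USE hcoio_read_std_sketch,   ONLY : HCOIO_READ => HCOIO_READ_STD
#endif
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: HCOIO_READ
END MODULE hcoio_dataread_sketch

PROGRAM io_sketch
  USE hcoio_dataread_sketch, ONLY : HCOIO_READ
  IMPLICIT NONE
  INTEGER :: RC

  ! The caller never needs to know which backend was compiled in.
  CALL HCOIO_READ( 'EMIS_NO', RC )
END PROGRAM io_sketch
```

With gfortran, for example, `gfortran -DMODEL_CESM2 io_sketch.F90` would select the CESM2 branch, and compiling without any -D flag falls back to the standard one. Adding a new model then only touches the backend module and one #elif branch in the dispatching module.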

However, there is the issue of regridding. HCOIO_READ_STD does regridding with Map_A2A while HCOIO_READ_ESMF does not (also, READ_STD manages the file handles while ESMF does it via ExtData). There are two main ways to handle this:

Make all data flow through HCOIO_READ_* -> HCO_INTERP ("regridding"), with INTERP calling pre-processor-walled code (e.g. HCO_INTERP_STD, HCO_INTERP_ESMF). The ESMF variant would know that the data does not need regridding and would simply exit. This keeps the data flow consistent, although it would require a bunch of extra boilerplate on the ESMF side.
Unify "IO" and "Regridding" into one, i.e. "keep things as is". Each IO module would know whether the data it receives from upstream (NcdfMod or ESMF) is on the right grid, and would do (or skip) the regridding accordingly.
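The first option could look roughly like the sketch below, where a single HCO_INTERP entry point has a pre-processor-walled body. This is an illustration under stated assumptions: the argument list is hypothetical, the data is 1-D for brevity, and a simple block average stands in for Map_A2A, which is not reproduced here.

```fortran
! Sketch only: save as e.g. interp_sketch.F90 so that the pre-processor runs.
! The argument list is hypothetical; block averaging stands in for Map_A2A.
MODULE hco_interp_sketch
  IMPLICIT NONE
  PRIVATE
  PUBLIC :: HCO_INTERP

CONTAINS

  SUBROUTINE HCO_INTERP( DataIn, DataOut, RC )
    REAL,    INTENT(IN)  :: DataIn(:)    ! data as delivered by HCOIO_READ_*
    REAL,    INTENT(OUT) :: DataOut(:)   ! data on the model grid
    INTEGER, INTENT(OUT) :: RC           ! 0 = success, 1 = size mismatch

    INTEGER :: I, Ratio

    RC = 0

#if defined( ESMF_ )
    ! ESMF path: data already arrives on the model grid, so just copy it.
    IF ( SIZE( DataIn ) /= SIZE( DataOut ) ) THEN
       RC = 1
       RETURN
    ENDIF
    DataOut = DataIn
#else
    ! Standard path: coarsen by averaging blocks of Ratio input cells.
    IF ( SIZE( DataOut ) == 0 ) THEN
       RC = 1
       RETURN
    ENDIF
    IF ( MOD( SIZE( DataIn ), SIZE( DataOut ) ) /= 0 ) THEN
       RC = 1
       RETURN
    ENDIF
    Ratio = SIZE( DataIn ) / SIZE( DataOut )
    DO I = 1, SIZE( DataOut )
       DataOut(I) = SUM( DataIn( (I-1)*Ratio+1 : I*Ratio ) ) / REAL( Ratio )
    ENDDO
#endif

  END SUBROUTINE HCO_INTERP

END MODULE hco_interp_sketch

PROGRAM interp_sketch
  USE hco_interp_sketch, ONLY : HCO_INTERP
  IMPLICIT NONE
  REAL    :: Fine(4), Coarse(2)
  INTEGER :: RC

  Fine = (/ 1.0, 2.0, 3.0, 4.0 /)

  ! Without ESMF_ defined, this averages pairs of cells; with ESMF_ defined,
  ! the size mismatch is reported through RC instead.
  CALL HCO_INTERP( Fine, Coarse, RC )
  IF ( RC == 0 ) THEN
     WRITE(*,*) 'Regridded data: ', Coarse
  ELSE
     WRITE(*,*) 'HCO_INTERP returned RC = ', RC
  ENDIF
END PROGRAM interp_sketch
```

The calling code is identical in both builds, which is the consistency argued for above. The cost is the extra pass-through routine on the ESMF side.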

One important thing to note is that NCAR models, at least WRF, support changing grids. If the grid coordinates change, the emission data need to be recalculated, and the way to handle this would be slightly different for each approach. Alternatively, we could take advantage of the new HCO_State and create a separate state for each grid (this is being done by WRF-GC and, if I am right, GEOS), which would avoid the problem.

[DISCUSSION] Restructuring HEMCO I/O components #1