Closed mattldawson closed 3 years ago
What kind of data are we talking about? For data tables or configuration information (i.e., where every task has all the data), we do not want to dictate any strategy and it is easy to implement (e.g., by reading on one task and broadcasting). Normally, there is little or no benefit from parallel I/O for these small data sets. For distributed data, we want the host model to do the I/O so that data can be regridded and time interpolated when needed. To get there from here, we need a configuration format so that the host model can read the data and populate the correct fields. I have suggested HEMCO configuration files.
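The read-and-broadcast pattern mentioned above can be sketched as follows. This is an illustrative sketch only: `read_lookup_table` is a hypothetical placeholder for the actual NetCDF read, and the table size is arbitrary.

```fortran
! Sketch of "read on one task and broadcast" for small configuration data.
! read_lookup_table is a hypothetical placeholder for the actual file read.
program broadcast_example
  use mpi
  implicit none
  integer, parameter :: table_size = 100
  real(kind=8) :: table( table_size )
  integer :: rank, ierr

  call MPI_Init( ierr )
  call MPI_Comm_rank( MPI_COMM_WORLD, rank, ierr )
  if( rank == 0 ) then
    ! only the root task touches the file
    call read_lookup_table( table )
  end if
  ! every task receives a full copy of the small data set
  call MPI_Bcast( table, table_size, MPI_DOUBLE_PRECISION, 0, &
                  MPI_COMM_WORLD, ierr )
  call MPI_Finalize( ierr )
end program broadcast_example
```

For lookup tables of this size, the broadcast cost is negligible, which is why parallel I/O buys nothing here.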
The optical property lookup table datasets are small, so there is no need for parallel I/O. This is more a question of where to put the actual calls to the I/O library functions. (I'm taking from your comment that this should be netcdf instead of pio because of the small datasets. Is this correct?)
I think we're leaning toward the second option, where we put NetCDF wrappers in a utility library. (We use a small library of utility functions and classes in MusicBox called musica-core, and were thinking of adding them there.) This would keep direct links to NetCDF out of the MAM code (avoiding option 1), would not make MAM CAM-only (avoiding option 3), and a host model could presumably replace the musica-core NetCDF wrappers with PIO (or other) wrappers if it really wanted to.
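A minimal sketch of what such a wrapper might look like, using the netcdf-fortran API. The module and routine names (`musica_io`, `io_read_1d_double`) are hypothetical, not the actual musica-core interface:

```fortran
! Hypothetical sketch of a musica-core NetCDF wrapper module.
module musica_io
  use netcdf
  implicit none
  private
  public :: io_read_1d_double

contains

  !> Read a 1-D double-precision variable from a NetCDF file,
  !! allocating the output array to the variable's size.
  subroutine io_read_1d_double( file_name, variable_name, values )
    character(len=*),          intent(in)  :: file_name
    character(len=*),          intent(in)  :: variable_name
    real(kind=8), allocatable, intent(out) :: values(:)

    integer :: file_id, variable_id, variable_size
    integer :: dimension_ids(1)

    call check( nf90_open( file_name, NF90_NOWRITE, file_id ) )
    call check( nf90_inq_varid( file_id, variable_name, variable_id ) )
    call check( nf90_inquire_variable( file_id, variable_id,            &
                                       dimids = dimension_ids ) )
    call check( nf90_inquire_dimension( file_id, dimension_ids(1),      &
                                        len = variable_size ) )
    allocate( values( variable_size ) )
    call check( nf90_get_var( file_id, variable_id, values ) )
    call check( nf90_close( file_id ) )
  end subroutine io_read_1d_double

  !> Abort with a message on any NetCDF error.
  subroutine check( status )
    integer, intent(in) :: status
    if( status /= NF90_NOERR ) then
      write(*,*) nf90_strerror( status )
      stop 1
    end if
  end subroutine check

end module musica_io
```

Because schemes only see the `musica_io` interface, a host model that wants PIO (or anything else) just provides its own module with the same public routines.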
I think this is what we discussed in our last meeting. I created the issue just to have documentation of this decision. My understanding is that deciding on a file I/O strategy is normally something that is done at the model level. But now that schemes are meant to be portable to multiple models, I want to make sure that we choose the best option for file I/O in MAM. (And, even if there is no official strategy for file I/O in CCPP, I really feel like at least some advice for writing portable schemes that need access to file data would be useful for the community.)
For MAM, do we agree that option 2 (I/O functions provided by musica-core) is the best way forward?
Unless @gold2718 or @gill see a problem, I'm in favor of chemistry schemes using a common infrastructure for reading in file data.
Ah, seeing some light now.
> I'm taking from your comment that this should be netcdf instead of pio because of the small datasets. Is this correct?
This should be a decision based on what is best for MICM. When you think about taking MICM to your target host models, what library support do you want to require. Note that NetCDF is (or will be) incorporating much of the PIO functionality as part of the NetCDF library (not sure of the details).
Whatever format you decide on, creating a wrapper library sounds like the way to go. To signal to model build systems that your CCPP routines have a dependency (such as the wrapper library), use the `dependencies` metadata keyword, which goes in the `[ccpp-table-properties]` section at the top of your metadata file:
```
[ccpp-table-properties]
  name = MAM4
  type = scheme
  dependencies = micm_io_wrapper.F90

[ccpp-arg-table]
  name = mam4_init
  type = scheme
  process = aerosol_chemistry
```
@gold2718 - ok, sounds great. I may have some more questions as I actually start writing this code, but this helps as a general approach. Thanks!
For the specific case of data that is not geographically dependent (i.e., constant as a function of location), it sounds like Steve is saying that the CCPP and CAMDEN/CESM standard will be "roll your own".
I like your second suggestion, that we use a library for all of our musica schemes. That way we will not have to repeat ourselves for every scheme.
Let's go that direction, unless Steve has any further advice.
Strategy decided: put I/O library wrapper modules in the musica-core library, and use these wrappers in all MUSICA-related schemes. Closing issue.
Per Steve, there is no official CCPP strategy for file I/O. It is up to the individual "schemes" to figure this out.
Options:
1. Call NetCDF directly from the MAM code.
2. Put NetCDF wrappers in a utility library and call those wrappers from the schemes.
3. Make MAM CAM-only and rely on CAM's I/O infrastructure.