NOAA-GFDL / MDTF-diagnostics

Analysis framework and collection of process-oriented diagnostics for weather and climate simulations
https://mdtf-diagnostics.readthedocs.io/en/main/

Catalog consistency: MDTF and user data catalog #588

Open aradhakrishnanGFDL opened 1 month ago

aradhakrishnanGFDL commented 1 month ago

What problem will this feature solve?

Achieves some consistency between the input data catalog (from the GFDL catalog builder) and the MDTF intermediate catalogs in PP.

This matters because users who are new to catalogs can learn one set of terms and one spec/template for the data catalog as they get started.

Lets GFDL analysis scripts, whether or not they run through MDTF, use a common catalog, which improves interoperability.

Simplifies training material shared across GFDL and CESM, as well as material for model intercomparison projects.

Describe the solution you'd like

Changes to the aggregate_columns (https://github.com/NOAA-GFDL/MDTF-diagnostics/blob/c87746c7e19870806b025c79c90f96cc33c1d173/src/util/catalog.py#L205:L216):

- Add: chunk_freq
- Change: variant_label to member_id (MDTF)
- Consider: for recording the “convention”, evaluate reusing the CMIP CV, with “project_id” as the column name. Example: project_id = CMIP, project_id = dev, project_id = GFDL.
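A minimal sketch of the proposed aggregate_columns changes, applied to a plain Python list. The starting column list and the helper `update_aggregate_columns` are hypothetical, for illustration only; the real list in src/util/catalog.py is not reproduced here and may differ.

```python
# Hypothetical subset of aggregate columns, for illustration only;
# the real list in src/util/catalog.py may differ.
OLD_AGGREGATE_COLUMNS = [
    "activity_id", "source_id", "experiment_id",
    "frequency", "realm", "variant_label",
]

# Proposed rename from this issue: variant_label -> member_id.
RENAMES = {"variant_label": "member_id"}
# Proposed addition from this issue, appended only if not already present.
ADDITIONS = ["chunk_freq"]


def update_aggregate_columns(columns, renames=RENAMES, additions=ADDITIONS):
    """Return a new column list with renames applied and additions appended.

    The input list is not modified, and existing columns keep their
    relative order.
    """
    updated = [renames.get(col, col) for col in columns]
    for col in additions:
        if col not in updated:
            updated.append(col)
    return updated


print(update_aggregate_columns(OLD_AGGREGATE_COLUMNS))
# ['activity_id', 'source_id', 'experiment_id', 'frequency', 'realm', 'member_id', 'chunk_freq']
```

Renaming in place, rather than removing a column and appending its replacement, keeps every other column at its existing position.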

If activity_id is not being used, can it be removed, or moved out of aggregate_columns and kept as an “optional” column? It was originally used to filter by “MIP” in CMIP6.
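One way to treat activity_id as optional, sketched under the assumption that a catalog row is a dict mapping column name to value. The `REQUIRED_COLUMNS`/`OPTIONAL_COLUMNS` split, the `missing_required` helper, and the sample values are all hypothetical.

```python
# Hypothetical split: columns every catalog row must fill, and columns
# (such as activity_id) that a row may omit.
REQUIRED_COLUMNS = ["source_id", "experiment_id", "frequency",
                    "realm", "member_id", "chunk_freq"]
OPTIONAL_COLUMNS = ["activity_id"]


def missing_required(row):
    """Return the required columns that are absent or empty in a catalog row."""
    return [col for col in REQUIRED_COLUMNS if not row.get(col)]


# Hypothetical row with no activity_id: it is still complete.
gfdl_row = {
    "source_id": "ESM4", "experiment_id": "historical", "frequency": "mon",
    "realm": "atmos", "member_id": "r1i1p1f1", "chunk_freq": "5yr",
}
print(missing_required(gfdl_row))  # []
```

A row that does carry activity_id is accepted the same way; the column is simply not required for grouping.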

The ordering of the aggregate columns could also be kept the same in both catalogs, so that users who typically query a dataset with a "key pattern" are less confused.
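To show why ordering matters for a key pattern, the sketch below assumes dataset keys are formed by joining the aggregate column values in column order with ".", similar to intake-esm's default dataset keys. `dataset_key` and the sample row are hypothetical.

```python
def dataset_key(row, columns, sep="."):
    """Build a dataset key by joining row values in the given column order."""
    return sep.join(str(row[col]) for col in columns)


# Hypothetical catalog row.
row = {"source_id": "ESM4", "experiment_id": "historical",
       "frequency": "mon", "realm": "atmos"}

order_a = ["source_id", "experiment_id", "frequency", "realm"]
order_b = ["realm", "frequency", "source_id", "experiment_id"]

print(dataset_key(row, order_a))  # ESM4.historical.mon.atmos
print(dataset_key(row, order_b))  # atmos.mon.ESM4.historical
```

The same row yields two different keys, so a user who learned one pattern would not find the dataset under the other.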

Here is what the GFDL catalog builder template looks like (to be merged; more changes pending): https://github.com/aradhakrishnanGFDL/CatalogBuilder/blob/129-cmip/cats/gfdl_template.json#L79:L88

(Note that in the template above, modeling_realm will be changed to realm and temporal_subset to time_range.)

Describe alternatives you've considered

The alternative considered is to take the following actions on the GFDL Catalog Builder side, to help synchronize its data catalog template with MDTF. (These are NOT suggested changes to the MDTF framework.)

- Change: modeling_realm to realm (GFDL Catalog Builder)
- Change: temporal_subset to time_range (GFDL Catalog Builder)
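On the Catalog Builder side these renames amount to a header mapping. A minimal sketch, assuming the catalog is a CSV file with a header row; `rename_catalog_columns` and the sample data are hypothetical.

```python
import csv
import io

# Renames proposed for the GFDL Catalog Builder side.
BUILDER_RENAMES = {"modeling_realm": "realm", "temporal_subset": "time_range"}


def rename_catalog_columns(csv_text, renames=BUILDER_RENAMES):
    """Return catalog CSV text with header columns renamed; data rows are unchanged."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([renames.get(col, col) for col in header])
    writer.writerows(reader)  # copy the remaining data rows as-is
    return out.getvalue()


# Hypothetical catalog content using the old column names.
old = ("source_id,modeling_realm,temporal_subset,path\n"
       "ESM4,atmos,000101-000512,/hypothetical/path.nc\n")
print(rename_catalog_columns(old))
```

Only the header changes; existing values stay as they are. A JSON template that lists these column names would presumably need the same mapping.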

If any of these changes do not fit the framework goals or catalog usage, please raise them so we can discuss further and rethink the solution.