ACCESS-NRI / access-nri-intake-catalog

Tools and configuration info used to manage ACCESS-NRI's intake catalogue
https://access-nri-intake-catalog.rtfd.io
Apache License 2.0

Restart Catalogue #272

Open charles-turner-1 opened 3 days ago

charles-turner-1 commented 3 days ago

Is your feature request related to a problem? Please describe.

@aidanheerdegen noted yesterday that it might be helpful to have a separate catalogue of restarts, so that users can easily access the restarts for model runs but won't access them accidentally, which should avoid confusion.

Describe the feature you'd like

Intake allows a single catalog file to describe multiple sources, i.e. the access_nri and restart catalogues could be combined as

sources:
  access_nri:
    args:
      columns_with_iterables:
      - model
      - realm
      - frequency
      - variable
      mode: r
      name_column: name
      path: /g/data/xp65/public/apps/access-nri-intake-catalog/{{version}}/metacatalog.csv
      yaml_column: yaml
    description: ACCESS-NRI intake catalog
    driver: intake_dataframe_catalog.core.DfFileCatalog
    metadata:
      storage: gdata/fs38+gdata/oi10+gdata/tm70
      version: '{{version}}'
    parameters:
      version:
        default: v0.1.3
        description: Catalog version
        type: str
  restarts:
    args:
      columns_with_iterables:
      - model
      - realm
      - frequency
      - variable
      mode: r
      name_column: name
      path: /g/data/xp65/public/apps/access-nri-intake-catalog/{{version}}/restart_metacatalog.csv
      yaml_column: yaml
    description: ACCESS-NRI restart catalog
    driver: intake_dataframe_catalog.core.DfFileCatalog
    metadata:
      storage: gdata/al33+gdata/rr3+gdata/tm70
      version: '{{version}}'
    parameters:
      version:
        default: v2024-11-11
        description: Catalog version
        max: v2024-11-11
        min: v2024-11-08
        type: str

which would then be accessible through

>>> import intake
>>> intake.cat.access_nri
<access_nri catalog with 94 source(s) across 2272 rows>
>>> intake.cat.restarts
<user_def catalog with x source(s) across y rows>
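
To make the role of the version parameter concrete, here is a minimal sketch of how the `{{version}}` placeholder in each source's path might be filled in. It is plain string substitution written for illustration, not intake's own templating code: the `SOURCES` dict and `resolve_path` helper are hypothetical names, and treating `min`/`max` as a lexicographic string range is an assumption about how those bounds would apply to a `str` parameter.

```python
# Hypothetical sketch: mimics how a "{{version}}" parameter could be filled
# into each source's path. The values are copied from the YAML above; the
# dict layout and helper name are made up for illustration.
SOURCES = {
    "access_nri": {
        "path": "/g/data/xp65/public/apps/access-nri-intake-catalog/{{version}}/metacatalog.csv",
        "default": "v0.1.3",
    },
    "restarts": {
        "path": "/g/data/xp65/public/apps/access-nri-intake-catalog/{{version}}/restart_metacatalog.csv",
        "default": "v2024-11-11",
        "min": "v2024-11-08",
        "max": "v2024-11-11",
    },
}


def resolve_path(source_name, version=None):
    """Return the metacatalog path for a source, falling back to its default version."""
    spec = SOURCES[source_name]
    if version is None:
        version = spec["default"]

    # Assumption: bounds are checked by plain string comparison. This works
    # for date-stamped versions like "v2024-11-08", which sort correctly as
    # strings, but would not be reliable for versions like "v0.1.10".
    low, high = spec.get("min"), spec.get("max")
    if low is not None and version < low:
        raise ValueError(f"{version} is older than the minimum {low}")
    if high is not None and version > high:
        raise ValueError(f"{version} is newer than the maximum {high}")

    return spec["path"].replace("{{version}}", version)
```

With this sketch, `resolve_path("restarts", "v2024-11-08")` points at the `restart_metacatalog.csv` for that version, while a version outside the stated range raises a `ValueError`.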

Describe alternatives you've considered

This feature would build on the approach described in #245 - see there for potential pitfalls.

Additional context

aidanheerdegen commented 3 days ago

Thanks @charles-turner-1 for making this issue and pointing out the possibilities for how it might work.

I'll ping @jo-basevi and @tmcadam here as this is part of the experiment provenance and tracking work.

Would it help to have some example restarts to index for testing purposes? We're probably still lacking some important metadata (experiment and run IDs) in the files themselves (see https://github.com/payu-org/payu/issues/510), so that might be a blocker until remedied.

marc-white commented 2 days ago

For my own edification (and the future software peeps who aren't from a climate background), can someone give me a definitive explanation (or link to the same) of what the 'restarts' are, particularly with respect to how they differ from the 'outputs'?