ACCESS-NRI / access-nri-intake-catalog

Tools and configuration info used to manage ACCESS-NRI's intake catalogue
https://access-nri-intake-catalog.rtfd.io
Apache License 2.0

ACCESS-OM3 filename patterns are too prescriptive #176

Open anton-seaice opened 3 months ago

anton-seaice commented 3 months ago

Is your feature request related to a problem? Please describe.

For ESM1.5 and OM2, cice daily output is concatenated into monthly files, with filenames containing -YYYY-MM-daily.nc.

Currently, the OM3 builder accepts filenames of this form (e.g. om3.cice.h.1900-01-daily.nc), but the resulting datastore doesn't work: there is a new file_id in the datastore for each file, when there should be one for all daily cice data (I believe?).
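As a rough illustration of how this kind of behaviour can arise, here is a minimal sketch. The patterns and the `file_id` helper are hypothetical and are not the ones in access_nri_intake/source/utils.py; they only assume that the file_id is derived by redacting the date from the filename. A pattern that expects the date at the very end of the name misses `1900-01` when `-daily` follows it, so the date stays in the file_id and every file gets its own id.

```python
import re

# Hypothetical patterns for illustration only (not the builder's real regexes).
STRICT = re.compile(r"\d{4}-\d{2}$")         # date recognised only at the end of the stem
RELAXED = re.compile(r"\d{4}-\d{2}(?=$|-)")  # date may be followed by a suffix such as "-daily"


def file_id(filename, pattern):
    """Derive a file_id by redacting the date and normalising separators."""
    stem = filename[:-3] if filename.endswith(".nc") else filename
    redacted = pattern.sub("XXXX_XX", stem)
    return re.sub(r"[.\-]", "_", redacted)


files = ["om3.cice.h.1900-01-daily.nc", "om3.cice.h.1900-02-daily.nc"]

# The strict pattern leaves the date in place, so each file has a distinct id.
strict_ids = {file_id(f, STRICT) for f in files}

# The relaxed pattern redacts the date, so both files share a single id.
relaxed_ids = {file_id(f, RELAXED) for f in files}

print(sorted(strict_ids))
print(sorted(relaxed_ids))
```

With the relaxed pattern both files collapse to one file_id, which is the grouping I'd expect for daily cice data.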

Describe the feature you'd like

The regexes used to determine these file_ids need revisiting:

https://github.com/ACCESS-NRI/access-nri-intake-catalog/blob/9381296f1f2c53b5b894ed1e8162f7de70a3073a/src/access_nri_intake/source/utils.py#L195

Describe alternatives you've considered

We could use a different form of the filename for daily cice output

Additional context

This is needed to add auto post-processing to OM3 configurations (https://github.com/COSIMA/access-om3/issues/182).

To reproduce the error, see https://github.com/anton-seaice/sandbox/blob/main/open_datastore_after_cice_concat.ipynb

dougiesquire commented 3 months ago

Thanks @anton-seaice. The approach of using regexes to determine the file_id for concatenation is obscure and error-prone and needs revisiting. I don't think any of us have time for that just now. Let's discuss this at the TWG meeting and we can add a quick fix if we decide to stick with the om3.cice.h.1900-01-daily.nc naming.