ACCESS-NRI / access-nri-intake-catalog

Tools and configuration info used to manage ACCESS-NRI's intake catalogue
https://access-nri-intake-catalog.rtfd.io
Apache License 2.0

[BUG] Missing metadata.yaml files for existing CMIP5 & CMIP6 catalogues #200

Closed marc-white closed 1 month ago

marc-white commented 1 month ago

Describe the bug

During investigations on #197, it was found that the directory containing the metadata.yaml files for the existing CMIP5 and CMIP6 catalogues (cmip5_al33, cmip5_rr3, cmip6_fs38, cmip6_oi10) has gone missing. The directory referred to in the access_nri_intake_catalog config (/g/data/tm70/intake) no longer exists.

To Reproduce

See /g/data/tm70.

Additional context

The metadata can be recovered from the existing catalog via, e.g.,


import intake
cat = intake.cat.access_nri
cat["cmip6_fs38"].metadata
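
The same lookup can be repeated for all four entries named in this issue. The following is a minimal sketch, not the project's own tooling: `collect_metadata` and `CMIP_ENTRIES` are hypothetical names introduced here, and a stand-in catalogue is used so the snippet runs without access to Gadi. It assumes only that each entry exposes a `.metadata` mapping, as in the call above.

```python
from types import SimpleNamespace

# The four catalogue entries named in this issue.
CMIP_ENTRIES = ["cmip5_al33", "cmip5_rr3", "cmip6_fs38", "cmip6_oi10"]


def collect_metadata(cat, names=CMIP_ENTRIES):
    """Return {entry name: shallow copy of its metadata dict}.

    `cat` only needs to support `cat[name].metadata`; this helper is
    hypothetical and not part of access-nri-intake-catalog.
    """
    return {name: dict(cat[name].metadata) for name in names}


# Stand-in catalogue so the sketch runs anywhere.
# On Gadi, the real catalogue would instead be obtained with:
#     import intake
#     cat = intake.cat.access_nri
fake_cat = {
    name: SimpleNamespace(metadata={"name": name, "catalog_dir": ""})
    for name in CMIP_ENTRIES
}

recovered = collect_metadata(fake_cat)
for name, meta in recovered.items():
    print(name, meta)
```

Copying each mapping with `dict(...)` means later edits to the recovered metadata do not touch the catalogue objects themselves.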

marc-white commented 1 month ago

Storing existing metadata here to avoid future loss:

cmip5_al33

{'contact': 'NCI',
 'created': None,
 'description': 'Replicated CMIP5-era datasets catalogued by NCI',
 'email': 'help@nci.org.au',
 'experiment_uuid': '658c95cc-c299-450c-82a1-b2b2308f7c6e',
 'keywords': ['cmip'],
 'license': None,
 'long_description': 'All CMIP5-era replicated data contained under the project al33.  All file versions present are in the listing. Maintained By: NCI Contact: help@nci.org.au References: https://pcmdi.llnl.gov/mips/cmip5/',
 'model': ['CMIP5'],
 'name': 'cmip5_al33',
 'nominal_resolution': [None],
 'notes': 'null',
 'parent_experiment': None,
 'reference': None,
 'related_experiments': [None],
 'url': 'https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/metadata/f9489_5106_5649_5038',
 'version': None,
 'catalog_dir': ''}

cmip5_rr3

{'contact': 'NCI',
 'created': None,
 'description': 'Australian CMIP5-era datasets catalogued by NCI',
 'email': 'help@nci.org.au',
 'experiment_uuid': '473d0c44-ab66-458c-b32e-1e1774175853',
 'keywords': ['cmip'],
 'license': None,
 'long_description': 'All CMIP5-era Australian published data contained under the project rr3.  All file versions present are in the listing. Maintained By: NCI Contact: help@nci.org.au References: https://pcmdi.llnl.gov/mips/cmip5/',
 'model': ['CMIP5'],
 'name': 'cmip5_rr3',
 'nominal_resolution': [None],
 'notes': 'null',
 'parent_experiment': None,
 'reference': None,
 'related_experiments': [None],
 'url': 'https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/metadata/f7448_2157_9857_1076',
 'version': None,
 'catalog_dir': ''}

cmip6_fs38

{'contact': 'NCI',
 'created': None,
 'description': 'Australian CMIP6-era datasets catalogued by NCI',
 'email': 'help@nci.org.au',
 'experiment_uuid': 'dfdeb421-5c56-4d58-a0b2-04b717e5cff7',
 'keywords': ['cmip'],
 'license': None,
 'long_description': 'All CMIP6-era Australian published data contained under the project fs38.  All file versions present are in the listing. Maintained By: NCI Contact: help@nci.org.au References: https://pcmdi.llnl.gov/CMIP6/',
 'model': ['CMIP6'],
 'name': 'cmip6_fs38',
 'nominal_resolution': [None],
 'notes': 'null',
 'parent_experiment': None,
 'reference': None,
 'related_experiments': [None],
 'url': 'https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/metadata/f3154_9976_7262_7595',
 'version': None,
 'catalog_dir': ''}

cmip6_oi10

{'contact': 'NCI',
 'created': None,
 'description': 'Replicated CMIP6-era datasets catalogued by NCI',
 'email': 'help@nci.org.au',
 'experiment_uuid': 'b05038ca-8c78-4ca6-a914-ae33dd9abffe',
 'keywords': ['cmip'],
 'license': None,
 'long_description': 'All CMIP6-era replicated data contained under the project oi10.  All file versions present are in the listing. Maintained By: NCI Contact: help@nci.org.au References: https://pcmdi.llnl.gov/CMIP6/',
 'model': ['CMIP6'],
 'name': 'cmip6_oi10',
 'nominal_resolution': [None],
 'notes': 'null',
 'parent_experiment': None,
 'reference': None,
 'related_experiments': [None],
 'url': 'https://geonetwork.nci.org.au/geonetwork/srv/eng/catalog.search#/metadata/f5194_5909_8003_9216',
 'version': None,
 'catalog_dir': ''}

marc-white commented 1 month ago

I've created replacement YAML files for these four experiments. I've mostly just copied what I got from the .metadata call on the existing catalog, but I've taken the opportunity to add the available realms for each experiment. @rbeucher and/or @dougiesquire, please review: they're available on Gadi under /scratch/tm70/mcw120.

Once we're happy with the YAML files, I'll get them properly placed and do a PR to update the references within the code.
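
A rough sketch of how a recovered metadata dict might be turned into a replacement YAML file follows. It is illustrative only: the helper names are hypothetical, the `realm` key and the realm values are assumptions (the thread does not say how the realms were recorded), and dropping `catalog_dir` assumes that key is load-time bookkeeping rather than part of the original file. Writing relies on PyYAML's `yaml.safe_dump`.

```python
import copy


def prepare_metadata(recovered, realms):
    """Return a cleaned copy of a recovered metadata dict.

    Hypothetical helper. Assumptions: `catalog_dir` is not part of the
    original metadata.yaml, and realms are stored under a `realm` key.
    """
    meta = copy.deepcopy(recovered)
    meta.pop("catalog_dir", None)  # drop the assumed load-time bookkeeping key
    meta["realm"] = list(realms)   # assumed key name for the added realms
    return meta


def write_metadata_yaml(meta, path):
    """Write `meta` to `path` as YAML, preserving key order."""
    import yaml  # PyYAML; imported here so the helper above works without it

    with open(path, "w") as f:
        yaml.safe_dump(meta, f, sort_keys=False, default_flow_style=False)


# Abbreviated example input; the realm values are placeholders, not the
# realms actually available for cmip6_fs38.
recovered = {
    "name": "cmip6_fs38",
    "model": ["CMIP6"],
    "keywords": ["cmip"],
    "catalog_dir": "",
}
meta = prepare_metadata(recovered, realms=["atmos", "ocean"])
print(meta)
# To write the file: write_metadata_yaml(meta, "metadata.yaml")
```

Working on a deep copy keeps the recovered dict intact, so the original values remain available for comparison during review.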

rbeucher commented 1 month ago

Hi @marc-white

Looks good to me

marc-white commented 1 month ago

I don't have write permission to /g/data/dk92/catalog/v2/esm/ where all of these experiments are kept. Should we store the metadata on /g/data/tm70 again, or (preferred solution) do we know someone who has write access for dk92?

rbeucher commented 1 month ago

We can't store on dk92. However, as the catalog is in xp65, shouldn't we store those YAML files there? tm70 is internal to ACCESS-NRI.

marc-white commented 1 month ago

I presumed we wanted to store the metadata.yaml close to the data that it describes, e.g., see the metadata.yaml locations for ACCESS-CM2 in access-nri-intake-catalog/config/access-cm2.yaml.

rbeucher commented 1 month ago

Yes, but for NCI data collections we don't have write access and cannot add anything there.

marc-white commented 1 month ago

Create a new directory under xp65 for them then? E.g., /g/data/xp65/admin/metadata/<experiment>/metadata.yaml?

rbeucher commented 1 month ago

I see you have created /g/data/xp65/admin/access-nri-intake-catalog. Let's use that.

marc-white commented 1 month ago

@rbeucher I think that's you who created that directory, and it seems to be a copy of the access-nri-intake-catalog source code...

rbeucher commented 1 month ago

:-) I don't remember doing that... :-/
I think the access-nri-intake-catalog/config is a good place.

marc-white commented 1 month ago

Do you want me to blow away the contents of that access-nri-intake-catalog directory on xp65 and go from there?

rbeucher commented 1 month ago

The build_all.sh script in access-nri-intake-catalog/bin used to get the config files from /g/data/tm70/ds0092/projects/access-nri-intake-catalog/config. We can change it to /g/data/xp65/admin/access-nri-intake-catalog/config

rbeucher commented 1 month ago

Oh, I think I am mistaken... Not sure what those were: CONFIGS=( cmip5.yaml cmip6.yaml access-om2.yaml access-cm2.yaml access-esm1-5.yaml )

rbeucher commented 1 month ago

OK... I get it now. Yes I think your suggestion is good. /g/data/xp65/admin/intake/metadata should work
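
For illustration, the per-experiment file locations under that directory could be built as below. This is a sketch under an assumption: it combines the base directory agreed here with the `<experiment>/metadata.yaml` layout suggested earlier in the thread, which the thread does not explicitly confirm. `METADATA_ROOT` and `metadata_path` are hypothetical names.

```python
from pathlib import PurePosixPath

# Base directory agreed in the thread; the per-experiment subdirectory
# layout below is an assumption carried over from the earlier suggestion.
METADATA_ROOT = PurePosixPath("/g/data/xp65/admin/intake/metadata")


def metadata_path(experiment, root=METADATA_ROOT):
    """Return the assumed metadata.yaml location for one experiment."""
    return root / experiment / "metadata.yaml"


paths = {
    name: metadata_path(name)
    for name in ["cmip5_al33", "cmip5_rr3", "cmip6_fs38", "cmip6_oi10"]
}
for name, path in paths.items():
    print(f"{name}: {path}")
```

`PurePosixPath` only manipulates path strings, so the snippet can be run off Gadi without touching the filesystem.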

marc-white commented 1 month ago

@rbeucher those are the YAML files that define where all the data & associated metadata for the contents of the access-nri-intake-catalog live, and they're kept in the access-nri-intake-catalog/config directory of the repository. However, they're not included as package data, so it looks like @dougiesquire had a checkout of the repo at that tm70 location so the build_all.sh script could access them.

They're distinct from the per-experiment metadata.yaml files that describe what is contained within each experiment.

rbeucher commented 1 month ago

Yes sorry, I got confused.