EO-DataHub / eodh-eocis-sprint

Outputs from the EODH-EOCIS Sprints
BSD 2-Clause "Simplified" License

Modifications to the EODH STAC indexing #3

Open agstephens opened 1 year ago

agstephens commented 1 year ago


1. Which CMIP6/CORDEX datasets should we include in the EODH?

We have chosen a subset of datasets to index in the EODH STAC catalogue. The following are required:

The CMIP6 datasets:

https://github.com/cedadev/eodh_model_data/blob/main/Products/cmip6_ds_list_esgfid.txt

The CORDEX datasets:

https://github.com/cedadev/eodh_model_data/blob/main/Products/cordex_ds_list_esgfid.txt

2. Where should the "kerchunk" info live in the STAC item?

Please see this example for how to put the Kerchunk file into the "assets" section:

https://github.com/EO-DataHub/eodh-eocis-sprint/blob/main/stac-sandbox/outputs/CMIP6.CMIP.MOHC.HadGEM3-GC31-MM.historical.r1i1p1f3.6hrPlev.tas.gn.v20200923.json#L51-L62

Note that dataset-to-stac.py has some logic to decide how to generate this asset.
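As a rough illustration of the shape being asked for, the sketch below adds a Kerchunk reference file as one entry in a STAC item's "assets" mapping. The asset key `reference_file`, the media type, the role and the example.org URL are all assumptions made for illustration; the linked example JSON and dataset-to-stac.py remain the source of truth for the real layout.

```python
import json


def add_kerchunk_asset(item, href, key="reference_file"):
    """Return a copy of a STAC item dict with a Kerchunk asset added.

    The key name, media type and role are hypothetical placeholders,
    not values taken from the EODH example item.
    """
    updated = dict(item)
    assets = dict(item.get("assets", {}))
    assets[key] = {
        "href": href,
        "type": "application/json",  # assumed media type
        "roles": ["reference"],  # assumed role
    }
    updated["assets"] = assets
    return updated


item = {
    "type": "Feature",
    "id": "CMIP6.CMIP.MOHC.HadGEM3-GC31-MM.historical.r1i1p1f3.6hrPlev.tas.gn.v20200923",
    "properties": {},
    "assets": {},
}
# Placeholder URL: the real Kerchunk file location is not given in this thread.
item = add_kerchunk_asset(item, "https://example.org/kerchunk/example.json")
print(json.dumps(item["assets"], indent=2))
```

Keeping the Kerchunk file under "assets" rather than "properties" means clients can discover it the same way they discover any other data file attached to the item.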

3. Global attributes to STAC Item properties

In your example, you include some attributes that I have excluded, such as:

In my example, I have reduced them to only those listed in this example:

https://github.com/EO-DataHub/eodh-eocis-sprint/blob/main/stac-sandbox/outputs/CMIP6.CMIP.MOHC.HadGEM3-GC31-MM.historical.r1i1p1f3.6hrPlev.tas.gn.v20200923.json#L3-L19

As prescribed by this config:

https://github.com/EO-DataHub/eodh-eocis-sprint/blob/main/stac-sandbox/dset_stac_configs.py#L72-L80

It would be good to stick with those we are prescribing. We might change them later but having control over them for now is useful.
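A minimal sketch of what "sticking with those we are prescribing" could look like in code: only global attributes named in an allow-list are copied into the STAC item properties. The `PRESCRIBED` tuple here is a hypothetical subset chosen for illustration; the actual list is the one in dset_stac_configs.py linked above.

```python
# Hypothetical allow-list; the real one lives in dset_stac_configs.py.
PRESCRIBED = ("source_id", "experiment_id", "variant_label")


def select_properties(global_attrs, allowed=PRESCRIBED):
    """Keep only the prescribed global attributes, in allow-list order."""
    return {name: global_attrs[name] for name in allowed if name in global_attrs}


# Example global attributes as they might be read from a NetCDF file.
global_attrs = {
    "source_id": "HadGEM3-GC31-MM",
    "experiment_id": "historical",
    "variant_label": "r1i1p1f3",
    "history": "free-text provenance that is not wanted in the STAC item",
}
print(select_properties(global_attrs))
```

Because the allow-list is a single tuple, changing the prescribed attributes later only needs a config change, not a change to the indexing logic.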

4. Attributes inside an Asset

Rename:

Remove:

rhysrevans3 commented 1 year ago

@agstephens some of the CMIP6 datasets don't appear to be in the archive. For example, for CMIP6.CMIP.CAMS.CAMS-CSM1-0.ssp119.r1i1p1f1.Amon.clt.gn.v20190708, the URL https://data.ceda.ac.uk/badc/cmip6/data/CMIP6/CMIP/CAMS/CAMS-CSM1-0/ssp119/r1i1p1f1/Amon/vas/gn/v20190708 gives "This path was not found".

agstephens commented 1 year ago

Hi @rhysrevans3: please list all the datasets and check which are missing from the file system. We can then discuss with Martin whether we just remove them or need to fetch them again.
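One way such a listing could be produced is sketched below: each ESGF dataset ID is turned into a directory path by replacing the dots with path separators, and IDs whose directory does not exist are reported. The archive root `/badc/cmip6/data` is an assumption inferred from the data.ceda.ac.uk URL quoted earlier, and the function names are hypothetical.

```python
from pathlib import Path

# Assumed archive root, inferred from the URL in the comment above.
ARCHIVE_ROOT = "/badc/cmip6/data"


def dataset_id_to_path(dataset_id, root=ARCHIVE_ROOT):
    """Map a dot-separated ESGF dataset ID to a directory under root."""
    return Path(root).joinpath(*dataset_id.split("."))


def find_missing(dataset_ids, root=ARCHIVE_ROOT):
    """Return the dataset IDs whose directory is absent from the file system."""
    return [ds for ds in dataset_ids if not dataset_id_to_path(ds, root).is_dir()]


dataset_ids = ["CMIP6.CMIP.CAMS.CAMS-CSM1-0.ssp119.r1i1p1f1.Amon.clt.gn.v20190708"]
print(dataset_id_to_path(dataset_ids[0]))
for missing in find_missing(dataset_ids):
    print("missing:", missing)
```

In practice the IDs would be read line by line from cmip6_ds_list_esgfid.txt, and the script would need to run on a host where the archive is mounted.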

rhysrevans3 commented 1 year ago

@agstephens here is the list of the 4263 missing datasets: missing_cmip6_datasets.txt

agstephens commented 1 year ago

@rhysrevans3 I've messaged Martin to check details.

agstephens commented 1 year ago

@rhysrevans3: Martin has updated the list and presented the entries as directory paths. Please use this file:

https://github.com/cedadev/eodh_model_data/blob/main/Products/cmip6_ds_list_paths.txt

They should all be in the CEDA archive.