Open agstephens opened 1 year ago
@agstephens some of the CMIP6 datasets don't appear to be in the archive. For example:
CMIP6.CMIP.CAMS.CAMS-CSM1-0.ssp119.r1i1p1f1.Amon.clt.gn.v20190708
https://data.ceda.ac.uk/badc/cmip6/data/CMIP6/CMIP/CAMS/CAMS-CSM1-0/ssp119/r1i1p1f1/Amon/vas/gn/v20190708 gives "This path was not found".
Hi @rhysrevans3: please do a listing of all the datasets to check which are missing on the file system. We can then discuss with Martin whether we just remove them or we need to re-get them.
@agstephens here is the list of the 4263 missing datasets: missing_cmip6_datasets.txt
@rhysrevans3 I've messaged Martin to check details.
@rhysrevans3: Martin has updated the list, and presented them as directory paths. Please use this file:
https://github.com/cedadev/eodh_model_data/blob/main/Products/cmip6_ds_list_paths.txt
They should all be in the CEDA archive.
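The listing check discussed above could be sketched roughly as follows. This is only an illustration, not the script that was actually used: it assumes each dotted dataset ID maps to a directory by turning every facet into a path component (as the example ID and URL earlier in the thread suggest), and it assumes the archive is mounted at /badc/cmip6/data. The function names are hypothetical.

```python
from pathlib import Path

# Assumed archive root, inferred from the data.ceda.ac.uk URL quoted in the thread.
ARCHIVE_ROOT = Path("/badc/cmip6/data")


def dataset_id_to_path(dataset_id: str, root: Path = ARCHIVE_ROOT) -> Path:
    """Map a dotted dataset ID to a directory, one path component per facet."""
    return root.joinpath(*dataset_id.strip().split("."))


def find_missing(dataset_ids, root: Path = ARCHIVE_ROOT):
    """Return the dataset IDs whose directory does not exist under root."""
    return [
        ds.strip()
        for ds in dataset_ids
        if ds.strip() and not dataset_id_to_path(ds, root).is_dir()
    ]
```

Reading the ID list from a text file (one ID per line) and passing the lines to find_missing would give the list of datasets to raise with Martin.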
Modifications to the EODH STAC indexing
1. Which CMIP6/CORDEX datasets should we include in the EODH?
We have chosen a subset of datasets to index in the EODH STAC catalogue. The following are required:
The CMIP6 datasets:
https://github.com/cedadev/eodh_model_data/blob/main/Products/cmip6_ds_list_esgfid.txt
The CORDEX datasets:
https://github.com/cedadev/eodh_model_data/blob/main/Products/cordex_ds_list_esgfid.txt
2. Where should the "kerchunk" info live in the STAC item?
Please see this example for how to put the Kerchunk file into the "assets" section:
https://github.com/EO-DataHub/eodh-eocis-sprint/blob/main/stac-sandbox/outputs/CMIP6.CMIP.MOHC.HadGEM3-GC31-MM.historical.r1i1p1f3.6hrPlev.tas.gn.v20200923.json#L51-L62
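As a rough illustration of what adding a Kerchunk entry to the "assets" section might involve, here is a minimal sketch. The asset key ("reference_file"), the media type, the roles and the function name are all assumptions made for this example; the linked JSON remains the authoritative layout. Only "href", "type" and "roles" are used, which are standard STAC asset fields.

```python
def add_kerchunk_asset(item: dict, href: str, key: str = "reference_file") -> dict:
    """Return a copy of a STAC item dict with a Kerchunk reference under "assets"."""
    updated = dict(item)
    assets = dict(item.get("assets", {}))
    assets[key] = {
        "href": href,
        "type": "application/json",      # assumption: the Kerchunk file is plain JSON
        "roles": ["reference", "data"],  # hypothetical roles, for illustration only
    }
    updated["assets"] = assets
    return updated


# Placeholder values, not real identifiers or URLs.
item = {"type": "Feature", "id": "example-dataset-id", "assets": {}}
item_with_ref = add_kerchunk_asset(item, "https://example.org/kerchunk/example-dataset-id.json")
```

The helper copies the item rather than mutating it, so the original dict can still be compared against the updated one.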
Note that dataset-to-stac.py has some logic to decide how to generate this.
3. Global attributes to STAC Item properties
In your example, you include some attributes that I have excluded, such as:
In my example, I have reduced them to only those listed in this example:
https://github.com/EO-DataHub/eodh-eocis-sprint/blob/main/stac-sandbox/outputs/CMIP6.CMIP.MOHC.HadGEM3-GC31-MM.historical.r1i1p1f3.6hrPlev.tas.gn.v20200923.json#L3-L19
As prescribed by this config:
https://github.com/EO-DataHub/eodh-eocis-sprint/blob/main/stac-sandbox/dset_stac_configs.py#L72-L80
It would be good to stick with those we are prescribing. We might change them later but having control over them for now is useful.
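One way to keep control over the properties is an explicit allowlist, sketched below. The attribute names in PRESCRIBED_PROPERTIES are illustrative only (they are common CMIP6 global attributes, not necessarily the ones in the linked config), and select_properties is a hypothetical helper.

```python
# Hypothetical allowlist; the real one is defined in the linked dset_stac_configs.py.
PRESCRIBED_PROPERTIES = (
    "mip_era",
    "activity_id",
    "institution_id",
    "source_id",
    "experiment_id",
)


def select_properties(global_attrs: dict, allowed=PRESCRIBED_PROPERTIES) -> dict:
    """Keep only the prescribed global attributes, in the order they are listed."""
    return {name: global_attrs[name] for name in allowed if name in global_attrs}
```

Any attribute not named in the allowlist is dropped, so changing the set of STAC Item properties later only requires editing the list.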
4. Attributes inside an Asset
Rename:
bbox to area
Remove: