aradhakrishnanGFDL closed this 2 weeks ago.
Fix: add table_id, grid_label, and version_id to the JSON under aggregate_columns (still exploring other ways).
With that change, it works.
source /net2/rlm/analysis-scripts/example/env/bin/activate
import intake, intake_esm
col = "/home/a1r/github/CatalogBuilder/scripts/catalogcmip-2.json"
cat = intake.open_esm_datastore(col)
cat2 = cat.search(variable_id="tos",table_id="Oday",grid_label="gn")
dset_dict = cat2.to_dataset_dict(cdf_kwargs={'chunks': {'time':5}, 'decode_times': False})
dset_dict.keys()
dict_keys(['GFDL-ESM4.abrupt-4xCO2.r1i1p1f1.Oday.v20180701.tos.gn', 'GFDL-ESM4.1pctCO2.r1i1p1f1.Oday.v20180701.tos.gn', 'GFDL-ESM4.historical.r1i1p1f1.Oday.v20190726.tos.gn', 'GFDL-ESM4.historical.r2i1p1f1.Oday.v20180701.tos.gn', 'GFDL-ESM4.esm-hist.r1i1p1f1.Oday.v20180701.tos.gn', 'GFDL-ESM4.historical.r3i1p1f1.Oday.v20180701.tos.gn', 'GFDL-ESM4.piControl.r1i1p1f1.Oday.v20180701.tos.gn'])
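A minimal sketch of that JSON change, assuming the catalog follows the intake-esm ESM collection spec, where the grouping columns are listed in aggregation_control["groupby_attrs"]. The aggregate_columns mentioned above may correspond to this list in CatalogBuilder's template; that mapping, the helper name add_groupby_attrs, and the starting attribute list are assumptions for illustration.

```python
import json


def add_groupby_attrs(spec: dict, columns) -> dict:
    """Return a copy of an ESM collection spec with extra groupby columns appended.

    Columns that are already present are not duplicated; existing order is kept.
    """
    spec = json.loads(json.dumps(spec))  # deep copy of a JSON-compatible dict
    control = spec.setdefault("aggregation_control", {})
    attrs = control.setdefault("groupby_attrs", [])
    for col in columns:
        if col not in attrs:
            attrs.append(col)
    return spec


# Hypothetical starting spec that groups only by source/experiment/member.
spec = {
    "aggregation_control": {
        "variable_column_name": "variable_id",
        "groupby_attrs": ["source_id", "experiment_id", "member_id"],
    }
}
updated = add_groupby_attrs(spec, ["table_id", "grid_label", "version_id"])
print(updated["aggregation_control"]["groupby_attrs"])
# ['source_id', 'experiment_id', 'member_id', 'table_id', 'grid_label', 'version_id']
```

To persist the change, the updated dict can be written back to the catalog file with json.dump(updated, f, indent=2).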
If we remove version_id from the aggregation columns, it still works, but the user needs to be mindful to search for a specific version_id before the resulting xarray dataset can be used. Otherwise there are no errors until, most likely, you get to a plot, where you will see overlapping time periods (files from different versions end up combined into one dataset).
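One way to be mindful of this before loading anything is to count the distinct version_id values per dataset group in the catalog DataFrame (cat2.df). A minimal pandas sketch; the helper name groups_with_multiple_versions and the sample rows (including the time_range column) are hypothetical.

```python
import pandas as pd


def groups_with_multiple_versions(df, groupby_attrs, version_col="version_id"):
    """Return the dataset groups whose rows span more than one version."""
    counts = df.groupby(groupby_attrs)[version_col].nunique()
    return counts[counts > 1].reset_index(name="n_versions")


# Hypothetical catalog rows: the same historical run published under two
# versions, so the time ranges overlap once the versions are combined.
df = pd.DataFrame({
    "source_id": ["GFDL-ESM4"] * 4,
    "experiment_id": ["historical"] * 4,
    "member_id": ["r1i1p1f1"] * 4,
    "table_id": ["Omon"] * 4,
    "grid_label": ["gr"] * 4,
    "version_id": ["v20180701", "v20180701", "v20190726", "v20190726"],
    "time_range": ["185001-194912", "195001-201412", "185001-194912", "195001-201412"],
})
attrs = ["source_id", "experiment_id", "member_id", "table_id", "grid_label"]
print(groups_with_multiple_versions(df, attrs))
```

An empty result means every group maps to a single version, so dropping version_id from the key is safe for that search.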
col = "/home/a1r/github/CatalogBuilder/scripts/catalogcmip-3.json"
cat = intake.open_esm_datastore(col)
cat2 = cat.search(variable_id="dissocos",table_id="Omon",grid_label="gr")
dset_dict = cat2.to_dataset_dict(cdf_kwargs={'chunks': {'time':5}, 'decode_times': False})
--> The keys in the returned dictionary of datasets are constructed as follows:
'source_id.experiment_id.member_id.table_id.grid_label'
█████████████████████████████████████████████████████████████| 100.00% [7/7 08:59<00:00]
dset_dict.keys()
dict_keys(['GFDL-ESM4.1pctCO2.r1i1p1f1.Oday.gr', 'GFDL-ESM4.abrupt-4xCO2.r1i1p1f1.Oday.gr', 'GFDL-ESM4.esm-hist.r1i1p1f1.Oday.gr', 'GFDL-ESM4.historical.r2i1p1f1.Oday.gr', 'GFDL-ESM4.historical.r1i1p1f1.Oday.gr', 'GFDL-ESM4.historical.r3i1p1f1.Oday.gr', 'GFDL-ESM4.piControl.r1i1p1f1.Oday.gr'])
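The key template printed above amounts to joining each dataset's groupby column values with ".". A small sketch reproducing that naming from a catalog row, assuming the row is a plain dict; make_key is a hypothetical helper, not an intake-esm function.

```python
def make_key(row, groupby_attrs, sep="."):
    """Join the values of the groupby columns, in order, into a dataset key."""
    return sep.join(str(row[attr]) for attr in groupby_attrs)


row = {
    "source_id": "GFDL-ESM4",
    "experiment_id": "historical",
    "member_id": "r1i1p1f1",
    "table_id": "Oday",
    "grid_label": "gr",
    "version_id": "v20190726",
}
attrs = ["source_id", "experiment_id", "member_id", "table_id", "grid_label"]
print(make_key(row, attrs))  # GFDL-ESM4.historical.r1i1p1f1.Oday.gr
```

Because version_id is not part of this template, rows that differ only in version_id produce the same key and are combined into one dataset.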
Of course, Oday/tos has only one version. What if there are two versions?
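If the newest version is the one you want, one option is to filter the catalog DataFrame down to the latest version per dataset before loading. A pandas sketch, assuming version_id strings of the form vYYYYMMDD (as in the keys above) so that string order matches date order; the helper name keep_latest_version and the sample rows are hypothetical.

```python
import pandas as pd


def keep_latest_version(df, groupby_attrs, version_col="version_id"):
    """Keep only the rows belonging to the newest version of each dataset.

    Assumes 'vYYYYMMDD' version strings, so plain string max() picks the latest date.
    """
    latest = df.groupby(groupby_attrs)[version_col].transform("max")
    return df[df[version_col] == latest]


# Hypothetical catalog rows: one historical run published under two versions.
df = pd.DataFrame({
    "source_id": ["GFDL-ESM4"] * 4,
    "experiment_id": ["historical"] * 4,
    "member_id": ["r1i1p1f1"] * 4,
    "variable_id": ["dissocos"] * 4,
    "version_id": ["v20180701", "v20180701", "v20190726", "v20190726"],
})
attrs = ["source_id", "experiment_id", "member_id", "variable_id"]
latest_only = keep_latest_version(df, attrs)
print(latest_only["version_id"].unique())
```

To pin a version explicitly instead, intake-esm's search filters on any catalog column, e.g. cat.search(variable_id="dissocos", version_id="v20190726").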
cat2.df[(cat2.df['variable_id']=='dissocos') & (cat2.df['experiment_id']=='historical')]['version_id'].nunique()
2
cat2.df.groupby("variable_id")[["experiment_id", "grid_label","version_id","variable_id", "table_id"]].nunique()
             experiment_id  grid_label  version_id  variable_id  table_id
variable_id
dissocos                 6           1           2            1         1
Instead, use this:
cat2.df.groupby("variable_id")[["source_id","experiment_id","frequency","member_id","grid_label","version_id","variable_id", "table_id"]].nunique()
             source_id  experiment_id  frequency  member_id  grid_label  version_id  variable_id  table_id
variable_id
dissocos
>>> cat2.df.groupby("variable_id")[["source_id","experiment_id","frequency","modeling_realm","member_id","table_id","grid_label","chunk_freq","version_id"]].nunique()
             source_id  experiment_id  frequency  modeling_realm  member_id  table_id  grid_label  chunk_freq  version_id
variable_id
aragos               1              1          0               0          1         1           1           0           1
baccos               1              1          0               0          1         1           1           0           1
bfeos                1              1          0               0          1         1           1           0           1
bsios                1              1          0               0          1         1           1           0           1
calcos               1              1          0               0          1         1           1           0           1
...                ...            ...        ...             ...        ...       ...         ...         ...         ...
zmicro               1              1          0               0          1         1           1           0           1
zmicroos             1              1          0               0          1         1           1           0           1
zooc                 1              1          0               0          1         1           1           0           1
zoocos               1              1          0               0          1         1           1           0           1
zos                  1              1          0               0          1         1           1           0           1
[118 rows x 9 columns]
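The groupby/nunique check above can be wrapped in a small helper that lists which columns vary within a variable (these must either be part of the dataset key or be pinned in the search) and which columns are empty. Note that pandas nunique() skips missing values by default, so the 0 counts for frequency, modeling_realm, and chunk_freq indicate those columns are entirely missing for the variables shown. A sketch; summarize_columns and the sample data are hypothetical.

```python
import pandas as pd


def summarize_columns(df, by="variable_id", columns=None):
    """Split catalog columns by how many distinct values they take per `by` group.

    Returns (varying, empty): columns with more than one value in some group,
    and columns with no non-missing values in any group (nunique() skips NaN).
    """
    cols = columns if columns is not None else df.columns
    cols = [c for c in cols if c != by]
    counts = df.groupby(by)[cols].nunique()
    varying = [c for c in cols if (counts[c] > 1).any()]
    empty = [c for c in cols if (counts[c] == 0).all()]
    return varying, empty


# Hypothetical catalog rows for one variable.
df = pd.DataFrame({
    "variable_id": ["dissocos"] * 4,
    "experiment_id": ["historical", "historical", "piControl", "piControl"],
    "version_id": ["v20180701", "v20190726", "v20180701", "v20180701"],
    "table_id": ["Omon"] * 4,
    "frequency": [float("nan")] * 4,
})
varying, empty = summarize_columns(df)
print(varying)  # ['experiment_id', 'version_id']
print(empty)    # ['frequency']
```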
Examples and tests are documented here: https://github.com/aradhakrishnanGFDL/canopy-cats/blob/main/notebooks/cmip_example.ipynb
The notebooks give the paths to the sample catalogs so they can be found locally at GFDL. A copy of the catalogs is also available at https://github.com/aradhakrishnanGFDL/canopy-cats/tree/main/catalogs/cmip-eg
This PR addresses #129.
To test: please also run this on a GFDL PP directory to make sure nothing broke there. Then test on CMIP data using the following example, or adapt it to another case:
./gen_intake_gfdl.py --config config-cmip.yaml
config-cmip.yaml is now in configs/ and includes a test case for CMIP.
Expected output (CSV and JSON):
JSON generated at: /home/a1r/github/CatalogBuilder/scripts/catalogcmip.json
CSV generated at: /home/a1r/github/CatalogBuilder/scripts/catalogcmip.csv
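After generating the catalog, a quick sanity check is that every column the JSON groups by actually exists in the CSV header, since a missing column would break the dataset keys. A sketch assuming the JSON follows the intake-esm ESM collection spec layout (aggregation_control["groupby_attrs"]); the function names are hypothetical.

```python
import csv
import json


def missing_groupby_columns(spec, header):
    """Return groupby columns named in the JSON spec that are absent from the CSV header."""
    attrs = spec.get("aggregation_control", {}).get("groupby_attrs", [])
    return [a for a in attrs if a not in header]


def check_catalog(json_path, csv_path):
    """Load the generated JSON and CSV and compare groupby columns with the CSV header."""
    with open(json_path) as f:
        spec = json.load(f)
    with open(csv_path, newline="") as f:
        header = next(csv.reader(f))
    return missing_groupby_columns(spec, header)
```

For example, check_catalog("/home/a1r/github/CatalogBuilder/scripts/catalogcmip.json", "/home/a1r/github/CatalogBuilder/scripts/catalogcmip.csv") returning an empty list means the two files agree on the grouping columns.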