NCAR / intake-esm-datastore

Intake-esm Datastore
Apache License 2.0

Parsing attributes for CESM2-CMIP6 collection assets #50

Open andersy005 opened 4 years ago

andersy005 commented 4 years ago

I've put together a file parser for the cesm2_cmip6 collection. However, I am not sure I am getting everything right, especially the experiment attribute for the Decadal Prediction (DCPP) output.

For instance, here's what I get for one file:

In [1]: from cesm import cesm2_cmip6_parser

In [2]: f = "/glade/collections/cdg/timeseries-cmip6/DCPP/011-020/b.e11.BDP.f09_g16.1969-11.014/atm/proc/tseries/month_1/b.e11.BDP.f09_g16.1969-11.014.cam.h0.SOLIN.196911-197912.nc"

In [3]: cesm2_cmip6_parser(f)
Out[3]:
{'path': '/glade/collections/cdg/timeseries-cmip6/DCPP/011-020/b.e11.BDP.f09_g16.1969-11.014/atm/proc/tseries/month_1/b.e11.BDP.f09_g16.1969-11.014.cam.h0.SOLIN.196911-197912.nc',
 'case': 'b.e11.BDP.f09_g16.1969-11.014',
 'variable': 'SOLIN',
 'date_range': '196911-197912',
 'stream': 'cam.h0',
 'component': 'atm',
 'experiment': '1969-11'}

Note that I am getting experiment=1969-11, which looks like a start date rather than an experiment name. Is this right, or should we treat DCPP outputs as a special case?

I seem to be getting the right attributes for outputs from other experiments:

In [4]: f2 = '/glade/collections/cdg/timeseries-cmip6/f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001/atm/proc/tseries/month_1/f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001.cam.h0.CLD_CAL_UN.187001-191912.nc'

In [5]: cesm2_cmip6_parser(f2)
Out[5]:
{'path': '/glade/collections/cdg/timeseries-cmip6/f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001/atm/proc/tseries/month_1/f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001.cam.h0.CLD_CAL_UN.187001-191912.nc',
 'case': 'f.e21.F1850_BGC.f09_f09_mg17.CFMIP-amip-piForcing.001',
 'variable': 'CLD_CAL_UN',
 'date_range': '187001-191912',
 'stream': 'cam.h0',
 'component': 'atm',
 'experiment': 'CFMIP-amip-piForcing'}

Originally posted by @andersy005 in https://github.com/NCAR/intake-esm-datastore/pull/47

sherimickelson commented 4 years ago

For CMIP6, only DCPP experiments contain sub-experiments (see the "sub_experiment_id" key in https://github.com/WCRP-CMIP/CMIP6_CVs/blob/master/CMIP6_experiment_id.json). As long as conventions are observed, it is probably safe to assume that if DCPP appears in the path, you'll need to treat the file as a special case.
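
A minimal sketch of that special case, under several assumptions: `parse_experiment` is a hypothetical helper (not the actual `cesm2_cmip6_parser`), the case directory is assumed to sit five levels above the file as in the two example paths, and the experiment token is assumed to be the second-to-last dot-separated field of the case name. For DCPP paths, the token is stored under a hypothetical `sub_experiment` key and `experiment` is left as `None`, since this thread does not settle what the experiment value should be.

```python
from pathlib import PurePosixPath


def parse_experiment(path):
    """Hypothetical helper: derive experiment info from a timeseries path.

    Assumes the layout <case>/<component>/proc/tseries/<freq>/<file>,
    as in the example paths in this issue.
    """
    p = PurePosixPath(path)
    # parents[0] is the frequency directory, so parents[4] is the case directory.
    case = p.parents[4].name
    # Assumption: the second-to-last dot-separated field holds the experiment
    # (or, for DCPP, what looks like the start date).
    token = case.split(".")[-2]

    if "DCPP" in p.parts:
        # Special case: treat the token as a sub-experiment label.
        # The real experiment id is not determined here.
        return {"case": case, "experiment": None, "sub_experiment": token}
    return {"case": case, "experiment": token, "sub_experiment": None}
```

With the two paths from the original post, this would put `1969-11` under `sub_experiment` for the DCPP file and keep `CFMIP-amip-piForcing` as `experiment` for the other. Checking `p.parts` rather than searching the raw string avoids matching `DCPP` inside an unrelated file or case name.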