NCAR / intake-esm-datastore

Intake-esm Datastore
Apache License 2.0

Add CESM2-CMIP6 (not CMORized) collection #37

Closed mnlevy1981 closed 5 years ago

mnlevy1981 commented 5 years ago

With help from @andersy005 I turned @matt-long's netcdf-based CESM2-CMIP6 metadata store into a gzipped CSV file. It's on glade at /glade/work/mlevy/CMIP6-CESM2_only-NOT_CMORIZED.csv.gz.

Python code to generate this:

import xarray as xr

ds = xr.open_dataset('/glade/u/home/mclong/.intake_esm/collections/CESM2-CMIP6.nc')
df = ds.to_dataframe()
df = df.drop(columns=['resource', 'resource_type', 'direct_access', 'file_basename', 'ctrl_branch_year', 'sequence_order', 'grid'])
df = df.rename(columns={'file_fullpath': 'path'})
df.to_csv('/glade/work/mlevy/CMIP6-CESM2_only-NOT_CMORIZED.csv.gz', compression='gzip', index=False)
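Since the same conversion will be repeated for the CESM1-CMIP5 store, the steps above can be wrapped in a small helper. This is a hypothetical sketch (the function name and signature are not part of intake-esm); it takes an already-opened dataset so the column lists can vary per collection:

```python
import xarray as xr


def collection_to_csv(ds, csv_path, drop=(), rename=None):
    """Flatten a netCDF-based metadata store (opened as an xarray.Dataset)
    into a gzipped CSV catalog. Hypothetical helper generalizing the
    snippet above."""
    df = ds.to_dataframe().reset_index(drop=True)
    # Drop bookkeeping columns the catalog does not need
    df = df.drop(columns=[c for c in drop if c in df.columns])
    if rename:
        df = df.rename(columns=rename)
    df.to_csv(csv_path, compression='gzip', index=False)
    return df
```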
mnlevy1981 commented 5 years ago

Corresponding json file is /glade/work/mlevy/CMIP6-CESM2_only-NOT_CMORIZED.json.

@matt-long: purpose of this is to address https://github.com/marbl-ecosys/cesm2-marbl/issues/2, specifically looking at forcing_iron_flux.ipynb. To that end, can you point me to the CESM1-CMIP5.nc file you had used? I can then create a .csv.gz version of it using the same process.

mnlevy1981 commented 5 years ago

@andersy005 -- I think we did something wrong when we set everything up... if you look in this notebook, the commands

import intake

cesm2 = intake.open_esm_datastore('/glade/work/mlevy/CMIP6-CESM2_only-NOT_CMORIZED.json')
keep_vars = ['TAREA', 'TLONG', 'TLAT', 'IRON_FLUX', 'time', 'time_bound', 'member_id']
dq = cesm2.search(experiment=['historical'], variable='IRON_FLUX').to_dataset_dict(cdf_kwargs={'chunks': {'time': 48}, 'decode_times': False})
_, ds2 = dq.popitem()
ds2 = ds2.drop([v for v in ds2.variables if v not in keep_vars])

lead to an improper time dimension... the time dimension has length 3960 (3960 months = 330 years), while the data spans half that (Jan 1850 through Dec 2014 = 165 years = 1980 months). One detail I didn't include in my notebook: the time dimension does the following:

  1. The first 600 values of ds2['time'] covers the 50 years from Jan 1850 - Dec 1899
  2. The next 1980 values of ds2['time'] cover the full 165 years of interest (Jan 1850 - Dec 2014)
  3. The final 1380 values of ds2['time'] cover the final 115 years again (Jan 1900 - Dec 2014)

In other words:

  1. ds2['time'].values[0:600] == ds2['time'].values[600:1200] (Jan 31, 1850 through Dec 31, 1899)
  2. ds2['time'].values[1200:2580] == ds2['time'].values[2580:3960] (Jan 1, 1900 through Dec 31, 2014)

Note that ds2['time'].values[600:2580] is the desired time range (though I doubt all ensemble members have values defined here; some are probably using ds2['time'].values[0:600] and / or ds2['time'].values[2580:3960]). Regardless, this causes esmlab to throw

ValueError: index must be monotonic for resampling
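The duplication pattern can be reproduced with a stand-in index (a sketch using plain integers in place of cftime dates), which also shows why resampling complains:

```python
import numpy as np

# Stand-in for the broken axis: the 600 values for 1850-1899 appear twice
# at the front, and the 1380 values for 1900-2014 appear twice at the back,
# giving 3960 entries instead of 1980.
first = np.arange(600)        # Jan 1850 - Dec 1899 (placeholder)
rest = np.arange(600, 1980)   # Jan 1900 - Dec 2014 (placeholder)
broken = np.concatenate([first, first, rest, rest])

assert len(broken) == 3960
# The duplicated blocks line up exactly as described above:
assert (broken[0:600] == broken[600:1200]).all()
assert (broken[1200:2580] == broken[2580:3960]).all()
# ...and the axis is not monotonic, which is what resampling checks for:
assert not (np.diff(broken) > 0).all()
```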

I guess I'm not sure how to proceed. Is this something related to the "aggregation_control" key in /glade/work/mlevy/CMIP6-CESM2_only-NOT_CMORIZED.json and the data isn't being grouped correctly? Is it an intake-esm issue? Something else?

andersy005 commented 5 years ago

@matt-long and I noticed a similar issue during the CMIP6 hackathon (for some queries, the time axis length was getting doubled). As far as I can remember, we noticed that this was happening for cases in which the data for some member_ids was in a single file, while for the other member_ids the data was split into multiple files.

My understanding is that this is an issue at the intake-esm level. As part of a solution, we probably need to do an alignment prior to the concatenation step. Matt and I were hoping to fix this last week, but unfortunately we didn't have time to do so.

I will carve out some time to look into it this week, and hopefully we can fix this issue next week.
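As a toy sketch of what "align prior to concatenation" could look like (illustrative names only, not the intake-esm internals): inner-aligning the members' time axes first guarantees a single shared index, so concatenating along a new member_id dimension cannot inflate time.

```python
import numpy as np
import xarray as xr

# Two members whose time axes only partially overlap
a = xr.Dataset({'v': ('time', np.ones(6))}, coords={'time': np.arange(6)})
b = xr.Dataset({'v': ('time', np.ones(6))}, coords={'time': np.arange(2, 8)})

# Align to the common index, then concatenate along a NEW dimension
a2, b2 = xr.align(a, b, join='inner')
merged = xr.concat([a2, b2], dim='member_id')

assert merged.sizes['time'] == 4       # the overlap, times 2..5
assert merged.sizes['member_id'] == 2  # one entry per member
```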

mnlevy1981 commented 5 years ago

Thanks! I looked at the intake-esm issues and https://github.com/NCAR/intake-esm/issues/160 definitely sounds similar to what I was seeing, but the reported error message was different from mine (xarray was happy to open the dataset for me) so I didn't know if it was related. Looking closer, I'm now seeing that their error came from ds.sel(), not from actually opening the dataset... so I bet I could recreate that error in my notebook :)

andersy005 commented 5 years ago

The following query should return a dataset with time=1980:

col = intake.open_esm_datastore("/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cmip6.json")

cat = col.search(experiment_id='historical',
                 activity_id='CMIP',
                 table_id='Omon',
                 variable_id='spco2',
                 grid_label='gn',
                 source_id='CESM2')

dsets = cat.to_dataset_dict(cdf_kwargs={"chunks": {"time": -1}})

<xarray.Dataset>
Dimensions:    (d2: 2, member_id: 11, nlat: 384, nlon: 320, time: 1980, vertices: 4)
Coordinates:
  * time       (time) object 1850-01-15 13:00:00 ... 2014-12-15 12:00:00
  * nlon       (nlon) int32 1 2 3 4 5 6 7 8 ... 313 314 315 316 317 318 319 320
  * nlat       (nlat) int32 1 2 3 4 5 6 7 8 ... 377 378 379 380 381 382 383 384
  * member_id  (member_id) <U9 'r10i1p1f1' 'r11i1p1f1' ... 'r8i1p1f1' 'r9i1p1f1'
Dimensions without coordinates: d2, vertices
Data variables:
    lat_bnds   (nlat, nlon, vertices) float32 -79.48714 -79.48714 ... 72.41355
    lat        (nlat, nlon) float64 -79.22 -79.22 -79.22 ... 72.2 72.19 72.19
    lon        (nlat, nlon) float64 320.6 321.7 322.8 ... 318.9 319.4 319.8
    lon_bnds   (nlat, nlon, vertices) float32 320.0 321.125 ... 320.0 319.586
    time_bnds  (time, d2) object dask.array<chunksize=(600, 2), meta=np.ndarray>
    spco2      (member_id, time, nlat, nlon) float32 dask.array<chunksize=(1, 600, 384, 320), meta=np.ndarray>

However, it actually returns a dataset with time=3960:

{'CMIP.NCAR.CESM2.historical.Omon.gn': <xarray.Dataset>
 Dimensions:    (d2: 2, member_id: 11, nlat: 384, nlon: 320, time: 3960, vertices: 4)
 Coordinates:
   * nlon       (nlon) int32 1 2 3 4 5 6 7 8 ... 313 314 315 316 317 318 319 320
   * member_id  (member_id) object 'r10i1p1f1' 'r11i1p1f1' ... 'r9i1p1f1'
   * nlat       (nlat) int32 1 2 3 4 5 6 7 8 ... 377 378 379 380 381 382 383 384
   * time       (time) object 1850-01-15 13:00:00 ... 2014-12-15 12:00:00
 Dimensions without coordinates: d2, vertices
 Data variables:
     lat_bnds   (nlat, nlon, vertices) float32 dask.array<chunksize=(384, 320, 4), meta=np.ndarray>
     lon_bnds   (nlat, nlon, vertices) float32 dask.array<chunksize=(384, 320, 4), meta=np.ndarray>
     lat        (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
     lon        (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
     time_bnds  (time, d2) object dask.array<chunksize=(600, 2), meta=np.ndarray>
     spco2      (member_id, time, nlat, nlon) float32 dask.array<chunksize=(1, 600, 384, 320), meta=np.ndarray>
}

When I specify the member_ids with data in a single file:

# Find member_ids with data in a single file
cat.df.set_index(['member_id', 'time_range'])['path']

member_id  time_range   
r2i1p1f1   185001-201412    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r5i1p1f1   185001-201412    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r1i1p1f1   185001-201412    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r4i1p1f1   185001-201412    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r3i1p1f1   185001-201412    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r9i1p1f1   200001-201412    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           190001-194912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           185001-189912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           195001-199912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r8i1p1f1   185001-189912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           190001-194912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           200001-201412    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           195001-199912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r10i1p1f1  195001-199912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           200001-201412    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           190001-194912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           185001-189912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r7i1p1f1   185001-189912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           190001-194912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           195001-199912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           200001-201412    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r11i1p1f1  195001-199912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           190001-194912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           185001-189912    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
           200001-201412    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r6i1p1f1   185001-201412    /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
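Reading the single-file members off the listing by eye works, but they can also be derived from the catalog's dataframe. A sketch, with a toy frame standing in for cat.df:

```python
import pandas as pd

# Toy stand-in for cat.df: one single-file member, one split member
df = pd.DataFrame({
    'member_id': ['r1i1p1f1', 'r8i1p1f1', 'r8i1p1f1'],
    'path': ['a.nc', 'b.nc', 'c.nc'],
})

# Members whose historical output sits in exactly one file
counts = df.groupby('member_id')['path'].count()
single_file_members = sorted(counts[counts == 1].index)
assert single_file_members == ['r1i1p1f1']
```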
cat = col.search(experiment_id='historical',
                 activity_id='CMIP',
                 table_id='Omon',
                 variable_id='spco2',
                 grid_label='gn',
                 source_id='CESM2',
                member_id = ["r2i1p1f1", "r5i1p1f1", "r1i1p1f1", "r4i1p1f1", "r4i1p1f1", "r6i1p1f1"])

dsets = cat.to_dataset_dict(cdf_kwargs={"chunks": {"time": -1}})

The dataset's time axis appears to be correct

{'CMIP.NCAR.CESM2.historical.Omon.gn': <xarray.Dataset>
 Dimensions:    (d2: 2, member_id: 5, nlat: 384, nlon: 320, time: 1980, vertices: 4)
 Coordinates:
   * nlon       (nlon) int32 1 2 3 4 5 6 7 8 ... 313 314 315 316 317 318 319 320
   * member_id  (member_id) <U8 'r2i1p1f1' 'r5i1p1f1' ... 'r4i1p1f1' 'r6i1p1f1'
   * nlat       (nlat) int32 1 2 3 4 5 6 7 8 ... 377 378 379 380 381 382 383 384
   * time       (time) object 1850-01-15 13:00:00 ... 2014-12-15 12:00:00
 Dimensions without coordinates: d2, vertices
 Data variables:
     lat_bnds   (nlat, nlon, vertices) float32 dask.array<chunksize=(384, 320, 4), meta=np.ndarray>
     lon_bnds   (nlat, nlon, vertices) float32 dask.array<chunksize=(384, 320, 4), meta=np.ndarray>
     lat        (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
     lon        (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
     time_bnds  (time, d2) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
     spco2      (member_id, time, nlat, nlon) float32 dask.array<chunksize=(1, 1980, 384, 320), meta=np.ndarray>
 }
andersy005 commented 5 years ago
cat = col.search(experiment_id='historical',
                 activity_id='CMIP',
                 table_id='Omon',
                 variable_id='spco2',
                 grid_label='gn',
                 source_id='CESM2',
                member_id = ["r9i1p1f1", "r11i1p1f1", "r7i1p1f1", "r10i1p1f1", "r8i1p1f1"])

_, ds = cat.to_dataset_dict(cdf_kwargs={"chunks": {"time": -1}}).popitem()
ds.time

<xarray.DataArray 'time' (time: 1980)>
array([cftime.DatetimeNoLeap(1850, 1, 15, 13, 0, 0, 0, 2, 15),
       cftime.DatetimeNoLeap(1850, 2, 14, 0, 0, 0, 0, 4, 45),
       cftime.DatetimeNoLeap(1850, 3, 15, 12, 0, 0, 0, 5, 74), ...,
       cftime.DatetimeNoLeap(2014, 10, 15, 12, 0, 0, 0, 5, 288),
       cftime.DatetimeNoLeap(2014, 11, 15, 0, 0, 0, 0, 1, 319),
       cftime.DatetimeNoLeap(2014, 12, 15, 12, 0, 0, 0, 3, 349)], dtype=object)
Coordinates:
  * time     (time) object 1850-01-15 13:00:00 ... 2014-12-15 12:00:00
Attributes:
    axis:           T
    bounds:         time_bnds
    standard_name:  time
    title:          time
    type:           double
cat = col.search(experiment_id='historical',
                 activity_id='CMIP',
                 table_id='Omon',
                 variable_id='spco2',
                 grid_label='gn',
                 source_id='CESM2',
                member_id = ["r2i1p1f1"])

_, ds2 = cat.to_dataset_dict(cdf_kwargs={"chunks": {"time": -1}}).popitem()
ds2.time

<xarray.DataArray 'time' (time: 1980)>
array([cftime.DatetimeNoLeap(1850, 1, 15, 13, 0, 0, 0, 2, 15),
       cftime.DatetimeNoLeap(1850, 2, 14, 0, 0, 0, 0, 4, 45),
       cftime.DatetimeNoLeap(1850, 3, 15, 12, 0, 0, 0, 5, 74), ...,
       cftime.DatetimeNoLeap(2014, 10, 15, 12, 0, 0, 0, 5, 288),
       cftime.DatetimeNoLeap(2014, 11, 15, 0, 0, 0, 0, 1, 319),
       cftime.DatetimeNoLeap(2014, 12, 15, 12, 0, 0, 0, 3, 349)], dtype=object)
Coordinates:
  * time     (time) object 1850-01-15 13:00:00 ... 2014-12-15 12:00:00
Attributes:
    axis:           T
    bounds:         time_bnds
    standard_name:  time
    title:          time
    type:           double
xr.concat([ds, ds2], dim="time")

yields

<xarray.DataArray 'time' (time: 3960)>
array([cftime.DatetimeNoLeap(1850, 1, 15, 13, 0, 0, 0, 2, 15),
       cftime.DatetimeNoLeap(1850, 2, 14, 0, 0, 0, 0, 4, 45),
       cftime.DatetimeNoLeap(1850, 3, 15, 12, 0, 0, 0, 5, 74), ...,
       cftime.DatetimeNoLeap(2014, 10, 15, 12, 0, 0, 0, 5, 288),
       cftime.DatetimeNoLeap(2014, 11, 15, 0, 0, 0, 0, 1, 319),
       cftime.DatetimeNoLeap(2014, 12, 15, 12, 0, 0, 0, 3, 349)], dtype=object)
Coordinates:
  * time     (time) object 1850-01-15 13:00:00 ... 2014-12-15 12:00:00
Attributes:
    axis:           T
    bounds:         time_bnds
    standard_name:  time
    title:          time
    type:           double

For reasons unbeknownst to me so far, xarray says that the two time axes are identical:

import xarray as xr

xr.testing.assert_identical(ds.time, ds2.time)
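That behavior is consistent with xr.concat's semantics: with identical indexes, concatenating along time simply stacks the two copies end to end, whereas concatenating along a new member dimension keeps time at its original length. A minimal reproduction with toy data:

```python
import numpy as np
import xarray as xr

time = np.arange(4)
a = xr.Dataset({'v': ('time', np.ones(4))}, coords={'time': time})
b = xr.Dataset({'v': ('time', np.zeros(4))}, coords={'time': time})

doubled = xr.concat([a, b], dim='time')       # stacks the identical axes
stacked = xr.concat([a, b], dim='member_id')  # new dimension instead

assert doubled.sizes['time'] == 8
assert stacked.sizes['time'] == 4
assert stacked.sizes['member_id'] == 2
```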
mnlevy1981 commented 5 years ago

@andersy005 Thanks for these examples! I updated my notebook to only use the 6 members that have all their data in a single netcdf file and the notebook runs further than before. I think I'm now hitting the issue of xarray not playing nicely with cftime==1.0.4, but I'll keep this issue open until intake-esm is updated and I can have all 11 members in a dataset with a time dimension of length 1980.

mnlevy1981 commented 5 years ago

Current issue with the notebook: opening the dataset with decode_times=True leads to the following from esmlab.resample():

Please open dataset with `decode_times=False`

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-9-98967eb7d640> in <module>
----> 1 ds2_ann = esmlab.resample(ds2, freq='ann')
      2 ds2_ann

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py in resample(dset, freq, weights, time_coord_name)
    760 
    761     else:
--> 762         ds = dset.esmlab.set_time(time_coord_name=time_coord_name).compute_ann_mean(weights=weights)
    763 
    764     new_history = f'\n{datetime.now()} esmlab.resample(<DATASET>, freq="{freq}")'

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py in set_time(self, time_coord_name, year_offset)
    345                 except Exception as exc:
    346                     print('Please open dataset with `decode_times=False`')
--> 347                     raise exc
    348         self.setup()
    349         return self

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py in set_time(self, time_coord_name, year_offset)
    340                         self._ds[self.tb_name],
    341                         units=self.time_attrs['units'],
--> 342                         calendar=self.time_attrs['calendar'],
    343                     )
    344                     self.time_bound.data = tb_data

cftime/_cftime.pyx in cftime._cftime.date2num()

cftime/_cftime.pyx in cftime._cftime._dateparse()

cftime/_cftime.pyx in cftime._cftime._datesplit()

AttributeError: 'NoneType' object has no attribute 'split'

Following that advice and setting decode_times=False instead leads to:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-98967eb7d640> in <module>
----> 1 ds2_ann = esmlab.resample(ds2, freq='ann')
      2 ds2_ann

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py in resample(dset, freq, weights, time_coord_name)
    760 
    761     else:
--> 762         ds = dset.esmlab.set_time(time_coord_name=time_coord_name).compute_ann_mean(weights=weights)
    763 
    764     new_history = f'\n{datetime.now()} esmlab.resample(<DATASET>, freq="{freq}")'

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/contextlib.py in inner(*args, **kwds)
     72         def inner(*args, **kwds):
     73             with self._recreate_cm():
---> 74                 return func(*args, **kwds)
     75         return inner
     76 

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py in compute_ann_mean(self, weights)
    503             return da_weighted_mean.where(mask)
    504 
--> 505         ds_resample_mean = dset.apply(weighted_mean_arr, wgts=wgts)
    506 
    507         if self.time_bound is not None:

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/dataset.py in apply(self, func, keep_attrs, args, **kwargs)
   4138         variables = {
   4139             k: maybe_wrap_array(v, func(v, *args, **kwargs))
-> 4140             for k, v in self.data_vars.items()
   4141         }
   4142         if keep_attrs is None:

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/dataset.py in <dictcomp>(.0)
   4138         variables = {
   4139             k: maybe_wrap_array(v, func(v, *args, **kwargs))
-> 4140             for k, v in self.data_vars.items()
   4141         }
   4142         if keep_attrs is None:

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py in weighted_mean_arr(darr, wgts)
    491             ones = xr.where(cond, 0.0, 1.0)
    492             mask = (
--> 493                 darr.resample({self.time_coord_name: 'A'}).mean(dim=self.time_coord_name).notnull()
    494             )
    495             da_sum = (

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/common.py in resample(self, indexer, skipna, closed, label, base, keep_attrs, loffset, restore_coord_dims, **indexer_kwargs)
   1036             grouper=grouper,
   1037             resample_dim=RESAMPLE_DIM,
-> 1038             restore_coord_dims=restore_coord_dims,
   1039         )
   1040 

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/resample.py in __init__(self, dim, resample_dim, *args, **kwargs)
    172         self._resample_dim = resample_dim
    173 
--> 174         super().__init__(*args, **kwargs)
    175 
    176     def apply(self, func, shortcut=False, args=(), **kwargs):

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/groupby.py in __init__(self, obj, group, squeeze, grouper, bins, restore_coord_dims, cut_kwargs)
    334                 # TODO: sort instead of raising an error
    335                 raise ValueError("index must be monotonic for resampling")
--> 336             full_index, first_items = self._get_index_and_items(index, grouper)
    337             sbins = first_items.values.astype(np.int64)
    338             group_indices = [slice(i, j) for i, j in zip(sbins[:-1], sbins[1:])] + [

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/groupby.py in _get_index_and_items(self, index, grouper)
    432             first_items = grouper.first_items(index)
    433         else:
--> 434             first_items = s.groupby(grouper).first()
    435             _apply_loffset(grouper, first_items)
    436         full_index = first_items.index

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/pandas/core/generic.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed, **kwargs)
   7892             squeeze=squeeze,
   7893             observed=observed,
-> 7894             **kwargs
   7895         )
   7896 

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in groupby(obj, by, **kwds)
   2520         raise TypeError("invalid type: {}".format(obj))
   2521 
-> 2522     return klass(obj, by, **kwds)

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, **kwargs)
    389                 sort=sort,
    390                 observed=observed,
--> 391                 mutated=self.mutated,
    392             )
    393 

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/pandas/core/groupby/grouper.py in _get_grouper(obj, key, axis, level, sort, observed, mutated, validate)
    511     # a passed-in Grouper, directly convert
    512     if isinstance(key, Grouper):
--> 513         binner, grouper, obj = key._get_grouper(obj, validate=False)
    514         if key.key is None:
    515             return grouper, [], obj

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/pandas/core/resample.py in _get_grouper(self, obj, validate)
   1446     def _get_grouper(self, obj, validate=True):
   1447         # create the resampler and return our binner
-> 1448         r = self._get_resampler(obj)
   1449         r._set_binner()
   1450         return r.binner, r.grouper, r.obj

/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/pandas/core/resample.py in _get_resampler(self, obj, kind)
   1441             "Only valid with DatetimeIndex, "
   1442             "TimedeltaIndex or PeriodIndex, "
-> 1443             "but got an instance of %r" % type(ax).__name__
   1444         )
   1445 

TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'CFTimeIndex'

Note that Keith had a similar problem in https://github.com/NCAR/esmlab/issues/151, but running with decode_times=True and cftime==1.0.3.4 fixed it for him; in my case it seems like I need to run with decode_times=False for some reason... I'm not entirely clear where the

Please open dataset with `decode_times=False`

comes from.
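One guess at where that message comes from (a sketch, not verified against esmlab's source): when xarray decodes times, the units/calendar metadata move from the variable's .attrs into its .encoding, so any downstream code that re-reads ds['time'].attrs for 'units' or 'calendar' gets None, and cftime's date parser then fails with exactly the 'NoneType' object has no attribute 'split' error shown above.

```python
import numpy as np
import xarray as xr

# Raw (undecoded) time variable carrying CF metadata in .attrs
ds = xr.Dataset({'time': ('time', np.array([0.0, 31.0]))})
ds['time'].attrs['units'] = 'days since 2000-01-01'

decoded = xr.decode_cf(ds)

# After decoding, the CF metadata live in .encoding, not .attrs
assert 'units' not in decoded['time'].attrs
assert decoded['time'].encoding['units'] == 'days since 2000-01-01'
# ...so attrs-based lookups return None, which is what trips cftime
assert decoded['time'].attrs.get('units') is None
```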

mnlevy1981 commented 5 years ago

It looks like https://github.com/NCAR/intake-esm/pull/171 fixes the issue I was having with the time dimension, thanks @andersy005 !

mnlevy1981 commented 5 years ago

@andersy005 -- I created a directory /glade/work/mlevy/intake-esm-collection that contains two subdirectories of interest:

  1. csv.gz contains csv-formatted (and gz-compressed) data-frame output
  2. json contains a JSON file pointing to the corresponding csv.gz data-frame

There are currently two collections: CESM1-CMIP5_only-NOT_CMORIZED and CESM2-CMIP6_only-NOT_CMORIZED.

What's the process for updating this repository to include them? I was thinking I would fork & clone this repo, create a branch, add my files, and then update the json to point to /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/ instead of /glade/work/mlevy/intake-esm-collection/csv.gz/ but if there is a different procedure already in place I'm happy to follow the rules.

andersy005 commented 5 years ago

What's the process for updating this repository to include them? I was thinking I would fork & clone this repo, create a branch, add my files, and then update the json to point to /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/ instead of /glade/work/mlevy/intake-esm-collection/csv.gz/

This repo is mirrored in a centralized location on glade. The catalogs and their corresponding JSON files reside in /glade/collections/cmip/catalog/intake-esm-datastore/catalogs. Once #40 is merged, you should be able to access both the JSON and the corresponding CSV from this location.