Closed mnlevy1981 closed 5 years ago
Corresponding json file is /glade/work/mlevy/CMIP6-CESM2_only-NOT_CMORIZED.json.
@matt-long: the purpose of this is to address https://github.com/marbl-ecosys/cesm2-marbl/issues/2, specifically looking at forcing_iron_flux.ipynb. To that end, can you point me to the CESM1-CMIP5.nc file you had used? I can then create a .csv.gz version of it using the same process.
@andersy005 -- I think we did something wrong when we set everything up... if you look in this notebook, the commands
cesm2 = intake.open_esm_datastore('/glade/work/mlevy/CMIP6-CESM2_only-NOT_CMORIZED.json')
keep_vars = ['TAREA', 'TLONG', 'TLAT', 'IRON_FLUX', 'time', 'time_bound', 'member_id']
dq = cesm2.search(experiment=['historical'], variable='IRON_FLUX').to_dataset_dict(cdf_kwargs={'chunks':{'time': 48}, 'decode_times' : False})
_, ds2 = dq.popitem()
ds2 = ds2.drop([v for v in ds2.variables if v not in keep_vars])
Leads to an improper time dimension... the time dimension is length 3960 (3960 months = 330 years), while the data spans half that (Jan 1850 through Dec 2014 = 165 years = 1980 months). I didn't include this detail in my notebook, but the time dimension does the following: the first 600 entries of ds2['time'] cover the 50 years from Jan 1850 - Dec 1899, the next 1980 entries cover the full 165 years of interest (Jan 1850 - Dec 2014), and the final 1380 entries cover the last 115 years again (Jan 1900 - Dec 2014). In other words:
ds2['time'].values[0:600] == ds2['time'].values[600:1200]
(Jan 31, 1850 through Dec 31, 1899), and
ds2['time'].values[1200:2580] == ds2['time'].values[2580:3960]
(Jan 1, 1900 through Dec 31, 2014). Note that ds2['time'].values[600:2580] is the desired time range (though I doubt all ensemble members have values defined here; some are probably using ds2['time'].values[0:600] and / or ds2['time'].values[2580:3960]). Regardless, this causes esmlab to throw
ValueError: index must be monotonic for resampling
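The layout described above can be reproduced with a small synthetic stand-in for the broken axis (the array below is an assumption mimicking the reported 600/1980/1380 split, not the actual catalog data):

```python
import numpy as np

# Synthetic stand-in for the broken 3960-entry time axis: 600 months
# (1850-1899), then the full 1980 months (1850-2014), then the final
# 1380 months (1900-2014) repeated.
full = np.arange(1980)  # month index 0..1979 ~ Jan 1850 .. Dec 2014
broken = np.concatenate([full[:600], full, full[600:]])

assert len(broken) == 3960
# the duplicated segments line up exactly as described:
assert (broken[0:600] == broken[600:1200]).all()
assert (broken[1200:2580] == broken[2580:3960]).all()
# and the axis is not monotonic, which is what trips the resample call
assert not (np.diff(broken) >= 0).all()
```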
I guess I'm not sure how to proceed. Is this something related to the "aggregation_control" key in /glade/work/mlevy/CMIP6-CESM2_only-NOT_CMORIZED.json and the data isn't being grouped correctly? Is it an intake-esm issue? Something else?
@matt-long and I noticed a similar issue during the CMIP6 hackathon (For some queries, the time axis length was getting doubled). As far I can remember, we noticed that this was happening for cases in which the data for some member_ids was in a single file, and for the other member_ids, the data was split into multiple files.
My understanding is that this is an issue at the intake-esm level. As part of a solution to this issue, we probably need to do an alignment prior to the concatenation step. Matt and I were hoping to fix this issue last week, but unfortunately we didn't have time to do so.
I will carve out some time to look into it this week, and hopefully we can fix this issue next week.
Thanks! I looked at the intake-esm issues and https://github.com/NCAR/intake-esm/issues/160 definitely sounds similar to what I was seeing, but the reported error message was different from mine (xarray was happy to open the dataset for me) so I didn't know if it was related. Looking closer, I'm now seeing that their error came from ds.sel(), not from actually opening the dataset... so I bet I could recreate that error in my notebook :)
The following query should return a dataset with time=1980:
col = intake.open_esm_datastore("/glade/collections/cmip/catalog/intake-esm-datastore/catalogs/glade-cmip6.json")
cat = col.search(experiment_id='historical',
activity_id='CMIP',
table_id='Omon',
variable_id='spco2',
grid_label='gn',
source_id='CESM2')
dsets = cat.to_dataset_dict(cdf_kwargs={"chunks": {"time": -1}})
<xarray.Dataset>
Dimensions: (d2: 2, member_id: 11, nlat: 384, nlon: 320, time: 1980, vertices: 4)
Coordinates:
* time (time) object 1850-01-15 13:00:00 ... 2014-12-15 12:00:00
* nlon (nlon) int32 1 2 3 4 5 6 7 8 ... 313 314 315 316 317 318 319 320
* nlat (nlat) int32 1 2 3 4 5 6 7 8 ... 377 378 379 380 381 382 383 384
* member_id (member_id) <U9 'r10i1p1f1' 'r11i1p1f1' ... 'r8i1p1f1' 'r9i1p1f1'
Dimensions without coordinates: d2, vertices
Data variables:
lat_bnds (nlat, nlon, vertices) float32 -79.48714 -79.48714 ... 72.41355
lat (nlat, nlon) float64 -79.22 -79.22 -79.22 ... 72.2 72.19 72.19
lon (nlat, nlon) float64 320.6 321.7 322.8 ... 318.9 319.4 319.8
lon_bnds (nlat, nlon, vertices) float32 320.0 321.125 ... 320.0 319.586
time_bnds (time, d2) object dask.array<chunksize=(600, 2), meta=np.ndarray>
spco2 (member_id, time, nlat, nlon) float32 dask.array<chunksize=(1, 600, 384, 320), meta=np.ndarray>
However, it returns a dataset with time=3960:
{'CMIP.NCAR.CESM2.historical.Omon.gn': <xarray.Dataset>
Dimensions: (d2: 2, member_id: 11, nlat: 384, nlon: 320, time: 3960, vertices: 4)
Coordinates:
* nlon (nlon) int32 1 2 3 4 5 6 7 8 ... 313 314 315 316 317 318 319 320
* member_id (member_id) object 'r10i1p1f1' 'r11i1p1f1' ... 'r9i1p1f1'
* nlat (nlat) int32 1 2 3 4 5 6 7 8 ... 377 378 379 380 381 382 383 384
* time (time) object 1850-01-15 13:00:00 ... 2014-12-15 12:00:00
Dimensions without coordinates: d2, vertices
Data variables:
lat_bnds (nlat, nlon, vertices) float32 dask.array<chunksize=(384, 320, 4), meta=np.ndarray>
lon_bnds (nlat, nlon, vertices) float32 dask.array<chunksize=(384, 320, 4), meta=np.ndarray>
lat (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
lon (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
time_bnds (time, d2) object dask.array<chunksize=(600, 2), meta=np.ndarray>
spco2 (member_id, time, nlat, nlon) float32 dask.array<chunksize=(1, 600, 384, 320), meta=np.ndarray>
}
When I specify the member_ids with data in a single file:
# Find member_ids with data in a single file
cat.df.set_index(['member_id', 'time_range'])['path']
member_id time_range
r2i1p1f1 185001-201412 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r5i1p1f1 185001-201412 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r1i1p1f1 185001-201412 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r4i1p1f1 185001-201412 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r3i1p1f1 185001-201412 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r9i1p1f1 200001-201412 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
190001-194912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
185001-189912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
195001-199912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r8i1p1f1 185001-189912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
190001-194912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
200001-201412 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
195001-199912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r10i1p1f1 195001-199912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
200001-201412 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
190001-194912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
185001-189912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r7i1p1f1 185001-189912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
190001-194912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
195001-199912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
200001-201412 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r11i1p1f1 195001-199912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
190001-194912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
185001-189912 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
200001-201412 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
r6i1p1f1 185001-201412 /glade/collections/cmip/CMIP6/CMIP/NCAR/CESM2/...
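Rather than reading the member_ids off the listing by hand, the single-file members can be pulled out of the catalog dataframe programmatically; this is a sketch against a tiny made-up dataframe with the same columns as cat.df:

```python
import pandas as pd

# toy stand-in for cat.df (same columns as the listing above)
df = pd.DataFrame({
    'member_id': ['r1i1p1f1', 'r2i1p1f1', 'r9i1p1f1', 'r9i1p1f1'],
    'time_range': ['185001-201412', '185001-201412',
                   '185001-189912', '190001-194912'],
    'path': ['a.nc', 'b.nc', 'c1.nc', 'c2.nc'],
})

# member_ids whose data lives in exactly one file
n_files = df.groupby('member_id')['path'].nunique()
single_file_members = sorted(n_files[n_files == 1].index)
assert single_file_members == ['r1i1p1f1', 'r2i1p1f1']
```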
cat = col.search(experiment_id='historical',
activity_id='CMIP',
table_id='Omon',
variable_id='spco2',
grid_label='gn',
source_id='CESM2',
member_id = ["r2i1p1f1", "r5i1p1f1", "r1i1p1f1", "r4i1p1f1", "r4i1p1f1", "r6i1p1f1"])
dsets = cat.to_dataset_dict(cdf_kwargs={"chunks": {"time": -1}})
The dataset's time axis appears to be correct:
{'CMIP.NCAR.CESM2.historical.Omon.gn': <xarray.Dataset>
Dimensions: (d2: 2, member_id: 5, nlat: 384, nlon: 320, time: 1980, vertices: 4)
Coordinates:
* nlon (nlon) int32 1 2 3 4 5 6 7 8 ... 313 314 315 316 317 318 319 320
* member_id (member_id) <U8 'r2i1p1f1' 'r5i1p1f1' ... 'r4i1p1f1' 'r6i1p1f1'
* nlat (nlat) int32 1 2 3 4 5 6 7 8 ... 377 378 379 380 381 382 383 384
* time (time) object 1850-01-15 13:00:00 ... 2014-12-15 12:00:00
Dimensions without coordinates: d2, vertices
Data variables:
lat_bnds (nlat, nlon, vertices) float32 dask.array<chunksize=(384, 320, 4), meta=np.ndarray>
lon_bnds (nlat, nlon, vertices) float32 dask.array<chunksize=(384, 320, 4), meta=np.ndarray>
lat (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
lon (nlat, nlon) float64 dask.array<chunksize=(384, 320), meta=np.ndarray>
time_bnds (time, d2) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
spco2 (member_id, time, nlat, nlon) float32 dask.array<chunksize=(1, 1980, 384, 320), meta=np.ndarray>
}
cat = col.search(experiment_id='historical',
activity_id='CMIP',
table_id='Omon',
variable_id='spco2',
grid_label='gn',
source_id='CESM2',
member_id = ["r9i1p1f1", "r11i1p1f1", "r7i1p1f1", "r10i1p1f1", "r8i1p1f1"])
_, ds = cat.to_dataset_dict(cdf_kwargs={"chunks": {"time": -1}}).popitem()
ds.time
<xarray.DataArray 'time' (time: 1980)>
array([cftime.DatetimeNoLeap(1850, 1, 15, 13, 0, 0, 0, 2, 15),
cftime.DatetimeNoLeap(1850, 2, 14, 0, 0, 0, 0, 4, 45),
cftime.DatetimeNoLeap(1850, 3, 15, 12, 0, 0, 0, 5, 74), ...,
cftime.DatetimeNoLeap(2014, 10, 15, 12, 0, 0, 0, 5, 288),
cftime.DatetimeNoLeap(2014, 11, 15, 0, 0, 0, 0, 1, 319),
cftime.DatetimeNoLeap(2014, 12, 15, 12, 0, 0, 0, 3, 349)], dtype=object)
Coordinates:
* time (time) object 1850-01-15 13:00:00 ... 2014-12-15 12:00:00
Attributes:
axis: T
bounds: time_bnds
standard_name: time
title: time
type: double
cat = col.search(experiment_id='historical',
activity_id='CMIP',
table_id='Omon',
variable_id='spco2',
grid_label='gn',
source_id='CESM2',
member_id = ["r2i1p1f1"])
_, ds2 = cat.to_dataset_dict(cdf_kwargs={"chunks": {"time": -1}}).popitem()
ds2.time
<xarray.DataArray 'time' (time: 1980)>
array([cftime.DatetimeNoLeap(1850, 1, 15, 13, 0, 0, 0, 2, 15),
cftime.DatetimeNoLeap(1850, 2, 14, 0, 0, 0, 0, 4, 45),
cftime.DatetimeNoLeap(1850, 3, 15, 12, 0, 0, 0, 5, 74), ...,
cftime.DatetimeNoLeap(2014, 10, 15, 12, 0, 0, 0, 5, 288),
cftime.DatetimeNoLeap(2014, 11, 15, 0, 0, 0, 0, 1, 319),
cftime.DatetimeNoLeap(2014, 12, 15, 12, 0, 0, 0, 3, 349)], dtype=object)
Coordinates:
* time (time) object 1850-01-15 13:00:00 ... 2014-12-15 12:00:00
Attributes:
axis: T
bounds: time_bnds
standard_name: time
title: time
type: double
xr.concat([ds, ds2], dim="time")
yields
<xarray.DataArray 'time' (time: 3960)>
array([cftime.DatetimeNoLeap(1850, 1, 15, 13, 0, 0, 0, 2, 15),
cftime.DatetimeNoLeap(1850, 2, 14, 0, 0, 0, 0, 4, 45),
cftime.DatetimeNoLeap(1850, 3, 15, 12, 0, 0, 0, 5, 74), ...,
cftime.DatetimeNoLeap(2014, 10, 15, 12, 0, 0, 0, 5, 288),
cftime.DatetimeNoLeap(2014, 11, 15, 0, 0, 0, 0, 1, 319),
cftime.DatetimeNoLeap(2014, 12, 15, 12, 0, 0, 0, 3, 349)], dtype=object)
Coordinates:
* time (time) object 1850-01-15 13:00:00 ... 2014-12-15 12:00:00
Attributes:
axis: T
bounds: time_bnds
standard_name: time
title: time
type: double
For reasons unbeknownst to me so far, xarray says that the two time axes are identical:
import xarray as xr
xr.testing.assert_identical(ds.time, ds2.time)
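The surprising part is that two identical time axes concatenate into a doubled, non-monotonic one; here is a minimal reproduction with synthetic data (assumed, not the actual spco2 arrays):

```python
import numpy as np
import xarray as xr

t = np.arange(5)
a = xr.DataArray(np.ones(5), coords={'time': t}, dims='time', name='x')
b = xr.DataArray(np.zeros(5), coords={'time': t}, dims='time', name='x')

xr.testing.assert_identical(a['time'], b['time'])  # axes really are identical

# concat along 'time' stacks the two axes rather than aligning them
c = xr.concat([a, b], dim='time')
assert c.sizes['time'] == 10
assert not c.indexes['time'].is_monotonic_increasing
```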
@andersy005 Thanks for these examples! I updated my notebook to only use the 6 members that have all their data in a single netcdf file and the notebook runs further than before. I think I'm now hitting the issue of xarray not playing nicely with cftime==1.0.4, but I'll keep this issue open until intake-esm is updated and I can have all 11 members in a dataset with a time dimension of length 1980.
Current issue with notebook: opening the dataset with decode_times=True leads to the following from esmlab.resample():
Please open dataset with `decode_times=False`
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-9-98967eb7d640> in <module>
----> 1 ds2_ann = esmlab.resample(ds2, freq='ann')
2 ds2_ann
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py in resample(dset, freq, weights, time_coord_name)
760
761 else:
--> 762 ds = dset.esmlab.set_time(time_coord_name=time_coord_name).compute_ann_mean(weights=weights)
763
764 new_history = f'\n{datetime.now()} esmlab.resample(<DATASET>, freq="{freq}")'
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py in set_time(self, time_coord_name, year_offset)
345 except Exception as exc:
346 print('Please open dataset with `decode_times=False`')
--> 347 raise exc
348 self.setup()
349 return self
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py in set_time(self, time_coord_name, year_offset)
340 self._ds[self.tb_name],
341 units=self.time_attrs['units'],
--> 342 calendar=self.time_attrs['calendar'],
343 )
344 self.time_bound.data = tb_data
cftime/_cftime.pyx in cftime._cftime.date2num()
cftime/_cftime.pyx in cftime._cftime._dateparse()
cftime/_cftime.pyx in cftime._cftime._datesplit()
AttributeError: 'NoneType' object has no attribute 'split'
Following the advice and setting decode_times=False leads to
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
<ipython-input-7-98967eb7d640> in <module>
----> 1 ds2_ann = esmlab.resample(ds2, freq='ann')
2 ds2_ann
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py in resample(dset, freq, weights, time_coord_name)
760
761 else:
--> 762 ds = dset.esmlab.set_time(time_coord_name=time_coord_name).compute_ann_mean(weights=weights)
763
764 new_history = f'\n{datetime.now()} esmlab.resample(<DATASET>, freq="{freq}")'
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/contextlib.py in inner(*args, **kwds)
72 def inner(*args, **kwds):
73 with self._recreate_cm():
---> 74 return func(*args, **kwds)
75 return inner
76
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py in compute_ann_mean(self, weights)
503 return da_weighted_mean.where(mask)
504
--> 505 ds_resample_mean = dset.apply(weighted_mean_arr, wgts=wgts)
506
507 if self.time_bound is not None:
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/dataset.py in apply(self, func, keep_attrs, args, **kwargs)
4138 variables = {
4139 k: maybe_wrap_array(v, func(v, *args, **kwargs))
-> 4140 for k, v in self.data_vars.items()
4141 }
4142 if keep_attrs is None:
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/dataset.py in <dictcomp>(.0)
4138 variables = {
4139 k: maybe_wrap_array(v, func(v, *args, **kwargs))
-> 4140 for k, v in self.data_vars.items()
4141 }
4142 if keep_attrs is None:
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/esmlab/core.py in weighted_mean_arr(darr, wgts)
491 ones = xr.where(cond, 0.0, 1.0)
492 mask = (
--> 493 darr.resample({self.time_coord_name: 'A'}).mean(dim=self.time_coord_name).notnull()
494 )
495 da_sum = (
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/common.py in resample(self, indexer, skipna, closed, label, base, keep_attrs, loffset, restore_coord_dims, **indexer_kwargs)
1036 grouper=grouper,
1037 resample_dim=RESAMPLE_DIM,
-> 1038 restore_coord_dims=restore_coord_dims,
1039 )
1040
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/resample.py in __init__(self, dim, resample_dim, *args, **kwargs)
172 self._resample_dim = resample_dim
173
--> 174 super().__init__(*args, **kwargs)
175
176 def apply(self, func, shortcut=False, args=(), **kwargs):
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/groupby.py in __init__(self, obj, group, squeeze, grouper, bins, restore_coord_dims, cut_kwargs)
334 # TODO: sort instead of raising an error
335 raise ValueError("index must be monotonic for resampling")
--> 336 full_index, first_items = self._get_index_and_items(index, grouper)
337 sbins = first_items.values.astype(np.int64)
338 group_indices = [slice(i, j) for i, j in zip(sbins[:-1], sbins[1:])] + [
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/xarray/core/groupby.py in _get_index_and_items(self, index, grouper)
432 first_items = grouper.first_items(index)
433 else:
--> 434 first_items = s.groupby(grouper).first()
435 _apply_loffset(grouper, first_items)
436 full_index = first_items.index
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/pandas/core/generic.py in groupby(self, by, axis, level, as_index, sort, group_keys, squeeze, observed, **kwargs)
7892 squeeze=squeeze,
7893 observed=observed,
-> 7894 **kwargs
7895 )
7896
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in groupby(obj, by, **kwds)
2520 raise TypeError("invalid type: {}".format(obj))
2521
-> 2522 return klass(obj, by, **kwds)
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/pandas/core/groupby/groupby.py in __init__(self, obj, keys, axis, level, grouper, exclusions, selection, as_index, sort, group_keys, squeeze, observed, **kwargs)
389 sort=sort,
390 observed=observed,
--> 391 mutated=self.mutated,
392 )
393
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/pandas/core/groupby/grouper.py in _get_grouper(obj, key, axis, level, sort, observed, mutated, validate)
511 # a passed-in Grouper, directly convert
512 if isinstance(key, Grouper):
--> 513 binner, grouper, obj = key._get_grouper(obj, validate=False)
514 if key.key is None:
515 return grouper, [], obj
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/pandas/core/resample.py in _get_grouper(self, obj, validate)
1446 def _get_grouper(self, obj, validate=True):
1447 # create the resampler and return our binner
-> 1448 r = self._get_resampler(obj)
1449 r._set_binner()
1450 return r.binner, r.grouper, r.obj
/glade/work/mlevy/miniconda3/envs/cesm2-marbl/lib/python3.7/site-packages/pandas/core/resample.py in _get_resampler(self, obj, kind)
1441 "Only valid with DatetimeIndex, "
1442 "TimedeltaIndex or PeriodIndex, "
-> 1443 "but got an instance of %r" % type(ax).__name__
1444 )
1445
TypeError: Only valid with DatetimeIndex, TimedeltaIndex or PeriodIndex, but got an instance of 'CFTimeIndex'
Note that Keith had a similar problem with https://github.com/NCAR/esmlab/issues/151, but running with decode_times=True and cftime==1.0.3.4 fixed it for him; in my case it seems like I need to be running with decode_times=False for some reason... I'm not entirely clear where the
Please open dataset with `decode_times=False`
comes from.
It looks like https://github.com/NCAR/intake-esm/pull/171 fixes the issue I was having with the time dimension, thanks @andersy005 !
@andersy005 -- I created a directory /glade/work/mlevy/intake-esm-collection that contains two subdirectories of interest:
- csv.gz contains csv-formatted (and gz-compressed) data-frame output
- json contains a JSON file pointing to the corresponding csv.gz data-frame
There are currently two collections: CESM1-CMIP5_only-NOT_CMORIZED and CESM2-CMIP6_only-NOT_CMORIZED.
What's the process for updating this repository to include them? I was thinking I would fork & clone this repo, create a branch, add my files, and then update the json to point to /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/ instead of /glade/work/mlevy/intake-esm-collection/csv.gz/, but if there is a different procedure already in place I'm happy to follow the rules.
> What's the process for updating this repository to include them? I was thinking I would fork & clone this repo, create a branch, add my files, and then update the json to point to /glade/collections/cmip/catalog/intake-esm-datastore/catalogs/ instead of /glade/work/mlevy/intake-esm-collection/csv.gz/
This repo is mirrored in a centralized location on glade. The catalogs and their corresponding JSON files reside in /glade/collections/cmip/catalog/intake-esm-datastore/catalogs. Once #40 is merged, you should be able to access both the json and the corresponding csv from this location.
With help from @andersy005 I turned @matt-long's netcdf-based CESM2-CMIP6 metadata store into a gzipped CSV file. It's on glade at /glade/work/mlevy/CMIP6-CESM2_only-NOT_CMORIZED.csv.gz. Python code to generate this:
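A sketch of what that conversion presumably looked like, assuming a netCDF -> DataFrame -> gzipped-CSV pipeline; the dataset below is a tiny synthetic stand-in, since the real input was Matt's metadata store and its actual column set:

```python
import pandas as pd
import xarray as xr

# tiny synthetic stand-in for the netcdf-based metadata store
ds = xr.Dataset({
    'path': ('index', ['/glade/.../file1.nc', '/glade/.../file2.nc']),
    'variable': ('index', ['IRON_FLUX', 'spco2']),
})

# flatten to a dataframe and write the gz-compressed csv intake-esm expects
df = ds.to_dataframe().reset_index()
df.to_csv('catalog.csv.gz', index=False, compression='gzip')

# round-trip check
assert list(pd.read_csv('catalog.csv.gz')['variable']) == ['IRON_FLUX', 'spco2']
```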