abby-baskind / seniorthesis


@ Graeme, i've been stuck all week so here are some things i'd like to go over when we meet #10

Open abby-baskind opened 3 years ago

abby-baskind commented 3 years ago

hey @gmacgilchrist, I've accomplished very little in the last week cause I've had trouble understanding and implementing a lot of the recommendations from my previous issues. In order to gather my thoughts before our wednesday meeting, I'm recapping some of my lingering issues.

1. Regridding NorCPM1 for thetao, talk, and so. Example: thetao plot [image: thetao_july5]

I think this issue is also the source of the problem with the polar stereographic projection.

[image: Screen Shot 2021-07-06 at 5 57 46 PM]

You recommended using .assign_coords in issue #8 and Julius gave another solution in issue #9. I found both confusing.

2. Aggregating model members like in issue #4. The solution you suggested causes some problems.

grouped = cat.df.groupby(cat.groupby_attrs)
cat.df = grouped.first().reset_index()

If I use that, when I run this block:

temp={}
for name,item in dset_dict.items():
    #print(name)
    #print(item.data_vars)
    present = item.data_vars
    if all(i in present for i in variables):
        #print(name)
        temp[name]=dset_dict[name]
dset_dict = temp

The new dset_dict is empty. Julius gave a more complicated solution that I don't understand at all.
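One possible reason the dictionary comes back empty, sketched here with plain Python rows instead of the real catalog: if cat.groupby_attrs does not include variable_id (an assumption about the catalog, not something confirmed in this thread), then grouped.first() keeps a single row per model, which means a single variable and a single member. A dataset built from that one row can never contain all of variables. The rows and the grouping keys below are made up for illustration.

```python
# Toy stand-in for cat.df: one row per (model, member, variable).
rows = [
    {"source_id": "CESM2", "member_id": "r1i1p1f1", "variable_id": "thetao"},
    {"source_id": "CESM2", "member_id": "r1i1p1f1", "variable_id": "so"},
    {"source_id": "CESM2", "member_id": "r2i1p1f1", "variable_id": "thetao"},
    {"source_id": "GFDL-CM4", "member_id": "r1i1p1f1", "variable_id": "thetao"},
    {"source_id": "GFDL-CM4", "member_id": "r1i1p1f1", "variable_id": "so"},
]


def first_per_group(rows, group_keys):
    """Keep the first row seen for each combination of group_keys.

    Roughly what groupby(...).first() does when no values are missing.
    """
    kept = {}
    for row in rows:
        key = tuple(row[k] for k in group_keys)
        kept.setdefault(key, row)
    return list(kept.values())


# Assumed grouping without variable_id: each model collapses to one row.
per_model = first_per_group(rows, ["source_id"])
# Adding variable_id keeps one row (the first member) for every variable.
per_model_and_variable = first_per_group(rows, ["source_id", "variable_id"])

print([r["variable_id"] for r in per_model])
print(len(per_model_and_variable))
```

If this is what is happening, grouping on cat.groupby_attrs plus 'variable_id' would keep every variable, with the caveat that the first member found for each variable is not guaranteed to be the same one.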

3. Selecting a model member to plot in ppCO2

For example, from your code:

### MULTIPLE MODELS
n = len(dset_dict.items())
fig, axarr = plt.subplots(nrows=n,figsize=[10,3*n])
for ax,(k, ds) in zip(axarr.flat,dset_dict.items()):
    ds = ds.isel(time=1200).sel(x=slice(180,200)).mean('x',keep_attrs=True)
    if 'member_id' in ds.dims:
        ds = ds.isel(member_id=0)
    pco2 = calc_PpCO2(ds)
    sigma2 = calc_sigma2(ds)
    meridionalsection_with_sigma2(ax,pco2,sigma2,clims=[0,2000],title=ds.attrs['intake_esm_dataset_key'])

The if statement selecting the first member causes some issues, since some models can't or don't calculate ppCO2 for the first member. I worked around this by adding more if statements for the problem models...

for name, ds_pco2 in dset_dict.items():
    ax = axarr_pco2.flat[ax_idx]
    ds_pco2 = ds_pco2.isel(time=1200).sel(x=slice(180,200)).mean('x',keep_attrs=True)
    if 'member_id' in ds_pco2.dims:
        if name == 'CMIP.MRI.MRI-ESM2-0.historical.Omon.gr':
            ds_pco2 = ds_pco2.isel(member_id=1)
        elif name == 'CMIP.NCAR.CESM2.historical.Omon.gr':
            ds_pco2 = ds_pco2.isel(member_id=2)
        else:
            ds_pco2 = ds_pco2.isel(member_id=0)
    pco2 = calc_PpCO2(ds_pco2)
    sigma2 = calc_sigma2(ds_pco2)
    meridionalsection_with_sigma1(ax,pco2,sigma2, clims=[500,2000],title=ds_pco2.attrs['intake_esm_dataset_key'])
    ax_idx += 1

I feel like there has to be a better, more efficient way to work around this issue.
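One way to avoid hard-coding model names might be to try the members in order and keep the first one for which the calculation works. The sketch below assumes that "can't calculate" shows up either as an exception or as a None result; first_working_member is a hypothetical helper, not part of any library used in this thread.

```python
def first_working_member(n_members, compute):
    """Return (index, result) for the first member where compute(index) succeeds.

    compute is any callable taking a member index. A member counts as
    failed if compute raises an exception or returns None (assumption).
    """
    for idx in range(n_members):
        try:
            result = compute(idx)
        except Exception:
            continue  # this member cannot be processed, try the next one
        if result is not None:
            return idx, result
    raise ValueError("no member could be processed")


# Toy stand-in for the real calculation: members 0 and 1 fail, member 2 works.
def fake_calc(idx):
    if idx == 0:
        raise KeyError("missing variable")
    if idx == 1:
        return None
    return f"pco2 for member {idx}"


print(first_working_member(3, fake_calc))
```

In the plotting loop this could be called as first_working_member(ds.sizes['member_id'], lambda i: calc_PpCO2(ds.isel(member_id=i))). Two caveats: a lazily evaluated result may not fail until it is actually computed, and a member whose output is all NaN would need an extra check.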

4. Scale of ppCO2 outputs [image: pco2]. The magnitude of ppCO2 is much larger for all the CESM2 models, so the ppCO2 values for the other models don't really show up. You also expressed concerns about ppCO2 values as large as 2000 (i forgot the units for this oops). So I am a bit concerned by this.
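Until the magnitudes are sorted out, one option is to give each panel its own colour limits from percentiles of its data, so a few extreme values do not set the scale. A minimal sketch in plain Python; robust_clims and the default 2nd/98th percentiles are assumptions for illustration, and the result would be passed as clims to the plotting function.

```python
import math


def robust_clims(values, lower=2.0, upper=98.0):
    """Return (vmin, vmax) at the given percentiles, ignoring NaNs."""
    data = sorted(v for v in values if not math.isnan(v))
    if not data:
        raise ValueError("no finite values")

    def percentile(p):
        # Linear interpolation between the two nearest ranks.
        pos = (len(data) - 1) * p / 100.0
        lo = math.floor(pos)
        hi = math.ceil(pos)
        return data[lo] + (data[hi] - data[lo]) * (pos - lo)

    return percentile(lower), percentile(upper)


# Toy values: mostly a few hundred, with one outlier and one NaN.
sample = [300.0, 320.0, 350.0, 400.0, 380.0, float("nan"), 5000.0]
vmin, vmax = robust_clims(sample, lower=0, upper=80)
print(vmin, vmax)
```

This only changes the display. It does not address whether values near 2000 are physically reasonable.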

I think that covers all my issues for now. I do still need to plot more of the polar stereographic projections (specifically for fgco2 and ppco2), so I'm sure more issues will come up there.

jbusecke commented 3 years ago

Hey @abby-baskind, thanks for the detailed write-up. And apologies if my answers were confusing. These are a bunch of issues and I think it is worth discussing in which order to carry out what (Warning: This will be a quite personal recommendation based on my experience).

Let me try to summarize all the steps first (and please let me know if I misunderstood/forgot anything):

  1. Load in all datasets (this seems to work fine?)
  2. Combine and organize datasets
     a. Combine variables
     b. Filter datasets that do not have the required variables for 3. (I assume this is the problem here? "The if statement selecting the first member causes some issues, since some models can't or don't calculate ppCO2 for the first member.")
     c. Select member or aggregate members.
  3. Calculate ppCO2
  4. Regrid the output to regular lon/lat grid (Depending on how the output is given this might have to be done in 2a).
  5. Visualize the output

Does that sound about right?

I think that 2. really is the crux to being able to 'just loop over things' afterwards. Have you tried to utilize the postprocessing module at all? It is not working perfectly yet, but it might help quite a bit. There might be some snags in using it (but I can fix those for you!).

But perhaps it might be easier to talk through this via zoom? @gmacgilchrist?

abby-baskind commented 3 years ago

hey @jbusecke, thanks for the insight. so yes, loading the datasets has been no problem, and 2b has been one of the biggest issues. 2c has also been a problem but maybe it'll be easier once i figure out 2b. Step 3 has been fine (my calculation might be questionable but it runs smoothly). And I've gotten stuck on step 4.

I've tried using the postprocessing combine_datasets function for step 2 but, similar to what i wrote for issue 2, the block of code below gives an empty dictionary. Maybe I'm missing a step that accomplishes 2b before I combine sets.

temp={}
for name,item in dset_dict.items():
    #print(name)
    #print(item.data_vars)
    present = item.data_vars
    if all(i in present for i in variables):
        #print(name)
        temp[name]=dset_dict[name]
dset_dict = temp

I'm a very auditory learner so I'm hoping when I meet with graeme tomorrow, talking through it will resolve some of these issues. If not, I'll let you know and maybe we can zoom

jbusecke commented 3 years ago

In your code snippet above, what is variables? Somehow that all(...) statement must be false all the time, perhaps a typo? Could you print some of the elements of dset_dict, so I get an idea of what is in there?
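A quick way to check this is to list, for every entry, which of the required variables are missing. The sketch below only relies on each dataset having a data_vars mapping, as xarray datasets do; the SimpleNamespace objects and the names model_a and model_b are stand-ins so the example runs without loading any data.

```python
from types import SimpleNamespace


def missing_variables(dset_dict, required):
    """Map each dataset name to the sorted list of required variables it lacks."""
    report = {}
    for name, ds in dset_dict.items():
        present = set(ds.data_vars)  # iterating data_vars yields variable names
        report[name] = sorted(set(required) - present)
    return report


variables = ["thetao", "so", "talk", "dissic"]
toy_dict = {
    "model_a": SimpleNamespace(data_vars={"so": None}),
    "model_b": SimpleNamespace(data_vars=dict.fromkeys(variables)),
}
report = missing_variables(toy_dict, variables)
for name, missing in report.items():
    print(name, "missing:", missing)
```

An entry with an empty list would pass the all(...) filter; every other entry would be dropped.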

gmacgilchrist commented 3 years ago

@abby-baskind Thanks for listing out these issues, and sorry to hear you've had a frustrating week. I'll take the time to go through them carefully before we meet tomorrow.

abby-baskind commented 3 years ago

@jbusecke, so the vars are thetao, so, talk, and dissic. Here's a chunk of code (low key I'm about to dump a bunch of info/code cause i'm hoping if i lay it all out, something will connect)

z_kwargs = {'consolidated': True, 'use_cftime': True}
query = dict(experiment_id=['historical'], table_id=['Omon'], 
             variable_id=variables,
             grid_label=['gr'],
             source_id=['E3SM-1-0', 'E3SM-1-1', 'GFDL-ESM4',
                        'CESM2-FV2','CESM2','MRI-ESM2-0',
                        'CESM2-WACCM-FV2','GFDL-CM4','CESM2-WACCM',
                        'E3SM-1-1-ECA'])

cat = col.search(**query)

# print(cat.df['source_id'].unique())
dset_dict_old = cat.to_dataset_dict(zarr_kwargs=z_kwargs, storage_options={'token': 'anon'},
                                preprocess=combined_preprocessing, aggregate=False)

Notably here, aggregate is False

Here's a snippet of the output

 'CMIP.NCAR.CESM2.historical.r3i1p1f1.Omon.talk.gr.gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r3i1p1f1/Omon/talk/gr/v20190308/.nan.20190308': <xarray.Dataset>
 Dimensions:        (bnds: 2, lev: 33, time: 1980, vertex: 4, x: 360, y: 180)
 Coordinates:
   * y              (y) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
     lat_bounds     (y, bnds, x) float64 dask.array<chunksize=(180, 2, 360), meta=np.ndarray>
   * lev            (lev) float64 0.0 10.0 20.0 30.0 ... 4.5e+03 5e+03 5.5e+03
     lev_bounds     (lev, bnds) float64 dask.array<chunksize=(33, 2), meta=np.ndarray>
   * x              (x) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
     lon_bounds     (x, bnds, y) float64 dask.array<chunksize=(360, 2, 180), meta=np.ndarray>
   * time           (time) object 1850-01-15 12:59:59.999997 ... 2014-12-15 12...
     time_bounds    (time, bnds) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
   * bnds           (bnds) int64 0 1
     lon            (x, y) float64 0.5 0.5 0.5 0.5 ... 359.5 359.5 359.5 359.5
     lat            (x, y) float64 -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
     lon_verticies  (vertex, x, y) float64 dask.array<chunksize=(1, 360, 180), meta=np.ndarray>
     lat_verticies  (vertex, x, y) float64 dask.array<chunksize=(1, 360, 180), meta=np.ndarray>
   * vertex         (vertex) int64 0 1 2 3
 Data variables:
     talk           (time, lev, y, x) float32 dask.array<chunksize=(11, 33, 180, 360), meta=np.ndarray>
 Attributes:
     Conventions:             CF-1.7 CMIP-6.2
     activity_id:             CMIP
     branch_method:           standard
     branch_time_in_child:    674885.0
     branch_time_in_parent:   240900.0
     case_id:                 17
     cesm_casename:           b.e21.BHIST.f09_g17.CMIP6-historical.003
     contact:                 cesm_cmip6@ucar.edu
     creation_date:           2019-01-18T18:40:31Z
     data_specs_version:      01.00.29
     experiment:              all-forcing simulation of the recent past
     experiment_id:           historical
     external_variables:      areacello volcello
     forcing_index:           1
     frequency:               mon
     further_info_url:        https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2....
     grid:                    ocean data regridded from native gx1v7 displaced...
     grid_label:              gr
     initialization_index:    1
     institution:             National Center for Atmospheric Research, Climat...
     institution_id:          NCAR
     license:                 CMIP6 model data produced by <The National Cente...
     mip_era:                 CMIP6
     model_doi_url:           https://doi.org/10.5065/D67H1H0V
     nominal_resolution:      1x1 degree
     parent_activity_id:      CMIP
     parent_experiment_id:    piControl
     parent_mip_era:          CMIP6
     parent_source_id:        CESM2
     parent_time_units:       days since 0001-01-01 00:00:00
     parent_variant_label:    r1i1p1f1
     physics_index:           1
     product:                 model-output
     realization_index:       3
     realm:                   ocnBgchem
     source:                  CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite ...
     source_id:               CESM2
     source_type:             AOGCM BGC
     sub_experiment:          none
     sub_experiment_id:       none
     table_id:                Omon
     tracking_id:             hdl:21.14100/5cde1f13-dd68-4601-9fa3-f2d6cdfa8488
     variable_id:             talk
     variant_info:            CMIP6 20th century experiments (1850-2014) with ...
     variant_label:           r3i1p1f1
     status:                  2019-10-25;created;by nhn2@columbia.edu
     netcdf_tracking_ids:     hdl:21.14100/5cde1f13-dd68-4601-9fa3-f2d6cdfa8488
     version_id:              v20190308
     intake_esm_varname:      None
     intake_esm_dataset_key:  CMIP.NCAR.CESM2.historical.r3i1p1f1.Omon.talk.gr...,

 'CMIP.NCAR.CESM2.historical.r7i1p1f1.Omon.so.gr.gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r7i1p1f1/Omon/so/gr/v20190311/.nan.20190311': <xarray.Dataset>
 Dimensions:        (bnds: 2, lev: 33, time: 1980, vertex: 4, x: 360, y: 180)
 Coordinates:
   * y              (y) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
     lat_bounds     (y, bnds, x) float64 dask.array<chunksize=(180, 2, 360), meta=np.ndarray>
   * lev            (lev) float64 0.0 10.0 20.0 30.0 ... 4.5e+03 5e+03 5.5e+03
     lev_bounds     (lev, bnds) float64 dask.array<chunksize=(33, 2), meta=np.ndarray>
   * x              (x) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
     lon_bounds     (x, bnds, y) float64 dask.array<chunksize=(360, 2, 180), meta=np.ndarray>
   * time           (time) object 1850-01-15 13:00:00 ... 2014-12-15 12:00:00
     time_bounds    (time, bnds) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
   * bnds           (bnds) int64 0 1
     lon            (x, y) float64 0.5 0.5 0.5 0.5 ... 359.5 359.5 359.5 359.5
     lat            (x, y) float64 -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
     lon_verticies  (vertex, x, y) float64 dask.array<chunksize=(1, 360, 180), meta=np.ndarray>
     lat_verticies  (vertex, x, y) float64 dask.array<chunksize=(1, 360, 180), meta=np.ndarray>
   * vertex         (vertex) int64 0 1 2 3
 Data variables:
     so             (time, lev, y, x) float32 dask.array<chunksize=(11, 33, 180, 360), meta=np.ndarray>
 Attributes:
     Conventions:             CF-1.7 CMIP-6.2
     activity_id:             CMIP
     branch_method:           standard
     branch_time_in_child:    674885.0
     branch_time_in_parent:   273750.0
     case_id:                 21
     cesm_casename:           b.e21.BHIST.f09_g17.CMIP6-historical.007
     contact:                 cesm_cmip6@ucar.edu
     creation_date:           2019-01-19T03:01:13Z
     data_specs_version:      01.00.29
     experiment:              all-forcing simulation of the recent past
     experiment_id:           historical
     external_variables:      areacello volcello
     forcing_index:           1
     frequency:               mon
     further_info_url:        https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2....
     grid:                    ocean data regridded from native gx1v7 displaced...
     grid_label:              gr
     initialization_index:    1
     institution:             National Center for Atmospheric Research, Climat...
     institution_id:          NCAR
     license:                 CMIP6 model data produced by <The National Cente...
     mip_era:                 CMIP6
     model_doi_url:           https://doi.org/10.5065/D67H1H0V
     nominal_resolution:      1x1 degree
     parent_activity_id:      CMIP
     parent_experiment_id:    piControl
     parent_mip_era:          CMIP6
     parent_source_id:        CESM2
     parent_time_units:       days since 0001-01-01 00:00:00
     parent_variant_label:    r1i1p1f1
     physics_index:           1
     product:                 model-output
     realization_index:       7
     realm:                   ocean
     source:                  CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite ...
     source_id:               CESM2
     source_type:             AOGCM BGC
     sub_experiment:          none
     sub_experiment_id:       none
     table_id:                Omon
     tracking_id:             hdl:21.14100/4385236c-5ca2-4d79-a46f-e3d28a2db87...
     variable_id:             so
     variant_info:            CMIP6 20th century experiments (1850-2014) with ...
     variant_label:           r7i1p1f1
     status:                  2019-10-25;created;by nhn2@columbia.edu
     netcdf_tracking_ids:     hdl:21.14100/4385236c-5ca2-4d79-a46f-e3d28a2db87...
     version_id:              v20190311
     intake_esm_varname:      None
     intake_esm_dataset_key:  CMIP.NCAR.CESM2.historical.r7i1p1f1.Omon.so.gr.g...,

So the output looks fine/normal, considering models weren't aggregated. Of course the variables aren't merged, so I tried using postprocessing's merge_variables: dd_new = merge_variables(dset_dict_old) but the output is empty (literally, {}). I did try this with fewer models and only 2 variables, hoping something simpler would work, but again, empty output. I also tried combine_datasets, hoping it would miraculously merge the variables, and unsurprisingly, it did not. Here's a sample of the resulting dictionary and also some code.

ddict_new = combine_datasets(
    dset_dict_old,
    pick_first_member,
    match_attrs=['source_id', 'grid_label', 'experiment_id', 'table_id']
)

# Output
'CESM2-WACCM.gr.historical.Omon': <xarray.Dataset>
 Dimensions:        (bnds: 2, lev: 33, time: 1980, vertex: 4, x: 360, y: 180)
 Coordinates:
   * y              (y) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
     lat_bounds     (y, bnds, x) float64 dask.array<chunksize=(180, 2, 360), meta=np.ndarray>
   * lev            (lev) float64 0.0 10.0 20.0 30.0 ... 4.5e+03 5e+03 5.5e+03
     lev_bounds     (lev, bnds) float64 dask.array<chunksize=(33, 2), meta=np.ndarray>
   * x              (x) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
     lon_bounds     (x, bnds, y) float64 dask.array<chunksize=(360, 2, 180), meta=np.ndarray>
   * time           (time) object 1850-01-15 12:59:59.999997 ... 2014-12-15 12...
     time_bounds    (time, bnds) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
   * bnds           (bnds) int64 0 1
     lon            (x, y) float64 0.5 0.5 0.5 0.5 ... 359.5 359.5 359.5 359.5
     lat            (x, y) float64 -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
     lon_verticies  (vertex, x, y) float64 dask.array<chunksize=(1, 360, 180), meta=np.ndarray>
     lat_verticies  (vertex, x, y) float64 dask.array<chunksize=(1, 360, 180), meta=np.ndarray>
   * vertex         (vertex) int64 0 1 2 3
 Data variables:
     so             (time, lev, y, x) float32 dask.array<chunksize=(12, 33, 180, 360), meta=np.ndarray>
 Attributes:
     Conventions:             CF-1.7 CMIP-6.2
     activity_id:             CMIP
     branch_method:           standard
     branch_time_in_child:    674885.0
     branch_time_in_parent:   20075.0
     case_id:                 4
     cesm_casename:           b.e21.BWHIST.f09_g17.CMIP6-historical-WACCM.001
     contact:                 cesm_cmip6@ucar.edu
     creation_date:           2019-07-29T14:19:00Z
     data_specs_version:      01.00.31
     experiment:              all-forcing simulation of the recent past
     experiment_id:           historical
     external_variables:      areacello volcello
     forcing_index:           1
     frequency:               mon
     further_info_url:        https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2-...
     grid:                    ocean data regridded from native gx1v7 displaced...
     grid_label:              gr
     initialization_index:    1
     institution:             National Center for Atmospheric Research, Climat...
     institution_id:          NCAR
     license:                 CMIP6 model data produced by <The National Cente...
     mip_era:                 CMIP6
     model_doi_url:           https://doi.org/10.5065/D67H1H0V
     nominal_resolution:      1x1 degree
     parent_activity_id:      CMIP
     parent_experiment_id:    piControl
     parent_mip_era:          CMIP6
     parent_source_id:        CESM2-WACCM
     parent_time_units:       days since 0001-01-01 00:00:00
     parent_variant_label:    r1i1p1f1
     physics_index:           1
     product:                 model-output
     realization_index:       1
     realm:                   ocean
     source:                  CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite ...
     source_id:               CESM2-WACCM
     source_type:             AOGCM BGC CHEM AER
     sub_experiment:          none
     sub_experiment_id:       none
     table_id:                Omon
     tracking_id:             hdl:21.14100/730e4b19-1758-4798-8324-631feaf818d9
     variable_id:             so
     variant_info:            CMIP6 CESM2 hindcast (1850-2014) with high-top a...
     variant_label:           r1i1p1f1
     status:                  2019-10-25;created;by nhn2@columbia.edu
     netcdf_tracking_ids:     hdl:21.14100/730e4b19-1758-4798-8324-631feaf818d9
     version_id:              v20190808
     intake_esm_varname:      None
     intake_esm_dataset_key:  CMIP.NCAR.CESM2-WACCM.historical.r1i1p1f1.Omon.s...,

So considering merge_variables returned nothing and combine_datasets returned entries with only 1 variable, it's not surprising that...

for name,item in dset_dict.items():
    #print(name)
    #print(item.data_vars)
    present = item.data_vars
    if all(i in present for i in variables):
        #print(name)
        temp[name]=dset_dict[name]
dset_dict = temp

...returned nothing. (Also, my dictionary names are all over the place because I copied code at different parts of the process. For each test run, the dictionary names did match, just in case you were worried that was an issue.)

I was really hoping merge_variables would work out, since the example given in the postprocessing documentation looked exactly like what I wanted my output to be. But it didn't so I'm not really sure where it all went wrong.
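Since every entry holds a single variable when aggregate=False, it can help to see which model/member pairs actually have all the required variables before merging anything. This sketch groups entries by the source_id, variant_label and variable_id attributes visible in the output above; complete_members is a hypothetical helper and the stand-in objects only carry an attrs dict.

```python
from collections import defaultdict
from types import SimpleNamespace


def complete_members(dset_dict, required):
    """Return sorted (source_id, variant_label) pairs that have every required variable."""
    found = defaultdict(set)
    for ds in dset_dict.values():
        key = (ds.attrs["source_id"], ds.attrs["variant_label"])
        found[key].add(ds.attrs["variable_id"])
    return sorted(key for key, have in found.items() if set(required) <= have)


def fake_ds(source_id, variant_label, variable_id):
    # Stand-in for a single-variable dataset; only attrs are needed here.
    return SimpleNamespace(attrs={
        "source_id": source_id,
        "variant_label": variant_label,
        "variable_id": variable_id,
    })


toy = {
    "a": fake_ds("CESM2", "r3i1p1f1", "talk"),
    "b": fake_ds("CESM2", "r3i1p1f1", "so"),
    "c": fake_ds("CESM2", "r7i1p1f1", "so"),
}
print(complete_members(toy, ["talk", "so"]))
```

Members missing from the result are the ones that would be lost by any filter that requires all variables.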

jbusecke commented 3 years ago

That is helpful. Let me look into what is going on.

jbusecke commented 3 years ago

Hi @abby-baskind, I just tried to reproduce your code, and merge_variables works for some models.

Here is what I did:


import matplotlib.pyplot as plt
import intake
from cmip6_preprocessing.preprocessing import combined_preprocessing
from cmip6_preprocessing.utils import google_cmip_col
import numpy as np

col = google_cmip_col()

variables = ['thetao', 'so', 'talk', 'dissic']
z_kwargs = {'consolidated': True, 'use_cftime': True}
query = dict(experiment_id=['historical'], table_id=['Omon'], 
             variable_id=variables,
             grid_label=['gr'],
             source_id=['E3SM-1-0', 'E3SM-1-1', 'GFDL-ESM4',
                        'CESM2-FV2','CESM2','MRI-ESM2-0',
                        'CESM2-WACCM-FV2','GFDL-CM4','CESM2-WACCM',
                        'E3SM-1-1-ECA'])

cat = col.search(**query)

dset_dict_old = cat.to_dataset_dict(zarr_kwargs=z_kwargs, storage_options={'token': 'anon'},
                                preprocess=combined_preprocessing, aggregate=False)

Then

from cmip6_preprocessing.postprocessing import merge_variables
ddict_new = merge_variables(dset_dict_old)

Which gave me some warnings like these:

/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/postprocessing.py:122: UserWarning: CMIP.NCAR.CESM2-WACCM-FV2.historical.r3i1p1f1.Omon.gr.none failed to combine with :indexes along dimension 'time' are not equal
  warnings.warn(f"{cmip6_dataset_id(ds)} failed to combine with :{e}")
/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/postprocessing.py:122: UserWarning: CMIP.NCAR.CESM2-WACCM-FV2.historical.r2i1p1f1.Omon.gr.none failed to combine with :indexes along dimension 'time' are not equal
  warnings.warn(f"{cmip6_dataset_id(ds)} failed to combine with :{e}")
/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/postprocessing.py:122: UserWarning: CMIP.NCAR.CESM2-FV2.historical.r2i1p1f1.Omon.gr.none failed to combine with :indexes along dimension 'time' are not equal
  warnings.warn(f"{cmip6_dataset_id(ds)} failed to combine with :{e}")
/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/postprocessing.py:122: UserWarning: CMIP.NCAR.CESM2-FV2.historical.r3i1p1f1.Omon.gr.none failed to combine with :indexes along dimension 'time' are not equal
  warnings.warn(f"{cmip6_dataset_id(ds)} failed to combine with :{e}")
/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/postprocessing.py:122: UserWarning: CMIP.NCAR.CESM2.historical.r10i1p1f1.Omon.gr.none failed to combine with :indexes along dimension 'time' are not equal
  warnings.warn(f"{cmip6_dataset_id(ds)} failed to combine with :{e}")
/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/postprocessing.py:122: UserWarning: CMIP.E3SM-Project.E3SM-1-0.historical.r5i1p1f1.Omon.gr.none failed to combine with :indexes along dimension 'time' are not equal
  warnings.warn(f"{cmip6_dataset_id(ds)} failed to combine with :{e}")

There is an issue with some of the data, but we will deal with that later. More importantly, some models were successfully combined:

list(ddict_new.keys())
['E3SM-1-0.gr.historical.Omon.r4i1p1f1',
 'CESM2-WACCM-FV2.gr.historical.Omon.r1i1p1f1',
 'MRI-ESM2-0.gr.historical.Omon.r3i1p1f1',
 'MRI-ESM2-0.gr.historical.Omon.r4i1p1f1',
 'CESM2.gr.historical.Omon.r5i1p1f1',
 'CESM2.gr.historical.Omon.r7i1p1f1',
 'GFDL-CM4.gr.historical.Omon.r1i1p1f1',
 'E3SM-1-0.gr.historical.Omon.r1i1p1f1',
 'CESM2.gr.historical.Omon.r3i1p1f1',
 'MRI-ESM2-0.gr.historical.Omon.r5i1p1f1',
 'CESM2.gr.historical.Omon.r4i1p1f1',
 'MRI-ESM2-0.gr.historical.Omon.r1i1p1f1',
 'CESM2-WACCM.gr.historical.Omon.r3i1p1f1',
 'CESM2.gr.historical.Omon.r2i1p1f1',
 'CESM2-WACCM.gr.historical.Omon.r1i1p1f1',
 'CESM2.gr.historical.Omon.r6i1p1f1',
 'CESM2-WACCM.gr.historical.Omon.r2i1p1f1',
 'E3SM-1-1.gr.historical.Omon.r1i1p1f1',
 'CESM2.gr.historical.Omon.r9i1p1f1',
 'MRI-ESM2-0.gr.historical.Omon.r1i2p1f1',
 'MRI-ESM2-0.gr.historical.Omon.r2i1p1f1',
 'CESM2.gr.historical.Omon.r1i1p1f1',
 'GFDL-ESM4.gr.historical.Omon.r1i1p1f1',
 'CESM2.gr.historical.Omon.r11i1p1f1',
 'E3SM-1-1-ECA.gr.historical.Omon.r1i1p1f1',
 'E3SM-1-0.gr.historical.Omon.r3i1p1f1',
 'CESM2.gr.historical.Omon.r8i1p1f1',
 'CESM2-FV2.gr.historical.Omon.r1i1p1f1',
 'E3SM-1-0.gr.historical.Omon.r2i1p1f1']
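To see at a glance how many members were merged for each model, the keys can be grouped by model name. A small sketch, assuming the key layout shown in the list above (model, grid, experiment, table, member, separated by dots); members_by_model is a hypothetical helper and only a few of the keys are repeated here.

```python
from collections import defaultdict


def members_by_model(keys):
    """Group merged dataset keys by model, collecting member labels."""
    grouped = defaultdict(list)
    for key in keys:
        prefix, member = key.rsplit(".", 1)  # member label is the last field
        model = prefix.split(".", 1)[0]      # model name is the first field
        grouped[model].append(member)
    # Sorting is alphabetical, so r10... would sort before r2...
    return {model: sorted(members) for model, members in grouped.items()}


keys = [
    "E3SM-1-0.gr.historical.Omon.r4i1p1f1",
    "MRI-ESM2-0.gr.historical.Omon.r3i1p1f1",
    "MRI-ESM2-0.gr.historical.Omon.r1i2p1f1",
    "GFDL-CM4.gr.historical.Omon.r1i1p1f1",
]
result = members_by_model(keys)
print(result)
```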

Checking one of the datasets seems to indicate that (at least for some models) this works:

ddict_new['CESM2-WACCM.gr.historical.Omon.r1i1p1f1']
[image: output for this dataset]

So let's see where things go wrong.

Could you:

  1. Open a new notebook and paste exactly the code I used above to confirm it still doesn't work?
  2. Tell me which pangeo deployment you are working on? You can check by looking at the URL (I use https://staging.us-central1-b.gcp.pangeo.io/...). I am curious if yours says staging or production.
  3. Tell me what version of cmip6_preprocessing you are using? If you could paste the output of import cmip6_preprocessing;print(cmip6_preprocessing.__version__) here, that would be helpful.
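If it is easier, the version check can cover several packages at once. A minimal sketch using only the standard library; report_versions is a hypothetical helper, and the package list is an assumption about what is relevant here (the installed distribution names may differ slightly from the import names).

```python
from importlib.metadata import PackageNotFoundError, version


def report_versions(packages):
    """Return {package: version string}, or 'not installed' if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = version(name)
        except PackageNotFoundError:
            versions[name] = "not installed"
    return versions


for name, ver in report_versions(
    ["cmip6_preprocessing", "xarray", "intake-esm"]
).items():
    print(f"{name}: {ver}")
```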

I'll look into what is causing those warnings above really quick now.
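One place to start with the "indexes along dimension 'time' are not equal" warnings is to compare the time stamps of two variables directly. In the output earlier in this thread, one dataset starts at 1850-01-15 12:59:59.999997 and another at 1850-01-15 13:00:00 (they are from different members, so this is only a hint), which makes sub-second differences a plausible culprit. The sketch uses standard-library datetime objects as stand-ins for the model time axis; first_time_mismatch is a hypothetical helper.

```python
from datetime import datetime, timedelta


def first_time_mismatch(times_a, times_b, tolerance=timedelta(0)):
    """Return the first index where the two time axes disagree.

    Two stamps disagree if they differ by more than tolerance. If all
    shared positions agree but the lengths differ, the shorter length is
    returned. Returns None when the axes match.
    """
    for idx, (a, b) in enumerate(zip(times_a, times_b)):
        if abs(a - b) > tolerance:
            return idx
    if len(times_a) != len(times_b):
        return min(len(times_a), len(times_b))
    return None


a = [datetime(1850, 1, 15, 12, 59, 59, 999997), datetime(1850, 2, 14, 0, 0)]
b = [datetime(1850, 1, 15, 13, 0, 0), datetime(1850, 2, 14, 0, 0)]

exact = first_time_mismatch(a, b)
relaxed = first_time_mismatch(a, b, tolerance=timedelta(seconds=1))
print(exact, relaxed)
```

If the exact comparison reports a mismatch but the relaxed one does not, the time axes differ only by rounding, and replacing the time coordinate of one variable with that of another before merging might be enough. That is a guess until the real data has been checked.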