Open abby-baskind opened 3 years ago
Hey @abby-baskind, thanks for the detailed write-up. And apologies if my answers were confusing. These are a bunch of issues and I think it is worth discussing in which order to carry out what (Warning: This will be a quite personal recommendation based on my experience).
Let me try to summarize all the steps first (and please let me know if I misunderstood/forgot anything):
The if statement selecting the first member causes some issues, since some models can't or don't calculate ppCO2 for the first member.
c. Select member or aggregate members.
Does that sound about right?
I think that 2. really is the crux to being able to 'just loop over things' afterwards. Have you tried to utilize the postprocessing module at all? It is not working perfectly yet, but it might help quite a bit. There might be some snags in using it (but I can fix those for you!).
But perhaps it might be easier to talk through this via zoom? @gmacgilchrist?
hey @jbusecke, thanks for the insight. so yes, loading the datasets has been no problem, and 2b has been one of the biggest issues. 2c has also been a problem but maybe it'll be easier once i figure out 2b. Step 3 has been fine (my calculation might be questionable but it runs smoothly). And I've gotten stuck on step 4.
I've tried using the postprocessing combine_datasets function for step 2 but, similar to what i wrote for 2, the block of code below gives an empty dictionary. Maybe I'm missing a step that accomplishes 2b before I combine sets.
temp = {}
for name, item in dset_dict.items():
    # print(name)
    # print(item.data_vars)
    present = item.data_vars
    if all(i in present for i in variables):
        # print(name)
        temp[name] = dset_dict[name]
dset_dict = temp
I'm a very auditory learner so I'm hoping when I meet with graeme tomorrow, talking through it will resolve some of these issues. If not, I'll let you know and maybe we can zoom
In your code snippet above, what is variables? Somehow that all(...) statement must be false all the time, perhaps a typo? Could you print some of the elements of dset_dict, so I get an idea of what is in there?
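One way to check why all(...) might be false for every entry is sketched below. It assumes each dictionary entry holds a single data variable (which is what loading with aggregate=False appears to produce later in this thread). The keys and the plain sets standing in for item.data_vars are hypothetical, not real catalog output.

```python
# Hypothetical stand-ins: each entry maps a dataset key to the set of
# data variable names it holds (assumed: one variable per entry).
variables = ["thetao", "so", "talk", "dissic"]
dset_vars = {
    "CESM2.r3i1p1f1.talk": {"talk"},
    "CESM2.r7i1p1f1.so": {"so"},
    "CESM2.r1i1p1f1.thetao": {"thetao"},
}

# Same test as in the snippet above: keep an entry only if it holds
# every requested variable.
kept = {
    name: present
    for name, present in dset_vars.items()
    if all(v in present for v in variables)
}
print(kept)  # empty, because no single entry holds all four variables

# Listing what each entry is missing makes the cause visible.
missing = {
    name: sorted(set(variables) - present)
    for name, present in dset_vars.items()
}
for name, gaps in missing.items():
    print(name, "is missing", gaps)
```

If every entry reports three missing variables, the filter itself is fine and the variables simply have not been merged yet.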
@abby-baskind Thanks for listing out these issues, and sorry to hear you've had a frustrating week. I'll take the time to go through them carefully before we meet tomorrow.
@jbusecke, so the vars are thetao, so, talk, and dissic. Here's a chunk of code (low key I'm about to dump a bunch of info/code cause i'm hoping if i lay it all out, something will connect)
z_kwargs = {'consolidated': True, 'use_cftime': True}
query = dict(experiment_id=['historical'], table_id=['Omon'],
             variable_id=variables,
             grid_label=['gr'],
             source_id=['E3SM-1-0', 'E3SM-1-1', 'GFDL-ESM4',
                        'CESM2-FV2', 'CESM2', 'MRI-ESM2-0',
                        'CESM2-WACCM-FV2', 'GFDL-CM4', 'CESM2-WACCM',
                        'E3SM-1-1-ECA'])
cat = col.search(**query)
# print(cat.df['source_id'].unique())
dset_dict_old = cat.to_dataset_dict(zarr_kwargs=z_kwargs, storage_options={'token': 'anon'},
                                    preprocess=combined_preprocessing, aggregate=False)
Notably here, aggregate is False.
Here's a snippet of the output
'CMIP.NCAR.CESM2.historical.r3i1p1f1.Omon.talk.gr.gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r3i1p1f1/Omon/talk/gr/v20190308/.nan.20190308': <xarray.Dataset>
Dimensions: (bnds: 2, lev: 33, time: 1980, vertex: 4, x: 360, y: 180)
Coordinates:
* y (y) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
lat_bounds (y, bnds, x) float64 dask.array<chunksize=(180, 2, 360), meta=np.ndarray>
* lev (lev) float64 0.0 10.0 20.0 30.0 ... 4.5e+03 5e+03 5.5e+03
lev_bounds (lev, bnds) float64 dask.array<chunksize=(33, 2), meta=np.ndarray>
* x (x) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
lon_bounds (x, bnds, y) float64 dask.array<chunksize=(360, 2, 180), meta=np.ndarray>
* time (time) object 1850-01-15 12:59:59.999997 ... 2014-12-15 12...
time_bounds (time, bnds) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
* bnds (bnds) int64 0 1
lon (x, y) float64 0.5 0.5 0.5 0.5 ... 359.5 359.5 359.5 359.5
lat (x, y) float64 -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
lon_verticies (vertex, x, y) float64 dask.array<chunksize=(1, 360, 180), meta=np.ndarray>
lat_verticies (vertex, x, y) float64 dask.array<chunksize=(1, 360, 180), meta=np.ndarray>
* vertex (vertex) int64 0 1 2 3
Data variables:
talk (time, lev, y, x) float32 dask.array<chunksize=(11, 33, 180, 360), meta=np.ndarray>
Attributes:
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 674885.0
branch_time_in_parent: 240900.0
case_id: 17
cesm_casename: b.e21.BHIST.f09_g17.CMIP6-historical.003
contact: cesm_cmip6@ucar.edu
creation_date: 2019-01-18T18:40:31Z
data_specs_version: 01.00.29
experiment: all-forcing simulation of the recent past
experiment_id: historical
external_variables: areacello volcello
forcing_index: 1
frequency: mon
further_info_url: https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2....
grid: ocean data regridded from native gx1v7 displaced...
grid_label: gr
initialization_index: 1
institution: National Center for Atmospheric Research, Climat...
institution_id: NCAR
license: CMIP6 model data produced by <The National Cente...
mip_era: CMIP6
model_doi_url: https://doi.org/10.5065/D67H1H0V
nominal_resolution: 1x1 degree
parent_activity_id: CMIP
parent_experiment_id: piControl
parent_mip_era: CMIP6
parent_source_id: CESM2
parent_time_units: days since 0001-01-01 00:00:00
parent_variant_label: r1i1p1f1
physics_index: 1
product: model-output
realization_index: 3
realm: ocnBgchem
source: CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite ...
source_id: CESM2
source_type: AOGCM BGC
sub_experiment: none
sub_experiment_id: none
table_id: Omon
tracking_id: hdl:21.14100/5cde1f13-dd68-4601-9fa3-f2d6cdfa8488
variable_id: talk
variant_info: CMIP6 20th century experiments (1850-2014) with ...
variant_label: r3i1p1f1
status: 2019-10-25;created;by nhn2@columbia.edu
netcdf_tracking_ids: hdl:21.14100/5cde1f13-dd68-4601-9fa3-f2d6cdfa8488
version_id: v20190308
intake_esm_varname: None
intake_esm_dataset_key: CMIP.NCAR.CESM2.historical.r3i1p1f1.Omon.talk.gr...,
'CMIP.NCAR.CESM2.historical.r7i1p1f1.Omon.so.gr.gs://cmip6/CMIP6/CMIP/NCAR/CESM2/historical/r7i1p1f1/Omon/so/gr/v20190311/.nan.20190311': <xarray.Dataset>
Dimensions: (bnds: 2, lev: 33, time: 1980, vertex: 4, x: 360, y: 180)
Coordinates:
* y (y) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
lat_bounds (y, bnds, x) float64 dask.array<chunksize=(180, 2, 360), meta=np.ndarray>
* lev (lev) float64 0.0 10.0 20.0 30.0 ... 4.5e+03 5e+03 5.5e+03
lev_bounds (lev, bnds) float64 dask.array<chunksize=(33, 2), meta=np.ndarray>
* x (x) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
lon_bounds (x, bnds, y) float64 dask.array<chunksize=(360, 2, 180), meta=np.ndarray>
* time (time) object 1850-01-15 13:00:00 ... 2014-12-15 12:00:00
time_bounds (time, bnds) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
* bnds (bnds) int64 0 1
lon (x, y) float64 0.5 0.5 0.5 0.5 ... 359.5 359.5 359.5 359.5
lat (x, y) float64 -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
lon_verticies (vertex, x, y) float64 dask.array<chunksize=(1, 360, 180), meta=np.ndarray>
lat_verticies (vertex, x, y) float64 dask.array<chunksize=(1, 360, 180), meta=np.ndarray>
* vertex (vertex) int64 0 1 2 3
Data variables:
so (time, lev, y, x) float32 dask.array<chunksize=(11, 33, 180, 360), meta=np.ndarray>
Attributes:
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 674885.0
branch_time_in_parent: 273750.0
case_id: 21
cesm_casename: b.e21.BHIST.f09_g17.CMIP6-historical.007
contact: cesm_cmip6@ucar.edu
creation_date: 2019-01-19T03:01:13Z
data_specs_version: 01.00.29
experiment: all-forcing simulation of the recent past
experiment_id: historical
external_variables: areacello volcello
forcing_index: 1
frequency: mon
further_info_url: https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2....
grid: ocean data regridded from native gx1v7 displaced...
grid_label: gr
initialization_index: 1
institution: National Center for Atmospheric Research, Climat...
institution_id: NCAR
license: CMIP6 model data produced by <The National Cente...
mip_era: CMIP6
model_doi_url: https://doi.org/10.5065/D67H1H0V
nominal_resolution: 1x1 degree
parent_activity_id: CMIP
parent_experiment_id: piControl
parent_mip_era: CMIP6
parent_source_id: CESM2
parent_time_units: days since 0001-01-01 00:00:00
parent_variant_label: r1i1p1f1
physics_index: 1
product: model-output
realization_index: 7
realm: ocean
source: CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite ...
source_id: CESM2
source_type: AOGCM BGC
sub_experiment: none
sub_experiment_id: none
table_id: Omon
tracking_id: hdl:21.14100/4385236c-5ca2-4d79-a46f-e3d28a2db87...
variable_id: so
variant_info: CMIP6 20th century experiments (1850-2014) with ...
variant_label: r7i1p1f1
status: 2019-10-25;created;by nhn2@columbia.edu
netcdf_tracking_ids: hdl:21.14100/4385236c-5ca2-4d79-a46f-e3d28a2db87...
version_id: v20190311
intake_esm_varname: None
intake_esm_dataset_key: CMIP.NCAR.CESM2.historical.r7i1p1f1.Omon.so.gr.g...,
So the output looks fine/normal, considering models weren't aggregated. Of course the variables aren't merged, so I tried using postprocessing's merge_variables: dd_new = merge_variables(dset_dict_old), but the output is empty (literally, {}). I did try this with fewer models and only 2 variables, hoping something simpler would work, but again, empty output. I also tried combine_datasets, hoping it would miraculously merge the variables, and unsurprisingly, it did not. Here's a sample of the resulting dictionary and also some code.
ddict_new = combine_datasets(
    dset_dict_old,
    pick_first_member,
    match_attrs=['source_id', 'grid_label', 'experiment_id', 'table_id']
)
# Output
'CESM2-WACCM.gr.historical.Omon': <xarray.Dataset>
Dimensions: (bnds: 2, lev: 33, time: 1980, vertex: 4, x: 360, y: 180)
Coordinates:
* y (y) float64 -89.5 -88.5 -87.5 -86.5 ... 86.5 87.5 88.5 89.5
lat_bounds (y, bnds, x) float64 dask.array<chunksize=(180, 2, 360), meta=np.ndarray>
* lev (lev) float64 0.0 10.0 20.0 30.0 ... 4.5e+03 5e+03 5.5e+03
lev_bounds (lev, bnds) float64 dask.array<chunksize=(33, 2), meta=np.ndarray>
* x (x) float64 0.5 1.5 2.5 3.5 4.5 ... 356.5 357.5 358.5 359.5
lon_bounds (x, bnds, y) float64 dask.array<chunksize=(360, 2, 180), meta=np.ndarray>
* time (time) object 1850-01-15 12:59:59.999997 ... 2014-12-15 12...
time_bounds (time, bnds) object dask.array<chunksize=(1980, 2), meta=np.ndarray>
* bnds (bnds) int64 0 1
lon (x, y) float64 0.5 0.5 0.5 0.5 ... 359.5 359.5 359.5 359.5
lat (x, y) float64 -89.5 -88.5 -87.5 -86.5 ... 87.5 88.5 89.5
lon_verticies (vertex, x, y) float64 dask.array<chunksize=(1, 360, 180), meta=np.ndarray>
lat_verticies (vertex, x, y) float64 dask.array<chunksize=(1, 360, 180), meta=np.ndarray>
* vertex (vertex) int64 0 1 2 3
Data variables:
so (time, lev, y, x) float32 dask.array<chunksize=(12, 33, 180, 360), meta=np.ndarray>
Attributes:
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 674885.0
branch_time_in_parent: 20075.0
case_id: 4
cesm_casename: b.e21.BWHIST.f09_g17.CMIP6-historical-WACCM.001
contact: cesm_cmip6@ucar.edu
creation_date: 2019-07-29T14:19:00Z
data_specs_version: 01.00.31
experiment: all-forcing simulation of the recent past
experiment_id: historical
external_variables: areacello volcello
forcing_index: 1
frequency: mon
further_info_url: https://furtherinfo.es-doc.org/CMIP6.NCAR.CESM2-...
grid: ocean data regridded from native gx1v7 displaced...
grid_label: gr
initialization_index: 1
institution: National Center for Atmospheric Research, Climat...
institution_id: NCAR
license: CMIP6 model data produced by <The National Cente...
mip_era: CMIP6
model_doi_url: https://doi.org/10.5065/D67H1H0V
nominal_resolution: 1x1 degree
parent_activity_id: CMIP
parent_experiment_id: piControl
parent_mip_era: CMIP6
parent_source_id: CESM2-WACCM
parent_time_units: days since 0001-01-01 00:00:00
parent_variant_label: r1i1p1f1
physics_index: 1
product: model-output
realization_index: 1
realm: ocean
source: CESM2 (2017): atmosphere: CAM6 (0.9x1.25 finite ...
source_id: CESM2-WACCM
source_type: AOGCM BGC CHEM AER
sub_experiment: none
sub_experiment_id: none
table_id: Omon
tracking_id: hdl:21.14100/730e4b19-1758-4798-8324-631feaf818d9
variable_id: so
variant_info: CMIP6 CESM2 hindcast (1850-2014) with high-top a...
variant_label: r1i1p1f1
status: 2019-10-25;created;by nhn2@columbia.edu
netcdf_tracking_ids: hdl:21.14100/730e4b19-1758-4798-8324-631feaf818d9
version_id: v20190808
intake_esm_varname: None
intake_esm_dataset_key: CMIP.NCAR.CESM2-WACCM.historical.r1i1p1f1.Omon.s...,
So considering merge_variables returned nothing and combine_datasets returned entries with only 1 variable, it's not surprising that...
for name, item in dset_dict.items():
    # print(name)
    # print(item.data_vars)
    present = item.data_vars
    if all(i in present for i in variables):
        # print(name)
        temp[name] = dset_dict[name]
dset_dict = temp
...returned nothing. (Also, my dictionary names are all over the place because I copied code at different parts of the process. For each test run, the dictionary names did match, just in case you were worried that was an issue.)
I was really hoping merge_variables would work out, since the example given in the postprocessing documentation looked exactly like what I wanted my output to be. But it didn't so I'm not really sure where it all went wrong.
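For reference, the grouping idea behind merging per-variable entries can be sketched in plain Python. This is only an illustration of the concept and not how merge_variables is implemented. The (source_id, member, variable_id) tuples are hypothetical stand-ins for the attributes carried by each dataset.

```python
from collections import defaultdict

# Hypothetical per-variable entries, one tuple per loaded dataset:
# (source_id, variant_label, variable_id)
entries = [
    ("CESM2", "r3i1p1f1", "talk"),
    ("CESM2", "r3i1p1f1", "so"),
    ("CESM2", "r3i1p1f1", "thetao"),
    ("CESM2", "r3i1p1f1", "dissic"),
    ("CESM2", "r7i1p1f1", "so"),
    ("GFDL-CM4", "r1i1p1f1", "thetao"),
]
variables = ["thetao", "so", "talk", "dissic"]

# Group by everything except the variable, collecting the variables
# that are available for each (model, member) pair.
grouped = defaultdict(set)
for source_id, member, variable_id in entries:
    grouped[(source_id, member)].add(variable_id)

# Only groups holding every requested variable would survive the
# all(...) filter after a successful merge.
complete = {
    key: found for key, found in grouped.items() if found >= set(variables)
}
print(sorted(complete))
```

With real datasets the grouped entries would then be merged into one dataset per (model, member) pair; here the sets only track which variables would end up together.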
That is helpful. Let me look into what is going on.
Hi @abby-baskind,
I just tried to reproduce your code, and merge_variables works for some models.
Here is what I did:
import matplotlib.pyplot as plt
import intake
from cmip6_preprocessing.preprocessing import combined_preprocessing
from cmip6_preprocessing.utils import google_cmip_col
import numpy as np
col = google_cmip_col()
variables = ['thetao', 'so', 'talk', 'dissic']
z_kwargs = {'consolidated': True, 'use_cftime': True}
query = dict(experiment_id=['historical'], table_id=['Omon'],
             variable_id=variables,
             grid_label=['gr'],
             source_id=['E3SM-1-0', 'E3SM-1-1', 'GFDL-ESM4',
                        'CESM2-FV2', 'CESM2', 'MRI-ESM2-0',
                        'CESM2-WACCM-FV2', 'GFDL-CM4', 'CESM2-WACCM',
                        'E3SM-1-1-ECA'])
cat = col.search(**query)
dset_dict_old = cat.to_dataset_dict(zarr_kwargs=z_kwargs, storage_options={'token': 'anon'},
                                    preprocess=combined_preprocessing, aggregate=False)
Then
from cmip6_preprocessing.postprocessing import merge_variables
ddict_new = merge_variables(dset_dict_old)
Which gave me some warnings like these:
/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/postprocessing.py:122: UserWarning: CMIP.NCAR.CESM2-WACCM-FV2.historical.r3i1p1f1.Omon.gr.none failed to combine with :indexes along dimension 'time' are not equal
warnings.warn(f"{cmip6_dataset_id(ds)} failed to combine with :{e}")
/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/postprocessing.py:122: UserWarning: CMIP.NCAR.CESM2-WACCM-FV2.historical.r2i1p1f1.Omon.gr.none failed to combine with :indexes along dimension 'time' are not equal
warnings.warn(f"{cmip6_dataset_id(ds)} failed to combine with :{e}")
/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/postprocessing.py:122: UserWarning: CMIP.NCAR.CESM2-FV2.historical.r2i1p1f1.Omon.gr.none failed to combine with :indexes along dimension 'time' are not equal
warnings.warn(f"{cmip6_dataset_id(ds)} failed to combine with :{e}")
/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/postprocessing.py:122: UserWarning: CMIP.NCAR.CESM2-FV2.historical.r3i1p1f1.Omon.gr.none failed to combine with :indexes along dimension 'time' are not equal
warnings.warn(f"{cmip6_dataset_id(ds)} failed to combine with :{e}")
/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/postprocessing.py:122: UserWarning: CMIP.NCAR.CESM2.historical.r10i1p1f1.Omon.gr.none failed to combine with :indexes along dimension 'time' are not equal
warnings.warn(f"{cmip6_dataset_id(ds)} failed to combine with :{e}")
/srv/conda/envs/notebook/lib/python3.8/site-packages/cmip6_preprocessing/postprocessing.py:122: UserWarning: CMIP.E3SM-Project.E3SM-1-0.historical.r5i1p1f1.Omon.gr.none failed to combine with :indexes along dimension 'time' are not equal
warnings.warn(f"{cmip6_dataset_id(ds)} failed to combine with :{e}")
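As an aside on what "indexes along dimension 'time' are not equal" means: an exact comparison of two time axes fails if even one timestamp differs by a few microseconds. The sketch below uses standard-library datetime objects (the real data uses cftime objects) with hypothetical values modelled on the two CESM2 printouts earlier in the thread, where one time axis starts at 12:59:59.999997 and the other at 13:00:00. Whether this is what triggers the warnings above is not established here, and the rounding helper is a hypothetical illustration, not something cmip6_preprocessing is known to do.

```python
from datetime import datetime, timedelta

# Hypothetical time axes that differ only by 3 microseconds in the
# first entry (values modelled on the dataset printouts above).
time_a = [datetime(1850, 1, 15, 12, 59, 59, 999997), datetime(1850, 2, 14, 0, 0)]
time_b = [datetime(1850, 1, 15, 13, 0, 0), datetime(1850, 2, 14, 0, 0)]

# An exact comparison treats the two axes as different.
print(time_a == time_b)


def round_to_second(t):
    """Round a datetime to the nearest whole second (hypothetical helper)."""
    return (t + timedelta(microseconds=500_000)).replace(microsecond=0)


# After rounding, the two axes compare equal.
rounded_a = [round_to_second(t) for t in time_a]
rounded_b = [round_to_second(t) for t in time_b]
print(rounded_a == rounded_b)
```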
There is an issue with some of the data, but we will deal with that later. More importantly, some models were successfully combined:
list(ddict_new.keys())
['E3SM-1-0.gr.historical.Omon.r4i1p1f1',
'CESM2-WACCM-FV2.gr.historical.Omon.r1i1p1f1',
'MRI-ESM2-0.gr.historical.Omon.r3i1p1f1',
'MRI-ESM2-0.gr.historical.Omon.r4i1p1f1',
'CESM2.gr.historical.Omon.r5i1p1f1',
'CESM2.gr.historical.Omon.r7i1p1f1',
'GFDL-CM4.gr.historical.Omon.r1i1p1f1',
'E3SM-1-0.gr.historical.Omon.r1i1p1f1',
'CESM2.gr.historical.Omon.r3i1p1f1',
'MRI-ESM2-0.gr.historical.Omon.r5i1p1f1',
'CESM2.gr.historical.Omon.r4i1p1f1',
'MRI-ESM2-0.gr.historical.Omon.r1i1p1f1',
'CESM2-WACCM.gr.historical.Omon.r3i1p1f1',
'CESM2.gr.historical.Omon.r2i1p1f1',
'CESM2-WACCM.gr.historical.Omon.r1i1p1f1',
'CESM2.gr.historical.Omon.r6i1p1f1',
'CESM2-WACCM.gr.historical.Omon.r2i1p1f1',
'E3SM-1-1.gr.historical.Omon.r1i1p1f1',
'CESM2.gr.historical.Omon.r9i1p1f1',
'MRI-ESM2-0.gr.historical.Omon.r1i2p1f1',
'MRI-ESM2-0.gr.historical.Omon.r2i1p1f1',
'CESM2.gr.historical.Omon.r1i1p1f1',
'GFDL-ESM4.gr.historical.Omon.r1i1p1f1',
'CESM2.gr.historical.Omon.r11i1p1f1',
'E3SM-1-1-ECA.gr.historical.Omon.r1i1p1f1',
'E3SM-1-0.gr.historical.Omon.r3i1p1f1',
'CESM2.gr.historical.Omon.r8i1p1f1',
'CESM2-FV2.gr.historical.Omon.r1i1p1f1',
'E3SM-1-0.gr.historical.Omon.r2i1p1f1']
Checking one of the datasets seems to indicate that (at least for some models) this works:
ddict_new['CESM2-WACCM.gr.historical.Omon.r1i1p1f1']
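Since the merged keys end in the member id, one possible way to pick a member per model without hard-coding if statements is to choose the lowest member that is actually present. A minimal sketch, assuming keys shaped like the ones listed above (the subset below deliberately leaves out r1i1p1f1 for CESM2 to mimic a model whose first member is unavailable); member_sort_key is a hypothetical helper.

```python
import re
from collections import defaultdict

# A subset of merged keys in the format shown above.
keys = [
    "CESM2.gr.historical.Omon.r5i1p1f1",
    "CESM2.gr.historical.Omon.r11i1p1f1",
    "CESM2.gr.historical.Omon.r2i1p1f1",
    "MRI-ESM2-0.gr.historical.Omon.r1i2p1f1",
    "MRI-ESM2-0.gr.historical.Omon.r3i1p1f1",
    "E3SM-1-0.gr.historical.Omon.r4i1p1f1",
]


def member_sort_key(member):
    """Turn 'r11i1p1f1' into (11, 1, 1, 1) so members sort numerically."""
    match = re.fullmatch(r"r(\d+)i(\d+)p(\d+)f(\d+)", member)
    if match is None:
        raise ValueError(f"unexpected member id: {member}")
    return tuple(int(n) for n in match.groups())


# Collect the available members for each model (first key element).
members_by_model = defaultdict(list)
for key in keys:
    parts = key.split(".")
    members_by_model[parts[0]].append(parts[-1])

# Pick the lowest available member per model.
first_member = {
    model: min(members, key=member_sort_key)
    for model, members in members_by_model.items()
}
print(first_member)
```

Sorting numerically matters here: a plain string sort would place r11i1p1f1 before r2i1p1f1.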
So let's see where things go wrong.
Could you:
1) Open a new notebook and paste exactly the code I used above to confirm it still doesn't work?
2) Which pangeo deployment are you working on? You can check by looking at the URL (I use https://staging.us-central1-b.gcp.pangeo.io/...). I am curious if yours says staging or production.
3) What version of cmip6_preprocessing are you using? If you could paste the output of import cmip6_preprocessing;print(cmip6_preprocessing.__version__) here, that would be helpful.
I'll look into what is causing those warnings above really quick now.
hey @gmacgilchrist, I've accomplished very little in the last week cause I've had trouble understanding and implementing a lot of the recommendations from my previous issues. In order to gather my thoughts before our wednesday meeting, I'm recapping some of my lingering issues.
1. Regridding NorCPM1 for thetao, talk, and so
Example: thetao plot
I think this issue is also the source of the problem with the polar stereographic projection.
You recommended using .assign_coords in issue #8 and Julius gave another solution in issue #9. Both confusing
2. Aggregating model members like in issue #4
The solution you suggested causes some problems.
If I use that, when I run this block:
The new dset_dict is empty. Julius gave a more complicated solution that I don't understand at all.
3. Selecting a model member to plot in ppCO2
So, example from your code
The if statement selecting the first member causes some issues, since some models can't or don't calculate ppCO2 for the first member. I worked around this by adding more if statements for the problem models...I feel like there has to be a better, more efficient way to work around this issue.
4. Scale of ppCO2 outputs
The magnitude of ppCO2 is much larger for all the CESM2 models, so the ppCO2 values for the other models don't really show up. You also expressed concerns about ppCO2 values as large as 2000 (i forgot the units for this oops). So I am a bit concerned by this.
I think that covers all my issues for now. I do still need to plot more of the polar stereographic projections (specifically for fgco2 and ppco2), so I'm sure more issues will come up there.