NCAR / cesm-lens-aws

Examples of analysis of CESM LENS data publicly available on Amazon S3 (us-west-2 region) using xarray and dask
https://doi.org/10.26024/wt24-5j82
BSD 3-Clause "New" or "Revised" License

Where are all the ocean variables? #34

Open · rabernat opened this issue 4 years ago

rabernat commented 4 years ago

I started to look at the LENS AWS data and discovered that very little is available:

import intake_esm
import intake
url = 'https://raw.githubusercontent.com/NCAR/cesm-lens-aws/master/intake-catalogs/aws-cesm1-le.json'
col = intake.open_esm_datastore(url)
col.search(component='ocn').df
    component   frequency   experiment  variable    path
0   ocn monthly 20C SALT    s3://ncar-cesm-lens/ocn/monthly/cesmLE-20C-SAL...
1   ocn monthly 20C SSH s3://ncar-cesm-lens/ocn/monthly/cesmLE-20C-SSH...
2   ocn monthly 20C SST s3://ncar-cesm-lens/ocn/monthly/cesmLE-20C-SST...
3   ocn monthly CTRL    SALT    s3://ncar-cesm-lens/ocn/monthly/cesmLE-CTRL-SA...
4   ocn monthly CTRL    SSH s3://ncar-cesm-lens/ocn/monthly/cesmLE-CTRL-SS...
5   ocn monthly CTRL    SST s3://ncar-cesm-lens/ocn/monthly/cesmLE-CTRL-SS...
6   ocn monthly RCP85   SALT    s3://ncar-cesm-lens/ocn/monthly/cesmLE-RCP85-S...
7   ocn monthly RCP85   SSH s3://ncar-cesm-lens/ocn/monthly/cesmLE-RCP85-S...
8   ocn monthly RCP85   SST s3://ncar-cesm-lens/ocn/monthly/cesmLE-RCP85-S...

There are only 3 variables: SALT (3D), SSH (2D), and SST (2D).

At minimum, I would also like to have THETA (3D), UVEL (3D), VVEL (3D), and WVEL (3D), and all the surface fluxes of heat and freshwater. Beyond that, it would be ideal to also have the necessary variables to reconstruct the tracer and momentum budgets.

Are there plans to add more data?

bonnland commented 4 years ago

OK, that sounds like good advice. I'm assuming that removal of these variables is also something that can be done retroactively. Be sure to let me know if that is not the case; otherwise, I will go ahead with the same procedure we've been using for now (since we have to go back anyway to fix the metadata for our other Zarr stores).

rabernat commented 4 years ago

I'm assuming that removal of these variables is also something that can be done retroactively.

Should be as simple as deleting the directories for those variables and re-consolidating metadata.
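
For anyone doing this, a minimal sketch with s3fs and zarr (the store path is just an example; it requires write access to the bucket):

import s3fs
import zarr

fs = s3fs.S3FileSystem()  # needs write credentials for the bucket
store_path = 'ncar-cesm-lens/ocn/monthly/cesmLE-20C-SST.zarr'  # example store

# remove the directory holding the unwanted variable's chunks and metadata
fs.rm(f'{store_path}/TAREA', recursive=True)

# rebuild .zmetadata so the consolidated metadata no longer lists the variable
zarr.consolidate_metadata(fs.get_mapper(store_path))

(The coordinates attribute that xarray writes on the data variables may also need editing so it no longer references the dropped variable.)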

andersy005 commented 4 years ago

You're currently wasting a non-negligible amount of space by storing all of these duplicate TAREA etc. variables in each of the ocean datasets.

It turns out that these variables consume ~20 MB per zarr store.

An even better option is to just drop all of the non-dimension coordinates before writing the zarr data and save them to a standalone grid dataset, which can be brought in as needed for geometric calculations.

👍. Would this grid dataset include static variables only? It appears that the LLC4320_grid includes time as well. By static variables I am referring to scalars and time-independent variables:

>>> print(grid_vars)
['hflux_factor', 'nsurface_u', 'DXU', 'latent_heat_vapor', 'salt_to_Svppt', 'DYT', 'TLONG', 'DYU', 'HTE', 'rho_air', 'HU', 'ULONG', 'DXT', 'rho_sw', 'HUS', 'HUW', 'moc_components', 'TAREA', 'ULAT', 'REGION_MASK', 'grav', 'transport_regions', 'KMU', 'sound', 'omega', 'ANGLET', 'HT', 'UAREA', 'heat_to_PW', 'days_in_norm_year', 'salt_to_ppt', 'dzw', 'sea_ice_salinity', 'cp_air', 'salt_to_mmday', 'dz', 'fwflux_factor', 'TLAT', 'HTN', 'mass_to_Sv', 'radius', 'latent_heat_fusion', 'T0_Kelvin', 'salinity_factor', 'sflux_factor', 'transport_components', 'KMT', 'rho_fw', 'cp_sw', 'ocn_ref_salinity', 'vonkar', 'nsurface_t', 'ANGLE', 'stefan_boltzmann', 'ppt_to_salt', 'momentum_factor']

Removing these grid variables produces a clean xarray dataset:

<xarray.Dataset>
Dimensions:       (d2: 2, lat_aux_grid: 395, member_id: 40, moc_z: 61, nlat: 384, nlon: 320, time: 1872, z_t: 60, z_t_150m: 15, z_w: 60, z_w_bot: 60, z_w_top: 60)
Coordinates:
  * z_t           (z_t) float32 500.0 1500.0 2500.0 ... 512502.8 537500.0
  * z_t_150m      (z_t_150m) float32 500.0 1500.0 2500.0 ... 13500.0 14500.0
  * moc_z         (moc_z) float32 0.0 1000.0 2000.0 ... 525000.94 549999.06
  * z_w_top       (z_w_top) float32 0.0 1000.0 2000.0 ... 500004.7 525000.94
  * z_w_bot       (z_w_bot) float32 1000.0 2000.0 3000.0 ... 525000.94 549999.06
  * lat_aux_grid  (lat_aux_grid) float32 -79.48815 -78.952896 ... 89.47441 90.0
  * z_w           (z_w) float32 0.0 1000.0 2000.0 ... 500004.7 525000.94
  * time          (time) object 1850-02-01 00:00:00 ... 2006-01-01 00:00:00
  * member_id     (member_id) int64 1 2 3 4 5 6 7 ... 34 35 101 102 103 104 105
Dimensions without coordinates: d2, nlat, nlon
Data variables:
    time_bound    (time, d2) object dask.array<chunksize=(6, 2), meta=np.ndarray>
    VVEL          (member_id, time, z_t, nlat, nlon) float32 dask.array<chunksize=(1, 6, 60, 384, 320), meta=np.ndarray>
Attributes:
    nsteps_total:              750
    nco_openmp_thread_number:  1
    cell_methods:              cell_methods = time: mean ==> the variable val...
    tavg_sum:                  2592000.0
    tavg_sum_qflux:            2592000.0
    source:                    CCSM POP2, the CCSM Ocean Component
    contents:                  Diagnostic and Prognostic Variables
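
For concreteness, here is a sketch of that split before writing, assuming ds is an ocean dataset opened from the source files and grid_vars is the list printed above (output paths are illustrative):

grid = ds[grid_vars].reset_coords()   # grid geometry and constants as their own dataset
fields = ds.drop_vars(grid_vars)      # the dataset with the grid/constant variables removed

fields.to_zarr('cesmLE-20C-VVEL.zarr', consolidated=True)
grid.to_zarr('grid.zarr', consolidated=True)
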
jeffdlb commented 4 years ago

@rabernat wrote:

The chunk choice on WVEL (and presumably other 3D variables) is, in my view, less than ideal... First, the chunks are on the large side (235.93 MB). Second, each vertical level is in a separate chunk, while 20 years of time are stored contiguously.

FYI, for the 3D atmospheric data (at least monthly Q), each chunk contains all ensemble members, 12 months of data, and 2 levels:

<xarray.DataArray 'Q' (member_id: 40, time: 1032, lev: 30, lat: 192, lon: 288)> dask.array<zarr, shape=(40, 1032, 30, 192, 288), dtype=float32, chunksize=(40, 12, 2, 192, 288), chunktype=numpy.ndarray>

If we were to put all 30 levels in one chunk then we'd need to divide something else by a factor of ~15. Perhaps the x-y dimension should be 4x4 chunks instead of global?

I know Anderson was striving for 100 MB chunks, but I haven't checked the size of these. The ocean data have, I think, 60 levels instead of 30, so the problem is even worse.
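
For a quick check, chunk size is just the product of the chunk shape and the dtype itemsize. Using the shapes from the reprs above:

import numpy as np

def chunk_mb(shape, dtype='float32'):
    """Size in MB of one chunk with the given shape and dtype."""
    return np.prod(shape) * np.dtype(dtype).itemsize / 1e6

# atmospheric Q chunks above: (member_id, time, lev, lat, lon)
print(chunk_mb((40, 12, 2, 192, 288)))   # ~212 MB, roughly 2x the 100 MB target

# all 30 levels in one chunk would be ~15x larger, hence the need to shrink
# another dimension (fewer months, or tiling lat/lon)
print(chunk_mb((40, 12, 30, 192, 288)))  # ~3185 MB (~3.2 GB)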

Also, @jhamman stated at the start of this project that it is possible to re-chunk under the hood if we don't like the arrangement, but I'm curious about how you do that in practice given the immutability of objects in an object store.

cspencerjones commented 4 years ago

I also just opened the data and had a look. I agree with Ryan that rechunking so that each chunk contains all vertical levels would be very helpful: oceanographers like to plot sections! I don't object to chunking more in time in order to achieve this. I also think that it's sensible to continue chunking by memberID, because I will want to write and test my code for one member and then operate on all the members only once or twice. I'll probably hold off doing anything more until this is a bit more sorted out. Thanks to everyone for putting in this effort!

andersy005 commented 4 years ago

As an update, I have re-chunked the data accordingly for all ocean variables:

<xarray.Dataset>
Dimensions:       (d2: 2, lat_aux_grid: 395, member_id: 40, moc_z: 61, nlat: 384, nlon: 320, time: 1872, z_t: 60, z_t_150m: 15, z_w: 60, z_w_bot: 60, z_w_top: 60)
Coordinates:
  * z_t           (z_t) float32 500.0 1500.0 2500.0 ... 512502.8 537500.0
  * z_t_150m      (z_t_150m) float32 500.0 1500.0 2500.0 ... 13500.0 14500.0
  * moc_z         (moc_z) float32 0.0 1000.0 2000.0 ... 525000.94 549999.06
  * z_w_top       (z_w_top) float32 0.0 1000.0 2000.0 ... 500004.7 525000.94
  * z_w_bot       (z_w_bot) float32 1000.0 2000.0 3000.0 ... 525000.94 549999.06
  * lat_aux_grid  (lat_aux_grid) float32 -79.48815 -78.952896 ... 89.47441 90.0
  * z_w           (z_w) float32 0.0 1000.0 2000.0 ... 500004.7 525000.94
  * time          (time) object 1850-02-01 00:00:00 ... 2006-01-01 00:00:00
  * member_id     (member_id) int64 1 2 3 4 5 6 7 ... 34 35 101 102 103 104 105
Dimensions without coordinates: d2, nlat, nlon
Data variables:
    time_bound    (time, d2) object dask.array<chunksize=(6, 2), meta=np.ndarray>
    VVEL          (member_id, time, z_t, nlat, nlon) float32 dask.array<chunksize=(1, 6, 60, 384, 320), meta=np.ndarray>

As you can see, I removed the grid variables. I could use some feedback on my comment above in https://github.com/NCAR/cesm-lens-aws/issues/34#issuecomment-612556759 regarding what needs to go into a standalone grid dataset. The re-chunked data are residing on GLADE for now, and I am ready to transfer them to S3 once the grid dataset has been sorted out.

jeffdlb commented 4 years ago

The re-chunked data are residing on GLADE for now, and I am ready to transfer them to S3 once the grid dataset has been sorted out.

Does @jhamman have a strategy for re-chunking in place directly on AWS S3? I suspect this would require reading data from the old objects, creating the new objects in a separate bucket as scratch space, deleting the old objects, copying the new objects to the main bucket, and finally deleting the new objects from the scratch bucket. I can create a scratch bucket under our AWS account if desired.
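
Something like the following might work, although I have not tried it (bucket names and chunk sizes are illustrative; it assumes write credentials and enough Dask workers to stream the data):

import s3fs
import xarray as xr

fs = s3fs.S3FileSystem()  # needs write credentials

old_path = 'ncar-cesm-lens/ocn/monthly/cesmLE-CTRL-WVEL.zarr'
scratch_path = 'some-scratch-bucket/cesmLE-CTRL-WVEL.zarr'  # hypothetical scratch bucket

# 1. open the existing store lazily and re-chunk with Dask
ds = xr.open_zarr(fs.get_mapper(old_path), consolidated=True)
ds = ds.chunk({'member_id': 1, 'time': 6, 'z_w_top': 60})
for var in ds.variables.values():
    var.encoding.pop('chunks', None)  # avoid conflicts between old and new chunking

# 2. write the re-chunked copy to the scratch bucket
ds.to_zarr(fs.get_mapper(scratch_path), consolidated=True)

# 3. swap it into place: delete the old objects, copy the new ones over
fs.rm(old_path, recursive=True)
fs.copy(scratch_path, old_path, recursive=True)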

rabernat commented 4 years ago

This is a minor nit, but I personally prefer time_bound to also be in coords, not data_vars. Then you will have just one data variable per dataset, which has a nice, clean feel.

rabernat commented 4 years ago

Also, there appear to be quite a few coordinates that are not used by the data variables. These could probably be removed as well.
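
A sketch of both of those changes, assuming ds is one of the ocean datasets shown above:

# promote time_bound from data_vars to coords
ds = ds.set_coords('time_bound')

# drop coordinates whose dimensions are not used by any data variable
used_dims = {dim for var in ds.data_vars.values() for dim in var.dims}
unused = [name for name, coord in ds.coords.items()
          if coord.dims and not set(coord.dims) & used_dims]
ds = ds.drop_vars(unused)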

bonnland commented 4 years ago

I have created the Zarr files for TAUX and TAUY, but I chose to place all members in a single chunk because the chunks are so much smaller (these are 2D variables, so each chunk would be 1/60 the size of a 3D variable chunk).

But because I didn't perform the same metadata operations as @andersy005, and because they are fast to recreate, I will let Anderson make these also.

andersy005 commented 4 years ago

As an update, I updated the chunking scheme for all existing ocean variables on AWS-S3, removed the grid variables from the zarr stores, and created a standalone grid zarr store:

In [2]: import intake
   ...: url = 'https://raw.githubusercontent.com/NCAR/cesm-lens-aws/master/intake-catalogs/aws-cesm1-le.json'
   ...: col = intake.open_esm_datastore(url)
   ...: subset = col.search(component='ocn')

In [3]: subset.unique(columns=['variable', 'experiment', 'frequency'])
Out[3]:
{'variable': {'count': 11,
  'values': ['SALT',
   'SFWF',
   'SHF',
   'SSH',
   'SST',
   'TEMP',
   'UVEL',
   'VNS',
   'VNT',
   'VVEL',
   'WVEL']},
 'experiment': {'count': 3, 'values': ['20C', 'CTRL', 'RCP85']},
 'frequency': {'count': 1, 'values': ['monthly']}}
In [1]: import s3fs
   ...: import xarray as xr
   ...:
   ...: fs = s3fs.S3FileSystem(anon=True)
   ...: s3_path = 's3://ncar-cesm-lens/ocn/monthly/cesmLE-CTRL-WVEL.zarr'
   ...: ds = xr.open_zarr(fs.get_mapper(s3_path), consolidated=True)
   ...: ds
Out[1]:
<xarray.Dataset>
Dimensions:     (d2: 2, member_id: 1, nlat: 384, nlon: 320, time: 21612, z_w_top: 60)
Coordinates:
  * member_id   (member_id) int64 1
  * time        (time) object 0400-02-01 00:00:00 ... 2201-01-01 00:00:00
    time_bound  (time, d2) object dask.array<chunksize=(6, 2), meta=np.ndarray>
  * z_w_top     (z_w_top) float32 0.0 1000.0 2000.0 ... 500004.7 525000.94
Dimensions without coordinates: d2, nlat, nlon
Data variables:
    WVEL        (member_id, time, z_w_top, nlat, nlon) float32 dask.array<chunksize=(1, 6, 60, 384, 320), meta=np.ndarray>
Attributes:
    Conventions:               CF-1.0; http://www.cgd.ucar.edu/cms/eaton/netc...
    NCO:                       4.3.4
    calendar:                  All years have exactly  365 days.
    cell_methods:              cell_methods = time: mean ==> the variable val...
    contents:                  Diagnostic and Prognostic Variables
    nco_openmp_thread_number:  1
    revision:                  $Id: tavg.F90 41939 2012-11-14 16:37:23Z mlevy...
    source:                    CCSM POP2, the CCSM Ocean Component
    tavg_sum:                  2678400.0
    tavg_sum_qflux:            2678400.0
    title:                     b.e11.B1850C5CN.f09_g16.005

In [2]: s3_path = 's3://ncar-cesm-lens/ocn/grid.zarr'

In [3]: grid = xr.open_zarr(fs.get_mapper(s3_path), consolidated=True)
In [6]: xr.merge([ds, grid])
Out[6]:
<xarray.Dataset>
Dimensions:               (d2: 2, lat_aux_grid: 395, member_id: 1, moc_comp: 3, moc_z: 61, nlat: 384, nlon: 320, time: 21612, transport_comp: 5, transport_reg: 2, z_t: 1, z_t_150m: 15, z_w: 60, z_w_bot: 60, z_w_top: 60)
Coordinates:
  * member_id             (member_id) int64 1
  * time                  (time) object 0400-02-01 00:00:00 ... 2201-01-01 00:00:00
    time_bound            (time, d2) object dask.array<chunksize=(6, 2), meta=np.ndarray>
  * z_w_top               (z_w_top) float32 0.0 1000.0 ... 500004.7 525000.94
    ANGLE                 (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    ANGLET                (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    DXT                   (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    DXU                   (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    DYT                   (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    DYU                   (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    HT                    (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    HTE                   (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    HTN                   (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    HU                    (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    HUS                   (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    HUW                   (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    KMT                   (nlat, nlon) float64 dask.array<chunksize=(192, 320), meta=np.ndarray>
    KMU                   (nlat, nlon) float64 dask.array<chunksize=(192, 320), meta=np.ndarray>
    REGION_MASK           (nlat, nlon) float64 dask.array<chunksize=(192, 320), meta=np.ndarray>
    T0_Kelvin             float64 ...
    TAREA                 (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    TLAT                  (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    TLONG                 (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    UAREA                 (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    ULAT                  (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    ULONG                 (nlat, nlon) float64 dask.array<chunksize=(192, 160), meta=np.ndarray>
    cp_air                float64 ...
    cp_sw                 float64 ...
    days_in_norm_year     timedelta64[ns] ...
    dz                    (z_t) float32 dask.array<chunksize=(1,), meta=np.ndarray>
    dzw                   (z_w) float32 dask.array<chunksize=(60,), meta=np.ndarray>
    fwflux_factor         float64 ...
    grav                  float64 ...
    heat_to_PW            float64 ...
    hflux_factor          float64 ...
  * lat_aux_grid          (lat_aux_grid) float32 -79.48815 -78.952896 ... 90.0
    latent_heat_fusion    float64 ...
    latent_heat_vapor     float64 ...
    mass_to_Sv            float64 ...
    moc_components        (moc_comp) |S256 dask.array<chunksize=(3,), meta=np.ndarray>
  * moc_z                 (moc_z) float32 0.0 1000.0 ... 525000.94 549999.06
    momentum_factor       float64 ...
    nsurface_t            float64 ...
    nsurface_u            float64 ...
    ocn_ref_salinity      float64 ...
    omega                 float64 ...
    ppt_to_salt           float64 ...
    radius                float64 ...
    rho_air               float64 ...
    rho_fw                float64 ...
    rho_sw                float64 ...
    salinity_factor       float64 ...
    salt_to_Svppt         float64 ...
    salt_to_mmday         float64 ...
    salt_to_ppt           float64 ...
    sea_ice_salinity      float64 ...
    sflux_factor          float64 ...
    sound                 float64 ...
    stefan_boltzmann      float64 ...
    transport_components  (transport_comp) |S256 dask.array<chunksize=(5,), meta=np.ndarray>
    transport_regions     (transport_reg) |S256 dask.array<chunksize=(2,), meta=np.ndarray>
    vonkar                float64 ...
  * z_t                   (z_t) float32 500.0
  * z_t_150m              (z_t_150m) float32 500.0 1500.0 ... 13500.0 14500.0
  * z_w                   (z_w) float32 0.0 1000.0 2000.0 ... 500004.7 525000.94
  * z_w_bot               (z_w_bot) float32 1000.0 2000.0 ... 549999.06
Dimensions without coordinates: d2, moc_comp, nlat, nlon, transport_comp, transport_reg
Data variables:
    WVEL                  (member_id, time, z_w_top, nlat, nlon) float32 dask.array<chunksize=(1, 6, 60, 384, 320), meta=np.ndarray>

jeffdlb commented 4 years ago

As an update, I updated the chunking scheme for all existing ocean variables on AWS-S3, removed the grid variables from the zarr stores, and created a standalone grid zarr store

@andersy005 Did you have to create the new Zarr on GLADE and then delete/upload/replace the Zarr stores on S3, or was it possible to re-chunk in place on AWS?

jeffdlb commented 4 years ago

I am updating the dataset landing page to include the new variables.

QUESTION: We added VNS & VNT (salt and heat fluxes in y-direction). Shouldn't we also include UES & UET (salt and heat fluxes in x-direction), and maybe WTS & WTT (fluxes across top face)? I don't see how only one component of the flux vectors can be useful.

bonnland commented 4 years ago

I am updating the dataset landing page to include the new variables.

Hi Jeff, those variables are actually in transit now. I was going to announce their availability for performance testing after the transfer was completed. Once they have been transferred, I will update the catalog for AWS users. The variables in transit are:

3D variables: DIC, DOC, UES, UET, WTS, WTT, PD

2D variables: TAUX, TAUY, TAUX2, TAUY2, QFLUX, FW, HMXL, QSW_HTP, QSW_HBL, SHF_QSW, SFWF_WRST, RESID_S, RESID_T

It has been an uphill climb learning the difficulties of creating very large Zarr stores: the Dask workers were bogging down and crashing at first, but I eventually worked out which configurations lead to successful Zarr saves.

jeffdlb commented 4 years ago

@bonnland Excellent! Thank you very much. I will update the landing page to include those (but not publish until you are ready).

jeffdlb commented 4 years ago

FYI the draft unpublished landing page with recent updates is temporarily at CESM_LENS_on_AWS.20200428.htm

bonnland commented 4 years ago

@cspencerjones @rabernat @jbusecke Transfer of the new ocean data is complete, and the data are available on Amazon AWS. It would be very helpful if someone could try a nontrivial computation with the data to make sure performance with our chunking scheme is adequate.

I've confirmed that the Binder notebook on Amazon works (see the README.md for the link), and the variables are visible in the catalog. Here is what I got:

import intake
intakeEsmUrl = 'https://ncar-cesm-lens.s3-us-west-2.amazonaws.com/catalogs/aws-cesm1-le.json'
col = intake.open_esm_datastore(intakeEsmUrl)

subset = col.search(component='ocn')
subset.unique(columns=['variable', 'experiment', 'frequency'])

{'variable': {'count': 32,
  'values': ['DIC',
   'DOC',
   'FW',
   'HMXL',
   'O2',
   'PD',
   'QFLUX',
   'QSW_HBL',
   'QSW_HTP',
   'RESID_S',
   'RESID_T',
   'SALT',
   'SFWF',
   'SFWF_WRST',
   'SHF',
   'SHF_QSW',
   'SSH',
   'SST',
   'TAUX',
   'TAUX2',
   'TAUY',
   'TAUY2',
   'TEMP',
   'UES',
   'UET',
   'UVEL',
   'VNS',
   'VNT',
   'VVEL',
   'WTS',
   'WTT',
   'WVEL']},
 'experiment': {'count': 3, 'values': ['20C', 'CTRL', 'RCP85']},
 'frequency': {'count': 1, 'values': ['monthly']}}

cspencerjones commented 4 years ago

I tried a few things with the data this morning, including calculating density from temperature and salinity and plotting sections, transforming some variables to density coordinates and plotting time means, etc. I tried using multiple workers as well. This worked OK, and I think the performance is adequate.
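
For anyone who wants to run a similar test, here is roughly what such a computation looks like (the path follows the naming pattern shown earlier in the thread; plotting needs matplotlib):

import s3fs
import xarray as xr

fs = s3fs.S3FileSystem(anon=True)
s3_path = 's3://ncar-cesm-lens/ocn/monthly/cesmLE-20C-TEMP.zarr'
ds = xr.open_zarr(fs.get_mapper(s3_path), consolidated=True)

# time-mean meridional temperature section for one member at a fixed longitude index
section = ds.TEMP.isel(member_id=0, nlon=200).mean('time')
section.plot(yincrease=False)  # depth (z_t) increases downward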

bonnland commented 4 years ago

That's great to hear; we can tentatively move forward with the remaining variables requested so far. They are all 3D variables:

UVEL2, VVEL2, HDIFB_SALT, HDIFB_TEMP, HDIFE_SALT, HDIFE_TEMP, HDIFN_SALT, HDIFN_TEMP, KAPPA_ISOP, KAPPA_THIC, KPP_SRC_SALT, KPP_SRC_TEMP, VNT_ISOP, VNT_SUBM, HOR_DIFF

I've spent some time looking at MOC, which has a different dimensional structure than the other variables (see the header below). Any thoughts on chunking are appreciated. At first glance, it seems we want to chunk in time and leave all other dimensions unchunked, aiming for a chunk size between 100 and 200 MB.

netcdf b.e11.B20TRLENS_RCP85.f09_g16.xbmb.010.pop.h.MOC.192001-202912 {
dimensions:
    d2 = 2 ;
    time = UNLIMITED ; // (1320 currently)
    moc_comp = 3 ;
    transport_comp = 5 ;
    transport_reg = 2 ;
    lat_aux_grid = 395 ;
    moc_z = 61 ;
    nlon = 320 ;
    nlat = 384 ;

    float MOC(time, transport_reg, moc_comp, moc_z, lat_aux_grid) ;
        MOC:_FillValue = 9.96921e+36f ;
        MOC:long_name = "Meridional Overturning Circulation" ;
        MOC:units = "Sverdrups" ;
        MOC:coordinates = "lat_aux_grid moc_z moc_components transport_region time" ;
        MOC:cell_methods = "time: mean" ;
        MOC:missing_value = 9.96921e+36f ;
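
For what it's worth, a back-of-the-envelope check on that idea, assuming the 40 ensemble members are concatenated along a member_id dimension as in the other ocean stores:

import numpy as np

# one monthly MOC field: (transport_reg, moc_comp, moc_z, lat_aux_grid), float32
bytes_per_month = 2 * 3 * 61 * 395 * np.dtype('float32').itemsize
mb_per_month = 40 * bytes_per_month / 1e6
print(mb_per_month)      # ~23 MB per month across all 40 members

# a chunk of ~6 months over all members lands in the 100-200 MB target range
print(6 * mb_per_month)  # ~139 MB
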
jeffdlb commented 4 years ago

FYI the draft unpublished landing page with recent updates is temporarily at CESM_LENS_on_AWS.20200428.htm

Now that the new data have been uploaded, I believe I can publish this draft as the new landing page. QUESTION: Does the page need to say anything about the new approach to repeated grid variables, or is that completely transparent to the user?

bonnland commented 4 years ago

QUESTION: Does the page need to say anything about the new approach to repeated grid variables, or is that completely transparent to the user?

There are still small inconsistencies to work out, AFAIK. Unless I am mistaken, Anderson republished all the ocean data with the grid variables removed, but grid variables still coexist in the atmospheric data, and those grid variables are probably distinct from the ocean ones.

The separate grid variables have been pushed to AWS, but they don't quite fit yet into our catalog framework, which is not yet general enough to handle variables that extend across experiments (CTRL, 20C, RCP85, etc). So the user can't load the grid variables until we generalize the catalog logic to make them available.

And I'm not yet clear on whether transparent loading of these variables is a simple matter. The simpler route, from a data-provider engineering perspective, would be to modify the Kay notebook to show how grid variables are loaded for area-based computations, though that would require republishing the atmosphere variables. So there are still some kinks to work out.