JiaweiZhuang / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License
269 stars 49 forks source link

Support for unmeshed lat/lon w/ different names and in Datasets w/ multiple data_vars #5

Closed spencerahill closed 5 years ago

spencerahill commented 6 years ago

Thanks for xESMF! It's been great for my recent re-gridding needs.

The data I need to regrid generally has lat and lon each stored as 1D arrays, rather than as meshes, and with names 'lat' and 'lon', respectively, rather than 'x', and 'y'. Also, I am regridding Datasets with multiple data_vars.

Each of these issues required a little bit of extra code before it would work with xESMF. Since I think these are common use cases, I thought I'd paste it here. Hopefully they're pretty self explanatory...I didn't take the time to write full docstrings. The logic could likely be improved also, especially the last one.

You're welcome to incorporate whatever you'd like of them into xESMF if you'd like. But no problem if not.

import numpy as np
import xarray as xr
import xesmf as xe

LON_STR = 'lon'
LAT_STR = 'lat'

def _replace_dim(da, olddim, newdim):
    """From https://github.com/pydata/xarray/issues/1273."""
    renamed = da.rename({olddim: newdim.name})

    # note that alignment along a dimension is skipped when you are overriding
    # the relevant coordinate values
    renamed.coords[newdim.name] = newdim
    return renamed

def meshed_latlon_arr(arr, lon_str=LON_STR, lat_str=LAT_STR):
    """Create new array with meshed lat and lon coordinates."""
    lonmesh, latmesh = np.meshgrid(arr[lon_str], arr[lat_str])
    renamed = arr.rename({lat_str: 'y', lon_str: 'x'})
    renamed.coords[lat_str] = (('y', 'x'), latmesh)
    renamed.coords[lon_str] = (('y', 'x'), lonmesh)
    return renamed

def meshed_latlon_ds(ds, lon_str=LON_STR, lat_str=LAT_STR):
    """Create new dataset with meshed lat and lon coordinates."""
    meshed_arrs = {}
    for name, arr in ds.data_vars.items():
        meshed_arrs.update({name: meshed_latlon_arr(arr)})
    return xr.Dataset(data_vars=meshed_arrs, attrs=ds.attrs)

def regrid_unmeshed_arr(arr, target_grid, method='bilinear'):
    """Regrid the DataArray that has 1D lat and lon coordinates."""
    meshed = meshed_latlon_ds(arr)
    return xe.regrid(meshed, target_grid, 
                     meshed, method=method, verbose=True)

def regrid_unmeshed_ds(ds, target_grid, method='bilinear'):
    """Regrid the Dataset that has 1D lat and lon coordinates."""
    meshed = meshed_latlon_ds(ds)

    data_vars = {}
    for name, data_var in meshed.data_vars.items():
        regridded = xe.regrid(meshed, target_grid, 
                              data_var, method=method)
        data_vars.update({name: regridded})

    coords = {}
    xy_set = False
    for name, coord in meshed.coords.items():
        if ('y' in coord or 'x' in coord) and name not in ('y', 'x', 'lat', 'lon'):
            regridded = xe.regrid(meshed, target_grid, 
                                  coord, method=method)
            coords.update({name: regridded})
            if not xy_set:
                coords.update({'y': regridded['y']})
                coords.update({'x': regridded['x']})
                coords.update({'lat': regridded['lat']})
                coords.update({'lon': regridded['lon']})
                xy_set = True
        elif name not in ('y', 'x', 'lat', 'lon'):
            coords.update({name: coord})

    return xr.Dataset(data_vars=data_vars,
                      coords=coords, attrs=ds.attrs) 
JiaweiZhuang commented 6 years ago

Thanks for being interested in xESMF and for sharing the codes!

Both are good suggestions. I am planning to re-design xESMF internals in the next version (solve #2 ), after that it should have much more native support for xarray datatypes 😄

spencerahill commented 6 years ago

Cool, looking forward to it!

One other feature I was hoping to eventually implement was auto-generating meshed lon and lat bound arrays (when they are missing or 1D), in order to support conservative interpolation.

spencerahill commented 6 years ago

One other feature I was hoping to eventually implement was auto-generating meshed lon and lat bound arrays (when they are missing or 1D), in order to support conservative interpolation.

I have now implemented this:

LAT_STR = 'lat'
LON_STR = 'lon'
LAT_BOUNDS_STR = 'lat_b'
LON_BOUNDS_STR = 'lon_b'
X_STR = 'x'
Y_STR = 'y'
X_BOUNDS_STR = 'x_b'
Y_BOUNDS_STR = 'y_b'

def add_lat_lon_bounds(arr, lat_str=LAT_STR, lon_str=LON_STR,
                       lat_bounds_str=LAT_BOUNDS_STR,
                       lon_bounds_str=LON_BOUNDS_STR,
                       lat_min=-90., lat_max=90.,
                       lon_min=0., lon_max=360.):
    """Add bounding arrays to lat and lon arrays."""
    lon_vals = arr[lon_str].values
    lon_bounds_values = 0.5*(lon_vals[:-1] + lon_vals[1:])
    lon_bounds = np.concatenate([[lon_min], lon_bounds_values, [lon_max]])

    lat_vals = arr[lat_str].values
    lat_bounds_values = 0.5*(lat_vals[:-1] + lat_vals[1:])
    lat_bounds = np.concatenate([[lat_min], lat_bounds_values, [lat_max]])

    ds = arr.to_dataset()
    ds.coords[lon_bounds_str] = xr.DataArray(lon_bounds, dims=[X_BOUNDS_STR])
    ds.coords[lat_bounds_str] = xr.DataArray(lat_bounds, dims=[Y_BOUNDS_STR])

    return ds

def mesh_latlon_arrs_and_bounds(arr, lat_str=LAT_STR, lon_str=LON_STR,
                                lat_bounds_str=LAT_BOUNDS_STR,
                                lon_bounds_str=LON_BOUNDS_STR,
                                x_str=X_STR, y_str=Y_STR,
                                x_bounds_str=X_BOUNDS_STR,
                                y_bounds_str=Y_BOUNDS_STR, 
                                lat_min=-90., lat_max=90., 
                                lon_min=0., lon_max=360.):
    """Add lat/lon bounds and mesh lat, lon, and bounds.

    Resulting xarray.Dataset can be used by xESMF.regrid using the 
    'conservative' method.

    """
    arr_with_bounds = add_lat_lon_bounds(
        arr, lat_str=lat_str, lon_str=lon_str, 
        lat_bounds_str=lat_bounds_str,
        lon_bounds_str=lon_bounds_str, lat_min=lat_min, lat_max=lat_max,
        lon_min=lon_min, lon_max=lon_max)

    meshed_with_bounds = meshed_latlon_arr(arr_with_bounds,
                                           lon_str=lon_str, lat_str=lat_str,
                                           x_str=x_str, y_str=y_str)

    return meshed_latlon_arr(meshed_with_bounds, 
                             lat_str=lat_bounds_str, 
                             lon_str=lon_bounds_str,
                             x_str=x_bounds_str, y_str=y_bounds_str)

meshed_latlon_arr is as defined in my previous comment. The dataset outputted by the mesh_latlon_arrs_and_bounds function is then ready to be used for conservative interpolation:

meshed = mesh_latlon_arrs_and_bounds(arr)
conservative = xe.regrid(meshed, target_grid180, meshed['precip'], 
                         method='conservative')

FYI, as you suggested in https://github.com/JiaweiZhuang/xESMF/issues/10#issuecomment-343635806, I can confirm that using conservative interpolation does prevent the missing wraparound point.

JiaweiZhuang commented 6 years ago

There are 3 issues in this thread (better break down?):

v0.1.1 now accepts 1D arrays for rectilinear grids (demo).

The ability to regrid xarray.DataSet is planned for the next version.

The center-bound switching is pretty tricky. For rectilinear grids, I can add a convenience function to xesmf.util, but inferring bounds for curvilinear grids can be hard especially when the grid spans over the pole (related issue: pydata/xarray#781).

rabernat commented 6 years ago

I think it should be possible to resolve some of these issues via integration with xgcm. I opened #13 to discuss how this might work.

JiaweiZhuang commented 5 years ago

v0.2 now takes multi-variable dataset as input.

Use #13 for the center-bound switching problem.