JiaweiZhuang / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License
269 stars 49 forks source link

regridding cubed-sphere-type datasets #21

Open rabernat opened 6 years ago

rabernat commented 6 years ago

Does xESMF currently support cubed-sphere-type datasets? For example, reading MITgcm llc grids with xmitgcm, we get an xarray dataset that looks something like this:

<xarray.Dataset>
Dimensions:  (face: 13, i: 1080, j: 1080)
Coordinates:
    XC       (face, j, i) >f4 dask.array<shape=(13, 1080, 1080), chunksize=(1, 1080, 1080)>
    YC       (face, j, i) >f4 dask.array<shape=(13, 1080, 1080), chunksize=(1, 1080, 1080)>
  * face     (face) int64 0 1 2 3 4 5 6 7 8 9 10 11 12
  * i        (i) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
  * j        (j) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 ...
...

where XC and YC are longitude and latitude of the cell centers. There are 13 locally orthogonal rectilinear grid faces indexed by dimension face.

I would like to regrid this data to a regular lat-lon grid. I scanned the docs but couldn't find an answer to this. But presumably @JiaweiZhuang is using this with GEOS5, which uses a cubed sphere grid.

JiaweiZhuang commented 6 years ago

ESMPy supports cubed-sphere grid but not general multi-tile grid. From ESMPy doc:

Multi-tile Grid support is limited to cubed-sphere grids created on 6 processors. A cubed-sphere grid can be created on any number of processors, but only when it is created on 6 processors will the coordinates be retrievable for the entire object. A Field created from a cubed-sphere Grid cannot be written to file in parallel.

Here is an example of using cubed-sphere regridding in ESMPy, extracted from ESMF source code. That code has to be run with 6 MPI processes, one for each cubed-sphere panel. Notice the line:

if ESMF.pet_count() != 6:
    print ("ESMPy cubed sphere Grid Mesh Regridding Example requires 6 processors")
    import sys; sys.exit(0)

This cubed-sphere regridding capability is mostly for supporting GEOS-5. I haven't brought it to xESMF level since it is too specialized and requires mpi4py.

For general multi-tile grid like MITgcm's LLC grid, you can do the regridding tile-by-tile and add the results up. One caveat is that the boundaries between tiles will be missing in one specific case, as shown in the table below:

multi-tile -> lat-lon lat-lon -> multi-tile
first-order conservative algorithm correct correct
bilinear and other algorithms missing the boundaries between tiles correct

This is because the connection information between tiles is lost when the regridding is done on individual tiles. Conservative regridding is not affected because it uses the cell boundary information of each cell, not the connection information between adjacent grid points. For non-conservative regridding algorithms, you don't actually have to miss a lot of boundary lines. The LLC grid can be viewed as 2 quadrilateral tiles (polar panel + the rest), not 5 tiles (polar panel + the rest divided into 4) or 13 tiles (of equal size).

Multi-tile support does exist in ESMF, but not in ESMPy. If ESMPy has a plan for this, I am indeed willing to bring it up to xESMF. Also @bekozi @rokuingh.

bekozi commented 6 years ago

As usual, @JiaweiZhuang, you described the overall situation with multi-tile in ESMPy well.

If ESMPy has a plan for this, I am indeed willing to bring it up to xESMF.

There is no current plan for this. It gets tricky from a Python perspective since we are not exposing the ESMF DistGrid through Python where connections are configured. This would be added as part of the long arc to expose the ESMF virtual machine to allow user-defined parallel decompositions. When reading cubed-sphere from file, all this stuff happens in ESMF IO code. My sense is that we could accommodate some features through interface tricks.

@rabernat, where do the 13 faces come from (use to seeing 6)? I am admittedly not familiar with the MITgcm llc grid.

On another note, @JiaweiZhuang, do you have any ESMPy feature requests you would consider blockers? Looking through tickets, I see your name next to in-memory weight retrieval, SMM performance, grid area calculations, and additional grid information in weights file. Maybe the answer is "all of the above"...

rabernat commented 6 years ago

@rabernat, where do the 13 faces come from (use to seeing 6)? I am admittedly not familiar with the MITgcm llc grid.

Neither numpy or xarray supports ragged arrays; each array along a dimensions needs to have the same shape. The MITgcm LLC grid uses five "facets", i.e. locally orthogonal rectilinear grids (four lat-lon facets plus a polar cap). However, they are not the same size; the lat-lon facets are three times longer than they are wide. So we break the lat-lon facets into three sub-facets (now called faces), which we can then concatenate along a face dimension. 4*3 + 1 = 13.

Here is how the data is actually laid out on disk mitgcm llc grid

For more info, you can read this: http://xmitgcm.readthedocs.io/en/latest/performance.html

xgcm allows us to perform exchanges between these faces and operate on the whole dataset as a single object (lazily, since it is built on xarray / dask): http://xgcm.readthedocs.io/en/latest/grid_topology.html

I imagine that xgcm is reproducing many things that esmf can already do. I would have preferred not to have to create xgcm, but currently it is a very necessary (and well performing) tool for our research group.

JiaweiZhuang commented 6 years ago

On another note, @JiaweiZhuang, do you have any ESMPy feature requests you would consider blockers? Looking through tickets, I see your name next to in-memory weight retrieval, SMM performance, grid area calculations, and additional grid information in weights file.

Thanks for asking! The only crucial need is in-memory weight retrieval. This would lead to some API changes in xESMF, which had better happen sooner than later... Let's discuss this in #11

bekozi commented 6 years ago

@rabernat thank you for the explanation and overview! If you don't mind, can you pass along a grid file? I'm curious if ESMF can even read it (we also collect interesting grids). xgcm looks like great software. Let us know if we can help you out through xESMF or otherwise.

I do have a quick question about ragged arrays. I've always thought these were ragged arrays? Maybe not?

>>> x = np.array([np.array([1,2,3]), np.array([4,5,6,7])])
>>> x
array([array([1, 2, 3]), array([4, 5, 6, 7])], dtype=object)
>>> print(x.shape, x[0].shape, x[1].shape)
(2,) (3,) (4,)

I use them quite a bit with UGRID and numpy, and it's also how you fill VLTypes in netCDF4-python.

Let's discuss this in #11

:+1:

rabernat commented 6 years ago

If you don't mind, can you pass along a grid file?

MITgcm does not output netCDF for these types of grids. It uses its own custom binary format. That's why we made xmitgcm: http://xmitgcm.readthedocs.io/en/latest/

I can make you a netcdf file with some grid information in it, but I'm not sure exactly what you're looking for. Is the a formal definition / spec for a "grid file" that you can point me to? Eventually we would like to use gridspec in xgcm, but it doesn't seem to be adopted widely (or at all?)

I do have a quick question about ragged arrays. I've always thought these were ragged arrays? Maybe not?

I would consider that to be an unsatisfactory hack. The problem is that you have to use an object dtype to make it work. You can literally put anything into a numpy array as an object. None of the usual numpy operations work on that array (x.mean() errors), and indexing fails (x[0,0] errors). At that point, I see no advantage of packing the two arrays into a numpy array: just put them in a list:

x = [np.array([1,2,3]), np.array([4,5,6,7])]

This stack overflow post discusses the issue pretty well: https://stackoverflow.com/questions/14916407/how-do-i-stack-vectors-of-different-lengths-in-numpy

bekozi commented 6 years ago

MITgcm does not output netCDF for these types of grids. It uses its own custom binary format. That's why we made xmitgcm: http://xmitgcm.readthedocs.io/en/latest/

Got it.

I can make you a netcdf file with some grid information in it, but I'm not sure exactly what you're looking for.

Don't worry about it. I think the important thing will be for ESMPy to support multi-tile so xESMF can. If you don't mind, I can send an email to ESMF support to initiate a ticket and CC you. If you do output a GRIDSPEC netCDF at some point, we'd definitely be interested in testing it.

Is the a formal definition / spec for a "grid file" that you can point me to? Eventually we would like to use gridspec in xgcm, but it doesn't seem to be adopted widely (or at all?)

That's the one! GRIDSPEC Mosaic

You're right about it not being widely adopted, but I think it is the most commonly used convention for grid mosaics. I also think the user group is generally pretty small.

I would consider that to be an unsatisfactory hack.

That's fair. When wrapped with an explicitly defined variable length type they have been very useful and stable, but your mileage may be different. Anyway, thanks!

AaronDavidSchneider commented 3 years ago

This cubed-sphere regridding capability is mostly for supporting GEOS-5. I haven't brought it to xESMF level since it is too specialized and requires mpi4py.

Are there currently any plans to bring this to xESMF? I would like to use it for the cs Grid of MITgcm. Right now I interpolate every face individually with xESMF using the conservative method, which also seems to work fine.

LiamBindle commented 3 years ago

Hi all. For GCHP we need cubed-sphere and stretched cubed-sphere regridding too. We use the "sum conservatively-regridded faces" approach as well, and it works okay, but it's limited to the "conserve" method, it's brittle, and it's hard to extend because of the extra complexity. Lately I have been working on a few projects that are regridding-heavy, and I have a workflow that works pretty well for what I need. I thought I would share it here, and perhaps we can figure out how to support this in general in the ESM community.

The ESMF_Regrid and ESMF_RegridWeightGen tools support the gridspec file format for cubed-sphere and stretched cubed-sphere grids. I didn't find any tools online for generating these files, so I wrote a small utility for generating gridspec files. This lets you use the ESMF_RegridWeightGen tool natively, so you can generate regridding weights using any method (e.g., bilinear, 2nd order conservative, nearest, etc.), and regrid between any grid-types (assuming you can generate the grid-definition file).

Once you have the regridding weights, the actual regridding is simply a sparse matrix multiplication. I wrote a small library for doing this sparse matrix multiplication the Pythonic way (using numpy.vectorize and xarray.apply_ufunc) which you can find here: sparselt. That said it's simply a sparse matrix multiplication, so it would be easy to implement in any language.

The "nice traits" of this approach are:

This isn't a "new approach"---it's really what's just going on under-the-hood---but breaking regridding up into generating weights and a sparse matrix multiplication is an interface that's flexible and simple.

I'm happy to merge my work into somewhere else, but I'm not entirely sure it fits well in xESMF. Open to any suggestions!

LiamBindle commented 3 years ago

Lets move this discussion to https://github.com/pangeo-data/xESMF/issues/96. Any objections?