JiaweiZhuang / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License
269 stars 49 forks source link

Retrieve regridding weights as numpy arrays #2

Closed JiaweiZhuang closed 6 years ago

JiaweiZhuang commented 6 years ago

Most regridding schemes are linear, i.e. the output data field is linearly dependent on the input data field. Any linear transform can be viewed as a matrix-vector multiplication y = A*x, where A is a matrix (typically sparse) containing regridding weights, and x, y are input and output data fields flatten to 1D.

In practice, linear regridding schemes are broken into two steps:

  1. Calculate regridding weights (i.e. get the matrix A), from the knowledge of source and destination grids. In ESMPy, this is done by regrid = ESMF.Regrid(...).
  2. Apply regridding weights to the data field (i.e. perform A*x). In ESMPy, this is done by destfield = regrid(sourcefield, destfield), where regrid was created in the previous step.

However, ESMPy's regrid object is like a black box. It knows how to perform regridding but there's no way to explicitly view the regridding weights (the matrix A).

In the Fortran version of ESMF, the function ESMF_RegridWeightGen dumps regridding weights to NetCDF files. The content of the file is shown in "12.8 Regrid Interpolation Weight File Format" section in ESMF documention (link). The matrix A is stored in a sparse matrix form by variables row,col and S in that NetCDF file. But in ESMPy there's no function equivalent to ESMF_RegridWeightGen.

Being able to view the regridding weights in the Python-level will solve many troubles:

@bekozi seems to have some Python tools for ESMF_RegridWeightGen. Maybe we could start with that.

JiaweiZhuang commented 6 years ago

It remains to be seen whether the numpy implementation of sparse matrix multiplication will be slower than the Fortran implement in ESMPy. But calling a lot of xarray.DataArray.transpose() doesn't seem to be efficient, too.

bekozi commented 6 years ago

I'll let @rokuingh provide more information on the implementation plan for retrieving and working with weights. I don't want to put my foot in my mouth. :smile:

A couple things:

ESMPy requires the input data to have the shape [Nlat, Nlon, N1, N2,...].

ESMPy using Fortran ordering since everything underneath is built on it. Hence when using ncdump and you see (time, level, lat, lon) in ESMPy this is (lon, lat, level, time). We're brainstorming how to address this, and I won't lie that is very confusing sometimes. I know this is different from in-memory stuff, but I just wanted to bring it up.

Instead of using xarray's transpose() function to match ESMPy's expection, we can write the sparse matrix multiplication directly in numpy, taking care of dimension broadcasting.

A number of times we've encountered issues in user code related to dimension ordering. I don't have a good sense how xarray's transpose works, but it is often the case that folks want to be using something more like numpy.swapaxes as opposed to transpose.

It remains to be seen whether the numpy implementation of sparse matrix multiplication will be slower than the Fortran implement in ESMPy.

My guess is that ESMF is faster. Whether the performance gain is enough to overcome issues in usability is a different question!

A question for you, do you have a sense how array copies work with the xESMF implementation? I'm pretty sure there is a deep copy going from xarray to ESMPy. Is there another deep copy from ESMPy back to xarray? I'm mostly just curious...don't have a good sense of xarray internals.

rokuingh commented 6 years ago

Dimension ordering has been the Achilles heel of ESMPy, I suppose that is the price of building on a Fortran package, we are hoping to resolve this soonest!

JiaweiZhuang commented 6 years ago

ESMPy using Fortran ordering since everything underneath is built on it. Hence when using ncdump and you see (time, level, lat, lon) in ESMPy this is (lon, lat, level, time). We're brainstorming how to address this, and I won't lie that is very confusing sometimes. I know this is different from in-memory stuff, but I just wanted to bring it up.

It seems all numpy arrays that point to ESMPy data fields are Fortran-ordered. np.isfortran() always gives True. In principle you would expect (time, level, lat, lon) for C-ordered numpy arrays and (lon, lat, level, time) for Fortran-ordered ones. Anyway, that would not be a problem anymore if the sparse matrix multiplication is done by numpy directly.

I don't have a good sense how xarray's transpose works, but it is often the case that folks want to be using something more like numpy.swapaxes as opposed to transpose.

I think that's equivalent to np.swapaxes. Again I would want to avoid swapping axes at all, if possible.

A question for you, do you have a sense how array copies work with the xESMF implementation? I'm pretty sure there is a deep copy going from xarray to ESMPy. Is there another deep copy from ESMPy back to xarray? I'm mostly just curious...don't have a good sense of xarray internals.

I am doing deep copy in both directions. I don't think the resulting array is safe if it just points to destfield.data. (1) It will be overwritten by the next regridding operation if I reuse the regrid object and the underlying field. (2) It will be released by destfield.destroy(), although it looks like another array. I believe pointers could be very confusing for average Python users.

I actually don't understand why ESMPy uses Field (containing data and additional dimensions), not Grid (only the coordinate itself), to construct the Regrid object. Why we need to call regrid = ESMF.Regrid(sourcefield, destfield), not just regrid = ESMF.Regrid(sourcegrid, destgrid)? @bekozi @rokuingh

My guess is that ESMF is faster. Whether the performance gain is enough to overcome issues in usability is a different question!

I am not sure, considering the current heavy use of memory copies and array rearrangements. We'll need to do a benchmark once regridding weights are visible.

bekozi commented 6 years ago

I actually don't understand why ESMPy uses Field (containing data and additional dimensions), not Grid (only the coordinate itself), to construct the Regrid object. Why we need to call regrid = ESMF.Regrid(sourcefield, destfield), not just regrid = ESMF.Regrid(sourcegrid, destgrid)?

This is just how ESMF works underneath which ESMPy attempts to mirror as much as possible. I agree it would be more convenient in cases where only weights are needed to use grids/meshes directly. Here is the underlying ESMF call that ESMPy wraps: http://www.earthsystemmodeling.org/esmf_releases/last_built/ESMF_refdoc/node5.html#SECTION050365900000000000000.

bekozi commented 6 years ago

My sincere apologies for the delay on getting you the weight file write code. We have a snapshot tag that includes writing weights to file:

git clone -b ESMF_7_1_0_beta_snapshot_35 https://git.code.sf.net/p/esmf/esmf esmf

The only change to your code should be the addition of a filename argument to the regrid call:

rh = ESMF.Regrid(srcfield, dstfield, filename='/tmp/weights.nc', ...)

Please let me know if you have any questions or find any issues! Though tested, note this is considered "development code".

JiaweiZhuang commented 6 years ago

@bekozi Thanks! I am not worried about any code&API changes because I plan to only interface with the weight file. As long as the format of weights.nc is stable everything would be fine.

Is that snapshot also available on conda? I only tried to build ESMF a long time ago and forget how to do that now... Or could you just send me a sample weight file? Just lat-lon to lat-lon would be great.

bekozi commented 6 years ago

Is that snapshot also available on conda?

It is now. I sat bolt upright last night realizing I forgot to build one. :smile:

conda create -n esmpy -c nesii/label/dev-esmf -c conda-forge esmpy==7.1.0.dev35

JiaweiZhuang commented 6 years ago

@bekozi Thanks! But I got an error when using the filename keyword. Please see this gist.

The test environment is docker-miniconda3

bekozi commented 6 years ago

Thanks for the reproducer. This is a very obscure bug related to Python 3. The code will work in Python 2.7. Will that be okay for a bit? I'll be looking into the Python 3 failure - it's somehow connected to ctypes.

bekozi commented 6 years ago

That wasn't as bad as I predicted. We were not testing a new enough ctypes version so missed this bug. I tested using docker-miniconda3 and the code works. I'll work on getting a new snapshot. In the meantime, if you want to use Python 3, you can patch the esmpy source code:

Index: src/addon/ESMPy/src/ESMF/interface/cbindings.py
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
--- src/addon/ESMPy/src/ESMF/interface/cbindings.py (revision 6f48610bd5243c1c33cfcd4e6ae9c2788c6c4a56)
+++ src/addon/ESMPy/src/ESMF/interface/cbindings.py (revision 9e1e6544f77b6ae10af92c65ef7ac9b2b761beb6)
@@ -1984,9 +1984,13 @@
             raise TypeError('dstMaskValues must have dtype=int32')
         dstMaskValues_i = ESMP_InterfaceInt(dstMaskValues)

+    # Need to create a C string buffer for Python 3.
+    b_filename = filename.encode('utf-8')
+    b_filename = ct.create_string_buffer(b_filename)
+
     rc = _ESMF.ESMC_FieldRegridStoreFile(srcField.struct.ptr,
                                      dstField.struct.ptr,
-                                     filename,
+                                     b_filename,
                                      srcMaskValues_i,
                                      dstMaskValues_i,
                                      ct.byref(routehandle),
JiaweiZhuang commented 6 years ago

@bekozi Thanks for looking into this! I am OK with Python2.7 for now.

JiaweiZhuang commented 6 years ago

@bekozi The col and row variables seem to have wrong values. Please see this updated gist.

Also, can it also write out grid information as in the Fortran version? area_a and area_b will be particularly useful.

bekozi commented 6 years ago

Hmmm...I was able to reproduce the issue on my end. Thanks for the code. It doesn't appear the weight file write is appropriately tested on the ESMPy side. I ran the grids through the CLI app and the indexing aligns, so the issue is definitely in the Python interface. I'll see what I can find...may take a bit since I didn't code the original implementation. Sorry for the trouble.

JiaweiZhuang commented 6 years ago

@bekozi That's fine! Take your time.

bekozi commented 6 years ago

Hi, @JiaweiZhuang. A new snapshot with a conda build is available that addresses the weight file write. I added a test for your specific case. Again, please let me know if you find any issues!

JiaweiZhuang commented 6 years ago

@bekozi Thanks so much! I've checked that both bilinear and conservative algorithms are working correctly. Now I will have a lot to update in xESMF!

There are two additional issues, though. They are not urgent but kind of confuse me. 1) ESMPy doc says that it can calculate grid area but I can't make it work. See ESMPy_grid_area.ipynb 2) The periodic boundary condition doesn't seem to work. See ESMPy_periodic.ipynb. I can manually extend one more grid point to fix this problem, but I want to make sure I didn't misunderstand the API doc.

The two issues are actually not related to this thread or to xESMF itself. I am willing to continue the discussion here but please do tell me if you want to move to other place to discuss ESMPy-specific things.

bekozi commented 6 years ago

Glad to hear it. :sweat_smile:

We can move this discussion over to the ESMF support list. I'll create a ticket and will email you. I'll also create a ticket for getting the grid information into the ESMPy-generated weight file.

JiaweiZhuang commented 6 years ago

Close this issue because ESMPy can now output regridding weights. See #6 for an example.

Will reopen this if there's any bug about weights writing.