JiaweiZhuang / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License
269 stars 49 forks source link

Using ESMF unstructured grids #18

Open jhamman opened 6 years ago

jhamman commented 6 years ago

I'm wondering what it would take to support using ESMF's unstructured grids with xESMF. I had a conversation with @bekozi today where we discussed this briefly but we weren't sure where xESMF stood on the issue.

Ultimately, it would be great if xESM could support bi-directional remapping between structured/unstructured grids/meshes.

JiaweiZhuang commented 6 years ago

My major concern is how can xarray represent unstructured grids (as discussed in docs)?

I have little experience with unstructured grids like in MPAS and don't know how MPAS folks analyze their data. I believe that it is possible to design a data model that works fairly well with one specific type of unstructured grid (say the one used by MPAS), but an intuitive, easy-to-use frontend for arbitrary unstructured grid sounds too fancy for me.

For my personal research I use GFDL-FV3 which fortunately uses a (multi-tile) quadrilateral grid and can be handled gracefully by current xESMF. So I don't have personal needs to support unstructured grids. That said, if there's any good idea about unstructured meshes I am willing to try. It will be pretty cool.

jhamman commented 6 years ago

My major concern is how can xarray represent unstructured grids (as discussed in docs)?

My thought here is that we can follow existing conventions for this. There are existing specs/conventions for how these grids can be stored in the common data model (e.g. UGRID).

While these formats are not as intuitive as a logical grid may be, xarray should not have a problem storing a UGRID dataset. ESMF includes UGRID as a source/target grid spec so this should not be a problem.

if there's any good idea about unstructured meshes I am willing to try. It will be pretty cool.

To be clear, I'm not asking you to do any work here but just evaluating what it would take to make this work. I do have a use case that I would like to see an ESMF unstructured grid work for.

Perhaps we can get @bekozi to comment on what the workflow might look like here.

JiaweiZhuang commented 6 years ago

xarray should not have a problem storing a UGRID dataset.

xarray can indeed store all UGRID information in a Dataset but I wonder if that gives any benefit than using dictionaries or something simpler to store the same data. A whole extension is needed to utilize the connectivity information, such as allowing dr.sel(), dr.plot() on unstructured mesh.

If xarray doesn't give additional advantage for analyzing unstructured mesh, it doesn't seem to worth the effort to use it as the frontend for ESMPy regridding. Directly writing ESMPy code might be easier.

A similar issue is the current treatment of cell boundaries (pydata/xarray#1475, referred so many times...). N+1 sized boundary variables can indeed live in a Dataset but they cannot be utilized by other xarray functionalities such as plotting or slicing. And this is a much simpler problem than unstructured mesh...

jhamman commented 6 years ago

If xarray doesn't give additional advantage for analyzing unstructured mesh, it doesn't seem to worth the effort to use it as the frontend for ESMPy regridding.

I'm guessing my use case is sufficiently different from your typical use case that what doesn't seem useful to you is indeed useful to me.

As background, my use case is mapping gridded datasets to hydrologic response units (HRUs, e.g. river basin polygons). This can be done with ESMF using unstructured mesh regridding. Currently, I'm looking to build an xarray oriented tool kit for working with HRU meshes.

So my basic question for this issue is whether or not remapping between xarray datasets that use unstructured grids is within the scope of this package?

JiaweiZhuang commented 6 years ago

whether or not remapping between xarray datasets that use unstructured grids is within the scope of this package?

This is out of the current scope of my package. Right now I don't have a clear plan to deal with unstructured meshes, and wonder if directly interfacing xarray with ESMPy for unstructured mesh is the right way to go. The gridded API looks like a more suitable frontend for unstructured mesh, but it is not mature yet. There are some discussions on interfacing gridded with xarray (NOAA-ORR-ERD/gridded#20). Maybe xarray <-> gridded <-> ESMPy is a better way to use ESMPy's unstructured mesh regridding from xarray?

bekozi commented 6 years ago

Perhaps we can get @bekozi to comment on what the workflow might look like here.

I'm awful (terrible really) with catching these tagged issues. I'll do better. So, sorry again for the embarrassingly slow response.

Creating a generic unstructured grid interface can be a headache since there is no defacto standard in the climate/netCDF world. There are tools in ocgis that could help with interpreting unstructured formats and getting them into xarray if you are not interested in coding from scratch. Depending on the workflow, you can convert from shapely geometries or go directly from coordinates in a netCDF. The latter is still being ironed out. In terms of unstructured grid metadata conventions, SCRIP Unstructured, ESMF Unstructured, and UGRID are supported in ocgis.

I'd be happy to help you get set up with these interfaces as they can be a bit tricky to use. If the workflow is xarray-->ocgis-->xarray, we'd need to add a to_ocgis somewhere. Given my experience writing one going the other way, I don't expect that to be too difficult...and it's good to have a use case!

Unfortunately I don't have any experience with gridded, so I cannot comment on how interoperable it is with ESMPy. My sense is that it is probably sufficient? :confused: The way ESMF represents meshes internally is similar to UGRID, so if you decided to go with supporting a single convention and wanted to roll your own code, the UGRID to ESMPy code is not a deal breaker.

Anyway, let me know if you have any questions!

jhamman commented 6 years ago

@bekozi - thanks for revising this issue.

This is out of the current scope of my package.

I'm curious what it would take to scope this issue such that it fits in xESMF. If we just stick to the UGRID spec and ESMpy, is there a development path for adding unstructured mesh regridding? Does ocgis need to get involved or can ESMF handle the regridding procedure by itself? Does ESMF need to get involved (maybe we just use ocgis but maintain the existing regridding API)?

JiaweiZhuang commented 6 years ago

I'm curious what it would take to scope this issue such that it fits in xESMF. If we just stick to the UGRID spec and ESMpy, is there a development path for adding unstructured mesh regridding?

There are multiple things need to be done:

  1. The "front-end design": How to encode UGRID spec in xarray Dataset? As @bekozi said, there is no no de-facto standard for general unstructured grids. The specification we choose should be at least compatible with popular unstructured-grid models (e.g. MPAS), to minimize conversion effort.
  2. The "bridge": Convert xarray Dataset to ESMPy Mesh object. OCGIS might help (see the next reply)?
  3. The computation: Retrieve the regridding weights as explicit numpy arrays, and apply weights using the Dask/Scipy(Numba/Cython) stack, just like xESMF does. This is necessary for scaling-up the regridding computation to the Pangeo platform. ESMPy's built-in, MPI-based parallelization doesn't integrate well with dask/Pangeo.

For (1) and (2), I have little idea since I seldom work with unstructured meshes. I am only looking into (3), which is a general problem for both structured and unstructured grids.

Does ocgis need to get involved or can ESMF handle the regridding procedure by itself?

According to @bekozi 's reply in this comment, ocgis is used to:

Seems that it can save some coding compared to pure ESMPy. Also, ocgis has an to_xarray() method. How would it handle unstructured meshes?

Does ESMF need to get involved

Do you mean the Fortran library? No, ESMPy supports unstructured mesh, so everything can be written in Python.

bekozi commented 6 years ago

The "front-end design": How to encode UGRID spec in xarray Dataset? As @bekozi said, there is no no de-facto standard for general unstructured grids. The specification we choose should be at least compatible with popular unstructured-grid models (e.g. MPAS), to minimize conversion effort.

Looking through the MPAS Mesh Specification Document, I don't see any mention of a larger convention. I suspect a converter exists somewhere out there to UGRID - maybe you even know of one? In sum, we may need to ask some specialists to clarify this.

The "bridge": Convert xarray Dataset to ESMPy Mesh object. OCGIS might help (see the next reply)?

Yes, OCGIS can help here. I can also reasonably say that supporting the MPAS convention directly in OCGIS would not be out of the question.

ESMPy's built-in, MPI-based parallelization doesn't integrate well with dask/Pangeo.

Tangent incoming...

I have some technical inquiries regarding this, and I'm not sure where it would be best to discuss them. @jhamman, maybe you could direct me.

There are these OcgVM and OcgDist objects that handle MPI communication, decompositions, etc. Colleagues at @bird-house https://github.com/bird-house/bird-house.github.io/issues/17 and myself are interested in dask for memory scaling, but would like to leverage MPI as well for asynchronous IO, some algorithms like spatial subsetting, and interfacing with things like ESMPy. I think dask and MPI would be somewhat interoperable provided the decomposed/chunked dimensions don't change size during processing. I really hope I can reuse these vm objects to enable this (In part, I'm also looking around for shortcuts...err...elegance).

Furthermore, I think it may be possible to take another step and enable spatial decompositions in dask allowing a greater chunking variety. For example, the memory requirements for high spatial resolution regridding could be amenable to dask, and potentially xESMF, as opposed to the intermediate file approach we are using for chunked weight generation. In sum, we need a spatial halo for some operations and this is one potential approach for getting it. Maybe someone has already done this?

We're probably going to move forward with this pretty soon, and it would be great to get advice / buy in from dask experts as opposed to charging blindly into the night!

Read a shapefile representing an unstructured grid

Just to clarify, it can work with a shapefile, shapely, or directly from coordinates (like stored in a netcdf).

Interface with ESMPy for regridding

Depending on the workflow you can get a grid, mesh, field, or the whole kit and kaboodle (never actually typed that before).

Seems that it can save some coding compared to pure ESMPy. Also, ocgis has an to_xarray() method. How would it handle unstructured meshes?

I'm not sure how much to_xarray would come into this unless you offloaded all regridding to OCGIS/ESMPy. You would probably just want the ESMPy field or mesh object. We'd need a to_ocgis function that I could definitely help with. So, in the ideal case:

  1. Read data with xESMF.
  2. Convert to ocgis with metadata and tell it the convention.
  3. Have ocgis convert to appropriate ESMPy object.
  4. Continue with xESMF.
jhamman commented 6 years ago

and it would be great to get advice / buy in from dask experts as opposed to charging blindly into the night!

Probably best to open an issue on the pangeo issue tracker and ping mrocklin.

bekozi commented 6 years ago

Probably best to open an issue on the pangeo issue tracker and ping mrocklin.

Thanks. I'll try and make something more cohesive. I know the mpi-dask thing is kind of fleshed out on the dask issue tracker.