ELPHE-WW / unstructured-mesh-parcels-sandbox

A sandbox repository to draft unstructured mesh IO, connectivity & neighbor search methods, and interpolation schemes.
2 stars 0 forks source link

Review Pangeo-PyInterp #5

Open fluidnumerics-joe opened 1 month ago

fluidnumerics-joe commented 1 month ago

The Pangeo project seems to have some methods for handling interpolation from unstructured grids on the sphere.

https://pangeo-pyinterp.readthedocs.io/en/latest/auto_examples/ex_unstructured.html

erikvansebille commented 1 month ago

Maybe I'm confused, but is interpolation on an unstructured grid not a dedicated work package in ELPHE-WW? Was this not part of the tasks of Sergey and @koldunovn? To what extent should this be on your plate, @fluidnumerics-joe?

Can't you for now pick one of the interpolation method of the link above, and then swap it when a better/more accurate one has been developed? What do you think, @StJuricke?

fluidnumerics-joe commented 1 month ago

I'm in the early days of discovery, building an understanding of what tools exist in the python ecosystem to perform interpolation from an unstructured grid of data to particle locations. There's always a balance of limiting dependencies (sometimes things don't work as advertised) and taking on internal development. Making an informed decision requires a cursory look into both options.

Yesterday was the first day I came across Pangeo-PyInterp; I've known about Pangeo for some time, and was quite pleased they had an extension into unstructured grids.

I suspect that we will be able to use Pangeo-PyInterp for our purposes and likely UXArray for file IO (exclusively). But now we need to see how we'd be able to get these working together. I am curious about performance as well and if its usage is consistent with how you'd want things done in parcels, e.g. to support JIT compiling with numba

fluidnumerics-joe commented 1 month ago

As far as division of labor amongst ELPHE-WW, I am not aware of any overlap with other team member's responsibilities.

fluidnumerics-joe commented 1 month ago

Dolfinx might also be suitable (https://github.com/FEniCS/dolfinx) and is built around a c++ backend.

fluidnumerics-joe commented 1 month ago

I've added a notebook that compares Pangeo-PyInterp interpolation methods for the example ICON grid. I'm using a random distribution of 10,000 particles and a simple function defined at face centroids.

Each method has a number of options that have not been fully explored, but this at least lays out how to use UXArray with Pangeo-PyInterp and provides a cursory look at what's available.

StJuricke commented 1 month ago

@erikvansebille @fluidnumerics-joe If there is an already existing way to easily use an available interpolation, we should use it. The work packages were originally laid out in such a way that the theoretical underpinning of optimal interpolation would be provided by the partner institutions and then implemented by Fluid Numerics. But if such an optimal interpolation for unstructured grids is already implemented, we might as well use it.

koldunovn commented 1 month ago

Several methods of relatively easy interpolation from unstructured grids (both FESOM and ICON) are listed here

For many practical purposes for high resolution data the nearest neighbour is just fine - one can use the version available in scipy (NearestNDInterpolator), or explicitly using KDTree (here the advantage is that you can store the indexes on disk, not regenerate them every time). Example of the later shown here

For vector data nearest neighbour is only OK for visualisation but, linear works OK-ish, and I would start with it. In particular Matplotlib's LinearTriInterpolator is good, as it takes triangles into account. It is slow, though, so I usually interpolate one field with LinearTriInterpolator to create a mask, and then use usual LinearNDInterpolator from scipy. Linear interpolation will be very slow, especially for larger datasets.

Another method is to use cdo ether to interpolate netCDF files directly, or to use them to generate weights for further interpolation.

First approach is more or less covered here starting from working on netCDF files, and moving then to applying weights with smmregrid, that allow efficiently regrid in parallel.

Most of the approaches for FESOM are summarised in fint package.

In my opinion for simple tests whatever linear interpolation is fine. For working with realistic outputs I would use smmregrid with weights generated by cdo. One can try different methods available in cdo

fluidnumerics-joe commented 1 month ago

Interpolation onto moving particles is a bit different than interpolation from one grid to another. At each time level, we need to map from unstructured grid points to a point-cloud. The interpolation is often broken down into neighbor search and interpolation based off the neighbors.

There are many different methods for neighbor search and many different methods for interpolation once we have the neighbors. Packages like PyInterp and dolfinx provide the search and interpolation algorithms under a neatly packaged API. Some tools, like dolfinx, can work with triangles on spherical shells, which provides a more accurate distance metric for interpolation weights - using matplotlib would require we handle all the metric terms on our own. I need to do more digging on PyInterp to understand what they are doing.

In the end, the combinations of search, interpolate, and time integrators will ultimately impact accuracy, performance, and scalability. Additionally, questions of robustness are open at this point . I'm keen on seeing if there is community wisdom from other fields (e.g. the particle-in-cell folks stand out to me) on methods that ensure no particle in-grounding or have min/max bounds on interpolant results which is important for stability in numerical integration of particles.

koldunovn commented 1 month ago

I see, interpolation to moving particles is a different story, of course. I though we discussing interpolation for the "simple approach" task, when we just interpolate unstructured data to something like NEMO grid and try to run Parcels from it.