UXARRAY / uxarray

Xarray-styled package for reading and directly operating on unstructured grid datasets following UGRID conventions
https://uxarray.readthedocs.io/
Apache License 2.0

Applying remap weights from other packages (e.g., ESMF, TempestRemap?) #172

Open zarzycki opened 1 year ago

zarzycki commented 1 year ago

One of the most useful functions in NCL (for me) that I haven't been able to replicate has been the ESMF wrappers, in particular ESMF_regrid_with_weights.

A current use case: I am running a large (O(1000)-member) ensemble of short hindcasts with ~1deg CAM-SE and would like to "downscale" the data to 10x10deg ~equal-area cells (currently using CS, but icosahedral, etc. are also an option) for training a neural net with scikit-learn (so the downstream application is all Python). Since it is O(1000) snapshots on the same grid, it would be highly inefficient to create new map weights on every loop iteration -- ideally, map weights are generated once and can then just be read and applied to multiple (xarray) slices.

I have actually struggled to find a good way of doing this with Python. There are tools such as xESMF (or digging down to ESMPy), but they are highly abstracted (i.e., they take a lot of options out of the user's hands and bury the ESMF settings pretty deep) and don't appear to have the exact functionality I want (e.g., take a SCRIP -> SCRIP mapping and provide a weight file for reuse on a DataArray). The one that looks promising is an offshoot of Iris, but I will have to dig further. Perhaps I'm missing something obvious.

I am not sure this is in uxarray's purview, but I bring it up because I currently have found the best workaround (at this time) is a two-step workflow, where I generate an intermediate file that can be passed into my sklearn code using NCL (or I could also use CDO/NCO I suppose), but ideally, everything could be done on the Python side (with metadata, etc. being preserved, which the NCL wrappers also do an excellent job of).

rljacob commented 1 year ago

Looks like ESMF_regrid_with_weights takes already computed weights. Those weights could be read from a file. Would that be enough?

vijaysm commented 1 year ago

Not a uxarray developer here. But to answer your question, it should be relatively easy to just read the map file once and load it into a scipy.sparse matrix, to apply it to a vector iteratively for all ensemble data. Is this not a viable option?

Using command-line tools in a loop will require loading maps from a NetCDF map file, applying it to a vector, and writing the result back to disk. This is because each invocation of the command is a decoupled process and there is no context. Hope that helps.
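A minimal sketch of that read-once, apply-many pattern (the tiny weight matrix here is synthetic, standing in for one loaded from a map file):

```python
import numpy as np
from scipy import sparse

# Hypothetical 2x4 weight matrix: two coarse target cells, each
# averaging two source cells (a stand-in for weights read from a map file).
wmat = sparse.coo_matrix(
    (np.full(4, 0.5), ([0, 0, 1, 1], [0, 1, 2, 3])), shape=(2, 4)
).tocsr()  # convert once; CSR is efficient for repeated matvecs

# Apply the same operator to every ensemble member without rebuilding it.
ensemble = [np.arange(4, dtype=float) + m for m in range(3)]
remapped = [wmat @ field for field in ensemble]
print(remapped[0])  # member 0: [0.5, 2.5]
```

The matrix is built once, converted to CSR once, and then reused for every snapshot; only the cheap matrix-vector product sits inside the loop.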

zarzycki commented 1 year ago

> Looks like ESMF_regrid_with_weights takes already computed weights. Those weights could be read from a file. Would that be enough?

Yes, the idea would be we have pre-computed weights from either the ESMF or TempestRemap CLI binaries. We could then apply said (pre-computed) weights to a grid object in uxarray. The most obvious use case I'm thinking of is going from unstructured to unstructured (e.g., cubed sphere to icosahedral) meshes. There is generally a decent amount of functionality floating around to use something like bilinear or nearest-neighbor to go unstructured -> RLL inline, but if one uses a tool like TempestRemap to create weight matrices, it seems logical to have functionality to allow that mapping (perhaps within uxarray).

zarzycki commented 1 year ago

> Not a uxarray developer here. But to answer your question, it should be relatively easy to just read the map file once and load it into a scipy.sparse matrix, to apply it to a vector iteratively for all ensemble data. Is this not a viable option?
>
> Using command-line tools in a loop will require loading maps from a NetCDF map file, applying it to a vector, and writing the result back to disk. This is because each invocation of the command is a decoupled process and there is no context. Hope that helps.

Thanks @vijaysm -- I think you are correct -- once you have the weights it's a fairly trivial mathematical exercise. It would be great to have that be a capability, but also to carry any associated metadata over (e.g., if using weights to take an xarray.DataArray from one grid to another, the coordinates, etc. all persist and are handled cleanly). For example, the NCL ESMF wrappers do all this under the hood, so remapping with weights produces a variable with metadata identical to the source data, except for the remapped coordinates (which are also labelled appropriately).
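A rough sketch of such a metadata-preserving wrapper (the helper name, dimension names, and the toy weight matrix are all illustrative assumptions, not an existing uxarray or NCL API):

```python
import numpy as np
import xarray as xr
from scipy import sparse

def remap_with_weights(da, wmat, source_dim="ncol", target_dim="ncol_dst"):
    """Apply sparse weights `wmat` (n_b x n_a) along `source_dim`,
    carrying attrs and non-spatial coords over to the result."""
    moved = da.transpose(..., source_dim)              # spatial dim last
    flat = moved.values.reshape(-1, moved.sizes[source_dim])
    out = (wmat @ flat.T).T                            # remap every slice at once
    dims = moved.dims[:-1] + (target_dim,)
    coords = {d: moved.coords[d] for d in moved.dims[:-1] if d in moved.coords}
    return xr.DataArray(out.reshape(moved.shape[:-1] + (wmat.shape[0],)),
                        dims=dims, coords=coords, attrs=da.attrs)

# Illustrative use: 3 time steps on a 4-cell source grid -> 2-cell target grid
wmat = sparse.csr_matrix(np.full((2, 4), 0.25))        # toy uniform weights
da = xr.DataArray(np.arange(12.0).reshape(3, 4),
                  dims=("time", "ncol"),
                  coords={"time": [0, 1, 2]},
                  attrs={"units": "K", "long_name": "temperature"})
remapped = remap_with_weights(da, wmat)
print(remapped.attrs["units"])                         # attrs survive: "K"
```

The wrapper only swaps the spatial dimension; the `time` coordinate and the variable attributes ride along untouched, which is the behavior the NCL wrappers provide.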

vijaysm commented 1 year ago

> Thanks @vijaysm -- I think you are correct -- once you have the weights it's a fairly trivial mathematical exercise. It would be great to have that be a capability, but also carry any associated metadata etc. over (e.g., if using weights to take an xarray.dataArray from one grid to another, coordinates, etc. all persist and are handled cleanly) -- for example, the NCL ESMF wrappers do all this under the hood, so remapping with weights provides a variable with metadata identical to the source data, except for the remapped coords (which are also labelled appropriately).

Fair enough. If you just wanted to handle the SpMV for projection yourself, this only depends on the linear operator, which you can easily load from disk and apply on vectors defined on the source mesh. This assumes that the ordering of DoFs in your vector matches the operator numbering. The code below should help you accomplish parts of that if you are loading the weights from a standard SCRIP file.

```python
import netCDF4 as nc
from scipy import sparse

f = nc.Dataset('outputmap.nc')

n_a = f.dimensions['n_a'].size  # source (column) dimension
n_b = f.dimensions['n_b'].size  # target (row) dimension
n_s = f.dimensions['n_s'].size  # number of nonzero weights

print(f"Map contains {n_b} rows, {n_a} cols and {n_s} nnz values")

rows = f.variables['row'][:] - 1  # row indices (1-based in the file)
cols = f.variables['col'][:] - 1  # col indices (1-based in the file)
nnzvals = f.variables['S'][:]     # nonzero map weights

f.close()

# `wmat` rather than `map`, to avoid shadowing the Python builtin
wmat = sparse.coo_matrix((nnzvals, (rows, cols)), shape=(n_b, n_a))

## Now apply the sparse matrix `wmat` to a vector of size n_a to
## compute the remapped field of size n_b:
## field_target = wmat @ field_source
```
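One convenience worth noting: scipy's sparse matrices also multiply 2-D arrays, so an entire stack of snapshots can be remapped in a single sparse matrix-matrix product rather than a Python loop (toy sizes and a synthetic weight matrix below; in practice n_a/n_b and the weights come from the map file):

```python
import numpy as np
from scipy import sparse

n_a, n_b, n_members = 6, 3, 100
rng = np.random.default_rng(0)

# Toy weight matrix: each target cell averages two source cells.
wmat = sparse.coo_matrix(
    (np.full(6, 0.5), (np.repeat(np.arange(3), 2), np.arange(6))),
    shape=(n_b, n_a),
).tocsr()

# Stack all ensemble snapshots as columns and remap them in one shot.
fields_src = rng.standard_normal((n_a, n_members))   # (n_a, n_members)
fields_tgt = wmat @ fields_src                       # (n_b, n_members)
print(fields_tgt.shape)  # (3, 100)
```

For O(1000) members on the same grid this amortizes the sparse-structure overhead across the whole batch instead of paying it once per snapshot.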