ECCO-GROUP / ECCOv4-py

A Python library with routines that support the loading, analysis, and plotting fields of the ECCO Version 4 Ocean and Sea-Ice State Estimate. The ecco_v4_py library builds on several valuable tools such as xmitgcm, gcm, xarray, and dask.
MIT License
31 stars 31 forks source link

Expanding resample_to_latlon to be compatible with time- and/or depth-varying fields #145

Closed dafyddstephenson closed 6 months ago

dafyddstephenson commented 1 year ago

resample_to_latlon previously only appeared to be compatible with two-dimensional data, so interpolating fields that varied in both depth and time seemingly required calling the function in a double loop. Have added two sets of lines: the first (before interpolating) detects any non-horizontal dimensions and stacks them into a single dimension, moving them to the final dimension of the data if necessary (where they are ignored by pyresample.kd_tree.resample_nearest and pyresample.kd_tree.resample_custom). The second moves the final, ignored dimension back to its original position and unstacks it into however many non-horizontal dimensions existed before interpolation. Rudimentary testing showed this was ~20 times faster for monthly averaged 3D 1992-2018 LLC90 fields than looping.

Two commits as the second was just to streamline the code slightly. Passed the py.test routines with only warnings.

dafyddstephenson commented 1 year ago

Thanks for the comments Tim. Lmk how is best to proceed - implement the changes, commit, and issue another pull request?

timothyas commented 1 year ago

Hey @dafyddstephenson thanks for all those comments. I'll need to take a closer look at the code, but you can keep updating your branch and it will be reflected in this PR - no need to close it or open anything new. You can get those files with something like

git checkout upstream/master .coveragerc 

And same for .gitignore assuming you called the ecco group branch upstream. Also for future ref I'd recommend making a new branch when you make a feature update - you may run into some funny business when you update your forked code after this PR gets merged. Happy to talk more about that if you like.

Thanks again for the updates! I'll get back to you soon.

dafyddstephenson commented 7 months ago

Hey again - completely forgot about this PR sorry! I've merged the latest changes from upstream and restored the original .gitignore and .coveragerc files, the only changes now are to resample_to_latlon.py.

timothyas commented 7 months ago

Thanks @dafyddstephenson! I'm still just trying to make sure I understand this PR. Is all of this reshaping safe to do because it ultimately does the resampling by lat/lon coordinates (e.g. XC/YC values)? Then it would totally make sense to me. Sorry for my ignorance, I've never delved into the resampling, just trusted it blindly 😅🙈

dafyddstephenson commented 7 months ago

lol nothing to apologise for, that's exactly what it's doing.

Looking at pyresample.kd_tree.resample_nearest it takes a "Swath Definition" object (1D track of lons and lats) along with data (np array) along that swath.

ECCOv4-py is currently providing a 1D swath object along with 3D data, but pyresample just flattens that to 1D anyway in this elif:

if data.ndim > 2 and data.shape[0] * data.shape[1] == valid_input_index.size:
        data = data.reshape(data.shape[0] * data.shape[1], data.shape[2])
        elif data.shape[0] != valid_input_index.size:
        data = data.ravel()

What the change in this PR does is reshape the data into 2D so that it already looks like something pyresample expects: 1D of "swath" (lat and lon) and a second dimension of "channels", so it skips this if block entirely. The "channels" we're providing are just any dimensions that aren't lon and lat, repacked into a single dimension, and unpacked again at the end.

Without the change, if space or time is left in the data given to resample_to_latlon, these also get flattened in the above ravel() call, and the data and swath are mismatched, leading to ValueError: Mismatch between geometry and dataset in kd_tree, e.g. when trying the below:

GDS=xr.open_dataset('ECCO-GRID.nc')
lo,la,_,_,R=ecco.resample_to_latlon(GDS.XC,GDS.YC,GDS.hFacC,-90,90,1,-180,180,1,mapping_method='nearest_neighbor');

Thanks again for revisiting this :)

timothyas commented 7 months ago

Ok great, that makes sense. Thank you for the explanation @dafyddstephenson. I think this is ready to be merged, but I think @ifenty or @owang01 should take a look before merging. Could one of you guys check this out? I think it will be useful.