Closed dafyddstephenson closed 6 months ago
Thanks for the comments Tim. Lmk how is best to proceed - implement the changes, commit, and issue another pull request?
Hey @dafyddstephenson thanks for all those comments. I'll need to take a closer look at the code, but you can keep updating your branch and it will be reflected in this PR - no need to close it or open anything new. You can get those files with something like
git checkout upstream/master .coveragerc
And same for .gitignore assuming you called the ecco group branch upstream. Also for future ref I'd recommend making a new branch when you make a feature update - you may run into some funny business when you update your forked code after this PR gets merged. Happy to talk more about that if you like.
Thanks again for the updates! I'll get back to you soon.
Hey again - completely forgot about this PR sorry! I've merged the latest changes from upstream and restored the original .gitignore
and .coveragerc
files, the only changes now are to resample_to_latlon.py
.
Thanks @dafyddstephenson! I'm still just trying to make sure I understand this PR. Is all of this reshaping safe to do because it ultimately does the resampling by lat/lon coordinates (e.g. XC/YC values)? Then it would totally make sense to me. Sorry for my ignorance, I've never delved into the resampling, just trusted it blindly 😅🙈
lol nothing to apologise for, that's exactly what it's doing.
Looking at pyresample.kd_tree.resample_nearest
it takes a "Swath Definition" object (1D track of lons and lats) along with data (np array) along that swath.
ECCOv4-py is currently providing a 1D swath object along with 3D data, but pyresample just flattens that to 1D anyway in this elif
:
if data.ndim > 2 and data.shape[0] * data.shape[1] == valid_input_index.size:
data = data.reshape(data.shape[0] * data.shape[1], data.shape[2])
elif data.shape[0] != valid_input_index.size:
data = data.ravel()
What the change in this PR does is reshape the data into 2D so that it already looks like something pyresample expects: 1D of "swath" (lat and lon) and a second dimension of "channels", so it skips this if
block entirely. The "channels" we're providing are just any dimensions that aren't lon and lat, repacked into a single dimension, and unpacked again at the end.
Without the change, if space or time is left in the data given to resample_to_latlon
, these also get flattened in the above ravel()
call, and the data and swath are mismatched, leading to ValueError: Mismatch between geometry and dataset
in kd_tree
, e.g. when trying the below:
GDS=xr.open_dataset('ECCO-GRID.nc')
lo,la,_,_,R=ecco.resample_to_latlon(GDS.XC,GDS.YC,GDS.hFacC,-90,90,1,-180,180,1,mapping_method='nearest_neighbor');
Thanks again for revisiting this :)
Ok great, that makes sense. Thank you for the explanation @dafyddstephenson. I think this is ready to be merged, but I think @ifenty or @owang01 should take a look before merging. Could one of you guys check this out? I think it will be useful.
resample_to_latlon
previously only appeared to be compatible with two-dimensional data, so interpolating fields that varied in both depth and time seemingly required calling the function in a double loop. Have added two sets of lines: the first (before interpolating) detects any non-horizontal dimensions and stacks them into a single dimension, moving them to the final dimension of the data if necessary (where they are ignored bypyresample.kd_tree.resample_nearest
andpyresample.kd_tree.resample_custom
). The second moves the final, ignored dimension back to its original position and unstacks it into however many non-horizontal dimensions existed before interpolation. Rudimentary testing showed this was ~20 times faster for monthly averaged 3D 1992-2018 LLC90 fields than looping.Two commits as the second was just to streamline the code slightly. Passed the
py.test
routines with only warnings.