Closed cchwala closed 1 month ago
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 100.00%. Comparing base (
61c4063
) to head (47155f0
). Report is 12 commits behind head on main.
:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.
IMO the solution I currently have for returning the distance and neighbor IDs is good. It is borrowed and modified from #43.
I am not sure if is good to have np.nan
as entry in the array of neighbor_id
because this results in a dtype='object'
because strings and floats are mixed. The advantage of having np.nan
there is that one can easily drop these entries in xarray via xarray.DataArray..dropna()
e.g. as I do in the example notebook to get the neighbor IDs for a certain PWS which I can then use to select data or locations from all neighbors:
neighbor_ids = closest_neigbors.sel(id=pws_id).neighbor_id.dropna(dim="n_closest")
ds_pws_neighbors = ds_pws.sel(id=neighbor_ids)
The .sel()
of course only works if there is no non-matching ID in neighbor_ids
.
@eoydvin It would be good if you could have a look at my code and notebook, to check it, but also to see how it fits with your PR #43.
IMO the solution I currently have for returning the distance and neighbor IDs is good. It is borrowed and modified from #43.
I am not sure if is good to have
np.nan
as entry in the array ofneighbor_id
because this results in adtype='object'
because strings and floats are mixed. The advantage of havingnp.nan
there is that one can easily drop these entries in xarray viaxarray.DataArray..dropna()
e.g. as I do in the example notebook to get the neighbor IDs for a certain PWS which I can then use to select data or locations from all neighbors:neighbor_ids = closest_neigbors.sel(id=pws_id).neighbor_id.dropna(dim="n_closest") ds_pws_neighbors = ds_pws.sel(id=neighbor_ids)
The
.sel()
of course only works if there is no non-matching ID inneighbor_ids
.IMO the solution I currently have for returning the distance and neighbor IDs is good. It is borrowed and modified from #43.
I am not sure if is good to have
np.nan
as entry in the array ofneighbor_id
because this results in adtype='object'
because strings and floats are mixed. The advantage of havingnp.nan
there is that one can easily drop these entries in xarray viaxarray.DataArray..dropna()
e.g. as I do in the example notebook to get the neighbor IDs for a certain PWS which I can then use to select data or locations from all neighbors:neighbor_ids = closest_neigbors.sel(id=pws_id).neighbor_id.dropna(dim="n_closest") ds_pws_neighbors = ds_pws.sel(id=neighbor_ids)
The
.sel()
of course only works if there is no non-matching ID inneighbor_ids
.
Hm, good point. In the newest version of #43 I use None instead of np.nan when I first create the array using np.full([ds_cmls.cml_id.size, n_closest], None)
. Im not sure what is the best way to do it, but maybe it is good if the two functions do it the same way?
Im not sure what is the best way to do it, but maybe it is good if the two functions do it the same way?
What is returned by your function in #43 and by my function here, should have the exact same structure and conventions.
I suggest that we discuss None vs np.nan vs something_else via chat or video call during the next days. Unless there are already now good arguments for one of the options.
Conclusion from discussion with @eoydvin
None
instead of np.nan
as entry in neighbor_id
for the cases where no neighbor is found within max_distance
. This is preferred because it makes more sense from a semantic point of view, to say that there is none neighbor ID. Using .dropna()
e.g. closest_neigbors.neighbor_id.sel(id='ams12').dropna(dim='n_closest')
still works with the entries being None
.coords
from ds_points
in the returned dataset, only the id
coordinate.@eoydvin I implemented the points 2. and 3. from above. I also cleaned up the code and the notebook, which I also extended a bit. Please have a look again at this PR.
This PR adds an easy to use function for point data xarray.Datasets to calculate nearest neighbor distances using
scipy.spatial.KDTree
. The results are returned in a nicely structured xarray.Dataset.The example notebook is updated accordingly, also removing some of the usage of the distance matrix because getting nearest neighbors is more convenient with the introduced functions.