OpenSenseAction / poligrain

Simplify common tasks for working with point, line and gridded sensor data, focusing on rainfall observations.
https://poligrain.readthedocs.io
BSD 3-Clause "New" or "Revised" License
2 stars 10 forks source link

Add function to find closest points to a line #43

Closed eoydvin closed 1 month ago

eoydvin commented 3 months ago

Implement function for finding points near line as discussed in #15 Uses KDTree as suggested in #37

codecov[bot] commented 3 months ago

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 100.00%. Comparing base (61c4063) to head (053a248). Report is 15 commits behind head on main.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## main #43 +/- ## ========================================== Coverage 100.00% 100.00% ========================================== Files 3 4 +1 Lines 43 253 +210 ========================================== + Hits 43 253 +210 ```

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

cchwala commented 2 months ago

Thanks @eoydvin for this contribution 👏

One major thing that we need to discuss is the naming convention for the projected coordinates. Since our data format conventions use site_0_lon I suggest to use site_0_x for the projected coordinates of the CML locations, even though this is more clunky to write and read. I would also suggest to add the naming of the projected coordinates to our data format conventions table. Any comments on that @fenclmar @maxmargraf ?

I also have some minor comments:

  1. I would suggest to not pre-compute and store the mid-points of the lines, as you do now in the notebook. I would do this calculation, which is super fast, inside the function that calculates the closest points to the lines. This way there is less "clutter" in the xarray.Dataset with a variable x and y which could also be confusing because point data uses these too. Also, the user has less code to write to set up the data before feeding it into the function.

  2. IMO offset is not the best variable name here. Why not use max_distance or something similar?

  3. Test coverage is already 100% 🎉 . But there is some specific behavior that should be tested. E.g., you should test that offset and n_closest are correctly taken into account, i.e. that there are NaNs and things like that in the output.

  4. Some clean up in the notebook: "Line-to-points distances" instead of "Line to points distances". It also has to be a level 2 header, otherwise it will appear as separate entry in the side-bar of the docs, see the docs of your PR. Also some of the text that is there as comment could go into a separate markdown cell, in particular the one at the top of your new cells in the notebook which gives a short summary. You could also add 2 or 3 level-3 headers as subsubsections, as is done in the part above your new cells.

cchwala commented 1 month ago

Great contributions @eoydvin 🎉

That is functionality for which no consistent open source implementation did exist up until now!