georust / rstar

R*-tree spatial index for the Rust ecosystem
https://docs.rs/rstar
Apache License 2.0
386 stars 67 forks source link

Geo points and custom distance metric #86

Closed DXist closed 2 years ago

DXist commented 2 years ago

To implement the demo in external crate I have exposed PointExp trait to public interface and disabled default PointExp implementation for Point types except arrays and tuples.

I added the demo to rstar/src/geopoint.rs.

Custom sub method calculates approximate distance between two points on WGS84 ellipsoid model of the Earth(https://en.wikipedia.org/wiki/Earth_radius#Meridional), using the code from https://github.com/vipera/cheap-ruler-rs . This allows to calculate distances with acceptable precision up to ~500km which is good enough for my use case. The method returns deltas along longitudes and latitudes scaled to distances in meters.

As I understand PointExt should be private. Is it better to introduce a public trait for coordinate metric distances and implement PointExt::sub method via something like CoordMetricDistance::delta_between_points?

michaelkirk commented 2 years ago

Hello! Were you aware of the optional rstar integration already built into geo-types? Or is this trying to achieve something else?

https://github.com/georust/geo/blob/master/geo-types/src/point.rs#L552

DXist commented 2 years ago

Hi @michaelkirk , I want to get nearest neighbours and their geo distances in meters for relatively close geo points. I am going to use these distances to obtain lower and upper bounds in Incremental Euclidean Restriction algorithm and then find nearest neighbours on road network.

michaelkirk commented 2 years ago

So you are dealing with lat/lon points, but we want to find neighbors based on meters.

I'm not familiar enough to know if this is reasonable. I've only ever used an R-Tree with boring cartesian math. So I'll have to wait for someone more knowledgable to weigh in.

edit: specifically, the approach I'd personally take is, to have my application project my points to a cartesian space (maybe using proj), and then trust the rtree to work with boring normal math.

adamreichold commented 2 years ago

specifically, the approach I'd personally take is, to have my application project my points to a cartesian space (maybe using proj), and then trust the rtree to work with boring normal math.

I am not sure about the performance implications, but I think this might be preferable. Using PointWithData, the original coordinates could be stored alongside the projected ones so that there are no roundtrip inaccuracies for the results of the neighbourhood query.

DXist commented 2 years ago

So you are dealing with lat/lon points, but we want to find neighbors based on meters.

I'm not familiar enough to know if this is reasonable. I've only ever used an R-Tree with boring cartesian math. So I'll have to wait for someone more knowledgable to weigh in.

edit: specifically, the approach I'd personally take is, to have my application project my points to a cartesian space (maybe using proj), and then trust the rtree to work with boring normal math.

Projections like Universal Mercator Projection work good for local data on city or small region level. This approach becomes inconvenient when geo coordinates are expected to be global. For UTM there are 60 zones in 2 hemispheres that require 120 local Rtrees to store global data. Zone partitioning introduces discontinuity and complicates processing geo data around the boundaries.

DXist commented 2 years ago

specifically, the approach I'd personally take is, to have my application project my points to a cartesian space (maybe using proj), and then trust the rtree to work with boring normal math.

I am not sure about the performance implications, but I think this might be preferable. Using PointWithData, the original coordinates could be stored alongside the projected ones so that there are no roundtrip inaccuracies for the results of the neighbourhood query.

I've benchmarked custom distance method using 10000 random points on city level and 100 nearest neighbour queries. Custom distance takes +18% to construct the tree. NN queries are about ~22 times slower.

It's possible to precompute latitude distance scaling coefficients with a ~0.1 degree step. This will speedup distance calculations at the expense of slightly less accurate results.

DXist commented 2 years ago

With precomputed coefficients NN queries with custom distance method are about ~18 times slower than original implementation.

DXist commented 2 years ago

I've decided to convert coordinates to 3D ECEF coordinate system and use euclidean distances. My benchmark shows that it's ~3 times faster to construct and ~21 times faster to query nearest neighbours than using custom distance metric with 2D lon/lat coordinates.

Thank you for your comments! I'm closing this issue.