georust / rstar

R*-tree spatial index for the Rust ecosystem
https://docs.rs/rstar
Apache License 2.0
414 stars 68 forks source link

RTree::locate_within_distance gives no performance benefit compared to linear search? #187

Closed alipheesa closed 1 week ago

alipheesa commented 1 week ago

Hi, recently I was trying to use RTree index to speed up proximity search (search of all the points in given radius) that was initially implemented as a simple linear search with utilities from "geo" crate. To implement the desired functions, I've wrapped geojson::Feature into custom PointFeature and implemented rstar::Point trait for it:

#[derive(Serialize, Deserialize, Debug, Clone)]
pub struct PointFeature {
    pub inner: Feature,
}

impl rstar::Point for PointFeature {
    type Scalar = f64;

    const DIMENSIONS: usize = 2;

    fn generate(mut generator: impl FnMut(usize) -> Self::Scalar) -> Self {
        let x = generator(0);
        let y = generator(1);
        PointFeature {
            inner: Feature {
                id: None,
                geometry: Some(geojson::Geometry::new(Value::Point(vec![x, y]))),
                properties: None,
                bbox: None,
                foreign_members: None,
            },
        }
    }

    fn nth(&self, index: usize) -> Self::Scalar {
        if let Some(geometry) = &self.inner.geometry {
            if let Value::Point(coords) = &geometry.value {
                return coords[index];
            }
        }
        panic!("Invalid GeoJSON format")
    }

    fn nth_mut(&mut self, index: usize) -> &mut Self::Scalar {
        if let Some(geometry) = &mut self.inner.geometry {
            if let Value::Point(coords) = &mut geometry.value {
                return &mut coords[index];
            }
        }
        panic!("Invalid GeoJSON format")
    }
}

The first optimization that I've tried was limiting the number of candidates via RTree::locate_in_envelope, this would perform a search inside of an AABB box (instead of circle) still requiring us to further filter based on distance. This works fine, e.g giving us around 3000 candidates (2500 after final filtration) for a dataset of 300k points and reducing the latency from ~32-40ms to ~6ms.

The second attempt was leveraging the RTree::locate_within_distance method like below:

    pub fn get_points_in_proximity_using_index_and_reprojection(
        &self,
        lat: f64,
        lon: f64,
        radius: f64,
    ) -> Vec<Feature> {
        let reprojected = transform_point_to_epsg3035(lat, lon).unwrap();
        let query_point = <PointFeature as rstar::Point>::generate(|i| vec![reprojected.0, reprojected.1][i]);

        self.geospatial_index_reprojected
            .locate_within_distance(query_point, radius.powi(2))
            .cloned()
            .collect::<Vec<_>>()
            .into_iter()
            .map(|f| f.inner)
            .collect()
    }

Surprisingly, it doesn't give any performance benefits compared to a simple linear search I had initially, i.e the latency for 300k features is fluctuating around 32-40ms.

I assume I've either screwed something up or overestimated the power of rtree :neutral_face:, please share your thoughts

adamreichold commented 1 week ago

Make sure you are not using rstar 0.12.1 as it has a performance bug (fixed in 0.12.2). Also you are collecting the results twice in your helper function which should not be necessary as you can use map(|f| f.inner) directly on the iterator yielded by cloned.

alipheesa commented 1 week ago

After proper investigation I can confirm that RTree::locate_within_distance works correctly, giving 4ms latency instead of ~40ms observed. The root of confusion was .transform_point_to_epsg3035 utility function that was consuming unexpectedly large amount of resources while reprojecting EPSG:4326 to EPSG:3035. I think we can close the ticket, thanks for help!