georust / rstar

R*-tree spatial index for the Rust ecosystem
https://docs.rs/rstar
Apache License 2.0

First pass at memoizing Leaf envelope calls #116

Closed urschrei closed 1 year ago

urschrei commented 1 year ago
adamreichold commented 1 year ago

While this does seem straightforward to implement, doesn't it have the downside that we keep the cached envelopes around instead of throwing them away after building? There is probably no additional memory consumption in many cases if T is smaller than Vec<RTreeNode<T>> so that RTreeNode does not grow, but if T is larger than Vec<RTreeNode<T>> this could lead to higher long-term memory consumption.

(As a general observation unrelated to the question of caching envelopes, I suspect that storing a single Vec<RTreeNode<T>> and having children: Range<usize> in ParentNode, i.e. tree-wide arena allocation of nodes, could have a positive performance impact.)
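A minimal sketch of what such an arena layout might look like (hypothetical types, not the crate's current ones):

```rust
use std::ops::Range;

// All nodes live in one flat allocation; a parent addresses its children
// as a contiguous index range instead of owning its own Vec.
struct ArenaTree<T> {
    nodes: Vec<ArenaNode<T>>,
}

enum ArenaNode<T> {
    Leaf(T),
    Parent { children: Range<usize> },
}

impl<T> ArenaTree<T> {
    // Child lookup becomes a slice borrow into the arena, which improves
    // locality and removes the per-parent heap allocation.
    fn children(&self, children: &Range<usize>) -> &[ArenaNode<T>] {
        &self.nodes[children.clone()]
    }
}
```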

urschrei commented 1 year ago

This is going to be super-annoying to test because rstar doesn't know what a linestring is. The more I actually work with this crate, the more I find that its interaction with the geo ecosystem is filled with papercuts at every turn.

We really need to do something here: a fast spatial index is a cornerstone of a massive amount of functionality for geo's algorithms, and every attempt to improve anything is foundering because you first have to re-invent the wheel. I just don't have time to spend two days figuring out how to integrate wkt just so I can dump a few arrays into a data structure.

rmanoka commented 1 year ago

@urschrei Agreed; it's not the best yet. I've added some simple polygon-based tests here. Please merge if it looks good. The bulk loading was actually not using the RTreeNode optimization we've added, and I've fixed that too. It gives a decent 10x boost even on polygons with 64 sides.

Bulk load complex geom  time:   [398.98 µs 400.06 µs 401.10 µs]                                                                             
                        change: [-99.043% -99.040% -99.037%] (p = 0.00 < 0.05)                                                              
                        Performance has improved.                     
Bulk load complex geom  time:   [5.1578 ms 5.1630 ms 5.1721 ms]                                     
                        change: [+1189.8% +1193.3% +1196.6%] (p = 0.00 < 0.05)
                        Performance has regressed.
rmanoka commented 1 year ago

I do agree with @adamreichold's concern about space, too. This will blow up the memory usage of the rtree for the "usual" point case by about 3x. I'm for merging this, though.

urschrei commented 1 year ago

@rmanoka Amazing, thank you!

Regarding memory use, could we specify three RTreeNode variants? That would allow us to choose either a memoizing version (fast bulk inserts, larger memory use) or the existing one (slower inserts, lower memory use). It would mean some extra complexity, as we'd match on the leaf variant and then choose a bulk load function. Very much thinking out loud here, though…

(I'm inclined to merge as-is soon, given the improvement, but happy for more discussion.)

adamreichold commented 1 year ago

Regarding memory use, could we specify three RTreeNode variants? That would allow us to choose either a memoizing version (fast bulk inserts, larger memory use) or the existing one (slower inserts, lower memory use). It would mean some extra complexity, as we'd match on the leaf variant and then choose a bulk load function. Very much thinking out loud here, though…

Enums have the size of their largest variant plus space for the tag, so without additional boxing this would not help. Also, due to the additional complexity, I think we would fare better just doing the work of wiring in an explicit cache like HashMap<*const T, T::Envelope> which can be summarily dropped after building is done.
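To make the layout rule concrete, a toy stand-in for the node enum (illustrative types only, not the crate's definitions):

```rust
use std::mem::size_of;

enum Node<T> {
    Leaf(T),
    Parent(Vec<Node<T>>),
}

enum NodeWithCachingLeaf<T, E> {
    Leaf(T),
    CachingLeaf(T, E), // the proposed memoizing variant
    Parent(Vec<NodeWithCachingLeaf<T, E>>),
}

fn main() {
    type P = [f64; 2]; // a 2D point, 16 bytes
    type Env = (P, P); // stand-in for an AABB: lower/upper corners, 32 bytes

    // An enum occupies at least the size of its largest variant, so the
    // 48-byte CachingLeaf variant grows *every* node, cached or not:
    println!("plain node: {} bytes", size_of::<Node<P>>());
    println!("with caching leaf: {} bytes", size_of::<NodeWithCachingLeaf<P, Env>>());
}
```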

urschrei commented 1 year ago

Enums have the size of their largest variant plus space for the tag, so without additional boxing this would not help

Ugh, completely forgot.

I think we would fare better just doing the work of wiring in an explicit cache like HashMap<*const T, T::Envelope> which can be summarily dropped after building is done.

Where would it "live" though?

adamreichold commented 1 year ago

Where would it "live" though?

My first try would be on the stack at the top of bulk_load_sequential, passed as &mut HashMap<*const T, T::Envelope> into bulk_load_recursive and from there threaded through PartitioningTask etc.
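A compiling sketch of that helper, with a minimal stand-in for the crate's trait (the real signatures would differ):

```rust
use std::collections::HashMap;

// Stand-in for RTreeObject so the sketch is self-contained.
trait HasEnvelope {
    type Envelope: Clone;
    fn envelope(&self) -> Self::Envelope;
}

// Compute an element's envelope at most once, keyed by its address.
// The pointer key is only stable while the element does not move in
// memory, which the real integration would have to guarantee.
fn cached_envelope<T: HasEnvelope>(
    element: &T,
    cache: &mut HashMap<*const T, T::Envelope>,
) -> T::Envelope {
    cache
        .entry(element as *const T)
        .or_insert_with(|| element.envelope())
        .clone()
}
```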

adamreichold commented 1 year ago

If my reading of the code is correct, the main change to use an explicit on-stack cache besides threading parameters through would be that Envelope::partition_envelopes needs to access the cache.

Theoretically, we could even pass it remaining: &mut [(T, T::Envelope)] and avoid the hash table entirely, at the cost of computing envelopes up front instead of lazily on demand. (I think this would mean that we uselessly compute the envelopes of the leaf nodes. Does anyone have an idea how expensive this would be?)

adamreichold commented 1 year ago

So something like the following diff for eagerly computing the envelopes:

Full diff:

```diff
diff --git a/rstar/src/aabb.rs b/rstar/src/aabb.rs
index 63f619e..c12bb54 100644
--- a/rstar/src/aabb.rs
+++ b/rstar/src/aabb.rs
@@ -36,7 +36,7 @@ where
     pub fn from_point(p: P) -> Self {
         AABB {
             lower: p.clone(),
-            upper: p.clone(),
+            upper: p,
         }
     }
@@ -206,14 +206,13 @@ where
     fn partition_envelopes<T: RTreeObject<Envelope = Self>>(
         axis: usize,
-        envelopes: &mut [T],
+        envelopes: &mut [(T, Self)],
         selection_size: usize,
     ) {
         envelopes.select_nth_unstable_by(selection_size, |l, r| {
-            l.envelope()
-                .lower
+            l.1.lower
                 .nth(axis)
-                .partial_cmp(&r.envelope().lower.nth(axis))
+                .partial_cmp(&r.1.lower.nth(axis))
                 .unwrap()
         });
     }
diff --git a/rstar/src/algorithm/bulk_load/bulk_load_sequential.rs b/rstar/src/algorithm/bulk_load/bulk_load_sequential.rs
index 886cf8c..b91f1c9 100644
--- a/rstar/src/algorithm/bulk_load/bulk_load_sequential.rs
+++ b/rstar/src/algorithm/bulk_load/bulk_load_sequential.rs
@@ -11,7 +11,7 @@
 use num_traits::Float;
 
 use super::cluster_group_iterator::{calculate_number_of_clusters_on_axis, ClusterGroupIterator};
 
-fn bulk_load_recursive<T, Params>(elements: Vec<T>, depth: usize) -> ParentNode<T>
+fn bulk_load_recursive<T, Params>(elements: Vec<(T, T::Envelope)>, depth: usize) -> ParentNode<T>
 where
     T: RTreeObject,
     <T::Envelope as Envelope>::Point: Point,
@@ -20,7 +20,10 @@ where
     let m = Params::MAX_SIZE;
     if elements.len() <= m {
         // Reached leaf level
-        let elements: Vec<_> = elements.into_iter().map(RTreeNode::Leaf).collect();
+        let elements: Vec<_> = elements
+            .into_iter()
+            .map(|(element, _envelope)| RTreeNode::Leaf(element))
+            .collect();
         return ParentNode::new_parent(elements);
     }
     let number_of_clusters_on_axis =
@@ -43,7 +46,7 @@
 /// A partitioning iterator will take this item from its work queue and start partitioning "elements"
 /// along "current_axis".
 struct PartitioningState<T: RTreeObject> {
-    elements: Vec<T>,
+    elements: Vec<(T, T::Envelope)>,
     current_axis: usize,
 }
@@ -89,7 +92,7 @@ impl Iterator for PartitioningTask<T, Params>
 
-pub fn bulk_load_sequential<T, Params>(elements: Vec<T>) -> ParentNode<T>
+pub fn bulk_load_sequential<T, Params>(elements: Vec<(T, T::Envelope)>) -> ParentNode<T>
 where
     T: RTreeObject,
     <T::Envelope as Envelope>::Point: Point,
diff --git a/rstar/src/algorithm/bulk_load/cluster_group_iterator.rs b/rstar/src/algorithm/bulk_load/cluster_group_iterator.rs
index 2b60838..089610f 100644
--- a/rstar/src/algorithm/bulk_load/cluster_group_iterator.rs
+++ b/rstar/src/algorithm/bulk_load/cluster_group_iterator.rs
@@ -7,14 +7,14 @@
 use num_traits::Float;
 
 /// Partitions elements into groups of clusters along a specific axis.
 pub struct ClusterGroupIterator<T: RTreeObject> {
-    remaining: Vec<T>,
+    remaining: Vec<(T, T::Envelope)>,
     slab_size: usize,
     pub cluster_dimension: usize,
 }
 
 impl<T: RTreeObject> ClusterGroupIterator<T> {
     pub fn new(
-        elements: Vec<T>,
+        elements: Vec<(T, T::Envelope)>,
         number_of_clusters_on_axis: usize,
         cluster_dimension: usize,
     ) -> Self {
@@ -28,7 +28,7 @@
 }
 
 impl<T: RTreeObject> Iterator for ClusterGroupIterator<T> {
-    type Item = Vec<T>;
+    type Item = Vec<(T, T::Envelope)>;
 
     fn next(&mut self) -> Option<Self::Item> {
         match self.remaining.len() {
@@ -56,7 +56,7 @@ where
     // The depth of the resulting tree, assuming all leaf nodes will be filled up to MAX_SIZE
     let depth = (number_of_elements as f32).log(max_size).ceil() as usize;
     // The number of elements each subtree will hold
-    let n_subtree = (max_size as f32).powi(depth as i32 - 1);
+    let n_subtree = max_size.powi(depth as i32 - 1);
     // How many clusters will this node contain
     let number_of_clusters = (number_of_elements as f32 / n_subtree).ceil();
diff --git a/rstar/src/envelope.rs b/rstar/src/envelope.rs
index 91978f4..cefa493 100644
--- a/rstar/src/envelope.rs
+++ b/rstar/src/envelope.rs
@@ -60,7 +60,7 @@
     /// than envelopes[selection_size + 1..].
     fn partition_envelopes<T: RTreeObject<Envelope = Self>>(
         axis: usize,
-        envelopes: &mut [T],
+        envelopes: &mut [(T, Self)],
         selection_size: usize,
     );
 }
diff --git a/rstar/src/rtree.rs b/rstar/src/rtree.rs
index 167f5de..0eaf4b7 100644
--- a/rstar/src/rtree.rs
+++ b/rstar/src/rtree.rs
@@ -235,7 +235,7 @@
     ///
     /// For more information refer to [RTree::bulk_load]
     /// and [RTreeParams].
-    pub fn bulk_load_with_params(elements: Vec<T>) -> Self {
+    pub fn bulk_load_with_params<I: IntoIterator<Item = T>>(elements: I) -> Self {
         Self::new_from_bulk_loading(elements, bulk_load::bulk_load_sequential::<_, Params>)
     }
@@ -447,11 +447,18 @@
         &mut self.root
     }
 
-    fn new_from_bulk_loading(
-        elements: Vec<T>,
-        root_loader: impl Fn(Vec<T>) -> ParentNode<T>,
+    fn new_from_bulk_loading<I: IntoIterator<Item = T>>(
+        elements: I,
+        root_loader: impl Fn(Vec<(T, T::Envelope)>) -> ParentNode<T>,
     ) -> Self {
         verify_parameters::<T, Params>();
+        let elements = elements
+            .into_iter()
+            .map(|element| {
+                let envelope = element.envelope();
+                (element, envelope)
+            })
+            .collect::<Vec<_>>();
         let size = elements.len();
         let root = if size == 0 {
             ParentNode::new_root::<Params>()
```
adamreichold commented 1 year ago

Doing it lazily using once_cell::unsync::OnceCell also seems reasonable, even though I would want to see benchmarks showing that it is actually worth it compared to computing all the envelopes up front, which most likely has much better cache locality and could also be trivially parallelized:

Full diff:

```diff
diff --git a/rstar/Cargo.toml b/rstar/Cargo.toml
index 65b235d..9fccd4c 100644
--- a/rstar/Cargo.toml
+++ b/rstar/Cargo.toml
@@ -18,6 +18,7 @@ maintenance = { status = "actively-developed" }
 [dependencies]
 heapless = "0.7.10"
 num-traits = { version = "0.2", default-features = false, features = ["libm"] }
+once_cell = { version = "1.17.1", default-features = false }
 serde = { version = "1.0", optional = true, default-features = false, features = ["alloc", "derive"] }
 smallvec = "1.6"
diff --git a/rstar/src/aabb.rs b/rstar/src/aabb.rs
index 63f619e..8fd73c9 100644
--- a/rstar/src/aabb.rs
+++ b/rstar/src/aabb.rs
@@ -1,6 +1,7 @@
 use crate::point::{max_inline, Point, PointExt};
 use crate::{Envelope, RTreeObject};
 use num_traits::{Bounded, One, Zero};
+use once_cell::unsync::OnceCell;
 
 #[cfg(feature = "serde")]
 use serde::{Deserialize, Serialize};
@@ -36,7 +37,7 @@
     pub fn from_point(p: P) -> Self {
         AABB {
             lower: p.clone(),
-            upper: p.clone(),
+            upper: p,
         }
     }
@@ -206,15 +207,14 @@
     fn partition_envelopes<T: RTreeObject<Envelope = Self>>(
         axis: usize,
-        envelopes: &mut [T],
+        envelopes: &mut [(T, OnceCell<Self>)],
         selection_size: usize,
     ) {
         envelopes.select_nth_unstable_by(selection_size, |l, r| {
-            l.envelope()
-                .lower
-                .nth(axis)
-                .partial_cmp(&r.envelope().lower.nth(axis))
-                .unwrap()
+            let l = l.1.get_or_init(|| l.0.envelope());
+            let r = r.1.get_or_init(|| r.0.envelope());
+
+            l.lower.nth(axis).partial_cmp(&r.lower.nth(axis)).unwrap()
         });
     }
 }
diff --git a/rstar/src/algorithm/bulk_load/bulk_load_sequential.rs b/rstar/src/algorithm/bulk_load/bulk_load_sequential.rs
index 886cf8c..371ac7c 100644
--- a/rstar/src/algorithm/bulk_load/bulk_load_sequential.rs
+++ b/rstar/src/algorithm/bulk_load/bulk_load_sequential.rs
@@ -5,13 +5,17 @@
 use crate::params::RTreeParams;
 use crate::point::Point;
 
 use alloc::{vec, vec::Vec};
+use once_cell::unsync::OnceCell;
 
 #[allow(unused_imports)] // Import is required when building without std
 use num_traits::Float;
 
 use super::cluster_group_iterator::{calculate_number_of_clusters_on_axis, ClusterGroupIterator};
 
-fn bulk_load_recursive<T, Params>(elements: Vec<T>, depth: usize) -> ParentNode<T>
+fn bulk_load_recursive<T, Params>(
+    elements: Vec<(T, OnceCell<T::Envelope>)>,
+    depth: usize,
+) -> ParentNode<T>
 where
     T: RTreeObject,
     <T::Envelope as Envelope>::Point: Point,
@@ -20,7 +24,10 @@ where
     let m = Params::MAX_SIZE;
     if elements.len() <= m {
         // Reached leaf level
-        let elements: Vec<_> = elements.into_iter().map(RTreeNode::Leaf).collect();
+        let elements: Vec<_> = elements
+            .into_iter()
+            .map(|(element, _envelope)| RTreeNode::Leaf(element))
+            .collect();
         return ParentNode::new_parent(elements);
     }
     let number_of_clusters_on_axis =
@@ -43,7 +50,7 @@
 /// A partitioning iterator will take this item from its work queue and start partitioning "elements"
 /// along "current_axis".
 struct PartitioningState<T: RTreeObject> {
-    elements: Vec<T>,
+    elements: Vec<(T, OnceCell<T::Envelope>)>,
     current_axis: usize,
 }
@@ -89,7 +96,7 @@ impl Iterator for PartitioningTask<T, Params>
 
-pub fn bulk_load_sequential<T, Params>(elements: Vec<T>) -> ParentNode<T>
+pub fn bulk_load_sequential<T, Params>(elements: Vec<(T, OnceCell<T::Envelope>)>) -> ParentNode<T>
 where
     T: RTreeObject,
     <T::Envelope as Envelope>::Point: Point,
diff --git a/rstar/src/algorithm/bulk_load/cluster_group_iterator.rs b/rstar/src/algorithm/bulk_load/cluster_group_iterator.rs
index 2b60838..a930b47 100644
--- a/rstar/src/algorithm/bulk_load/cluster_group_iterator.rs
+++ b/rstar/src/algorithm/bulk_load/cluster_group_iterator.rs
@@ -1,20 +1,21 @@
 use crate::{Envelope, Point, RTreeObject, RTreeParams};
 
 use alloc::vec::Vec;
+use once_cell::unsync::OnceCell;
 
 #[allow(unused_imports)] // Import is required when building without std
 use num_traits::Float;
 
 /// Partitions elements into groups of clusters along a specific axis.
 pub struct ClusterGroupIterator<T: RTreeObject> {
-    remaining: Vec<T>,
+    remaining: Vec<(T, OnceCell<T::Envelope>)>,
     slab_size: usize,
     pub cluster_dimension: usize,
 }
 
 impl<T: RTreeObject> ClusterGroupIterator<T> {
     pub fn new(
-        elements: Vec<T>,
+        elements: Vec<(T, OnceCell<T::Envelope>)>,
         number_of_clusters_on_axis: usize,
         cluster_dimension: usize,
     ) -> Self {
@@ -28,7 +29,7 @@
 }
 
 impl<T: RTreeObject> Iterator for ClusterGroupIterator<T> {
-    type Item = Vec<T>;
+    type Item = Vec<(T, OnceCell<T::Envelope>)>;
 
     fn next(&mut self) -> Option<Self::Item> {
         match self.remaining.len() {
@@ -56,7 +57,7 @@ where
     // The depth of the resulting tree, assuming all leaf nodes will be filled up to MAX_SIZE
     let depth = (number_of_elements as f32).log(max_size).ceil() as usize;
     // The number of elements each subtree will hold
-    let n_subtree = (max_size as f32).powi(depth as i32 - 1);
+    let n_subtree = max_size.powi(depth as i32 - 1);
     // How many clusters will this node contain
     let number_of_clusters = (number_of_elements as f32 / n_subtree).ceil();
diff --git a/rstar/src/envelope.rs b/rstar/src/envelope.rs
index 91978f4..df5e493 100644
--- a/rstar/src/envelope.rs
+++ b/rstar/src/envelope.rs
@@ -1,4 +1,5 @@
 use crate::{Point, RTreeObject};
+use once_cell::unsync::OnceCell;
 
 /// An envelope type that encompasses some child nodes.
 ///
@@ -60,7 +61,7 @@
     /// than envelopes[selection_size + 1..].
     fn partition_envelopes<T: RTreeObject<Envelope = Self>>(
         axis: usize,
-        envelopes: &mut [T],
+        envelopes: &mut [(T, OnceCell<Self>)],
         selection_size: usize,
     );
 }
diff --git a/rstar/src/rtree.rs b/rstar/src/rtree.rs
index 167f5de..651b8b4 100644
--- a/rstar/src/rtree.rs
+++ b/rstar/src/rtree.rs
@@ -14,6 +14,7 @@
 use crate::params::{verify_parameters, DefaultParams, InsertionStrategy, RTreeParams};
 use crate::Point;
 
 use alloc::vec::Vec;
+use once_cell::unsync::OnceCell;
 
 #[cfg(feature = "serde")]
 use serde::{Deserialize, Serialize};
@@ -235,7 +236,7 @@
     ///
     /// For more information refer to [RTree::bulk_load]
     /// and [RTreeParams].
-    pub fn bulk_load_with_params(elements: Vec<T>) -> Self {
+    pub fn bulk_load_with_params<I: IntoIterator<Item = T>>(elements: I) -> Self {
         Self::new_from_bulk_loading(elements, bulk_load::bulk_load_sequential::<_, Params>)
     }
@@ -447,11 +448,15 @@
         &mut self.root
     }
 
-    fn new_from_bulk_loading(
-        elements: Vec<T>,
-        root_loader: impl Fn(Vec<T>) -> ParentNode<T>,
+    fn new_from_bulk_loading<I: IntoIterator<Item = T>>(
+        elements: I,
+        root_loader: impl Fn(Vec<(T, OnceCell<T::Envelope>)>) -> ParentNode<T>,
     ) -> Self {
         verify_parameters::<T, Params>();
+        let elements = elements
+            .into_iter()
+            .map(|element| (element, OnceCell::new()))
+            .collect::<Vec<_>>();
         let size = elements.len();
         let root = if size == 0 {
             ParentNode::new_root::<Params>()
```
adamreichold commented 1 year ago

I guess the call to envelope_for_children in ParentNode::new_parent should also make use of the cached envelopes if out-of-line storage is used. I think this suggests that the lazy variant would not be useful after all, since this would access exactly those envelopes which might not have been accessed during selection.

urschrei commented 1 year ago

I guess the call to envelope_for_children in ParentNode::new_parent should also make use of the cached envelopes if out-of-line storage is used. I think this suggests that the lazy variant would not be useful after all, since this would access exactly those envelopes which might not have been accessed during selection.

I've just finished applying your first diff, and the timing vs @rmanoka's changes (baseline is 873.54 µs) is not encouraging so far:

Bulk load complex geom  time:   [148.10 ms 148.51 ms 148.91 ms]
                        change: [+15882% +16210% +16515%] (p = 0.00 < 0.05)
                        Performance has regressed.

See updates below: this wasn't a valid comparison due to tree size differences.

I've opened a draft PR with your initial suggested changes and the new benchmark: https://github.com/georust/rstar/pull/117

If you check out that branch or PR you can push directly to it – I'm not sure where you'd like the new_parent change to go.

adamreichold commented 1 year ago

I've just finished applying your first diff, and the timing vs @rmanoka's changes (baseline is 873.54 µs) is not encouraging so far:

Note that the branch you pushed uses the original size of 4096 whereas this branch here uses 64. If I use 4096 in both cases, I get

Bulk load complex geom  time:   [27.620 ms 27.653 ms 27.689 ms]

for this branch and the eager out-of-line computation

Bulk load complex geom  time:   [36.660 ms 36.819 ms 37.012 ms]

which is still worse, but reasonably close.

urschrei commented 1 year ago

I was just wondering whether there was an obvious blunder somewhere, given the magnitude of the difference.

adamreichold commented 1 year ago

If you check out that branch or PR you can push directly to it – I'm not sure where you'd like the new_parent change to go.

I don't think I can push there because I cannot push to this repository at all, but it isn't really necessary I guess.

The diff for using the pre-computed envelopes when combining the leaf nodes into a parent node is just

diff --git a/rstar/src/algorithm/bulk_load/bulk_load_sequential.rs b/rstar/src/algorithm/bulk_load/bulk_load_sequential.rs
index 6b9aa63..bc7fbcb 100644
--- a/rstar/src/algorithm/bulk_load/bulk_load_sequential.rs
+++ b/rstar/src/algorithm/bulk_load/bulk_load_sequential.rs
@@ -20,11 +20,20 @@ where
     let m = Params::MAX_SIZE;
     if elements.len() <= m {
         // Reached leaf level
-        let elements: Vec<_> = elements
+        let envelope = elements.iter().fold(
+            T::Envelope::new_empty(),
+            |mut envelope, (_element, envelope1)| {
+                envelope.merge(envelope1);
+                envelope
+            },
+        );
+
+        let children: Vec<_> = elements
             .into_iter()
             .map(|(element, _envelope)| RTreeNode::Leaf(element))
             .collect();
-        return ParentNode::new_parent(elements);
+
+        return ParentNode { children, envelope };
     }
     let number_of_clusters_on_axis =
         calculate_number_of_clusters_on_axis::<T, Params>(elements.len());

which for me brings the benchmark down to

Bulk load complex geom  time:   [23.106 ms 23.137 ms 23.177 ms]

which looks like the best result so far.

EDIT: Forgot to disable frequency boost, see below for the correct results.

(The other call to ParentNode::new_parent does not use the pre-computed envelopes, but I don't think this is an issue as it will produce parents-of-parents and those already internally "cache" their envelope.)
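The node shapes implied by the ParentNode { children, envelope } construction above, with bounds and visibility elided (a sketch, not the crate's exact definitions):

```rust
// Stand-in trait so the sketch is self-contained.
trait RTreeObject {
    type Envelope;
    fn envelope(&self) -> Self::Envelope;
}

struct ParentNode<T: RTreeObject> {
    children: Vec<RTreeNode<T>>,
    envelope: T::Envelope, // computed once when the parent is built
}

enum RTreeNode<T: RTreeObject> {
    Leaf(T),               // leaf envelopes are recomputed on demand
    Parent(ParentNode<T>), // parent envelopes are cached inline
}
```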

adamreichold commented 1 year ago

I am sorry but the numbers above are also incorrect as I only switched the CPU frequency governor but forgot to disable frequency boost. Doing that yields

Bulk load complex geom  time:   [28.556 ms 28.605 ms 28.659 ms]

using the eager computation for me, which is slightly worse than the approach presented here but comes without the indefinite increase in memory usage.

urschrei commented 1 year ago

With the latest changes in #117 I'm now getting

Bulk load complex geom  time:   [121.15 ms 121.52 ms 121.92 ms]
                        change: [-0.2388% +0.2131% +0.6614%] (p = 0.35 > 0.05)
                        No change in performance detected.

Which is very encouraging.

adamreichold commented 1 year ago

Note that

diff --git a/rstar-benches/benches/benchmarks.rs b/rstar-benches/benches/benchmarks.rs
index 7de1601..5a3ec16 100644
--- a/rstar-benches/benches/benchmarks.rs
+++ b/rstar-benches/benches/benchmarks.rs
@@ -58,7 +58,7 @@ fn bulk_load_complex_geom(c: &mut Criterion) {
         let polys: Vec<_> = create_random_polygons(DEFAULT_BENCHMARK_TREE_SIZE, 4096, SEED_1);

         b.iter(|| {
-            RTree::<Polygon<f64>, Params>::bulk_load_with_params(polys.clone());
+            RTree::<Polygon<f64>, Params>::bulk_load_with_params(polys.iter().cloned());
         });
     });
 }

reduces the runtime further for me to

Bulk load complex geom  time:   [24.518 ms 24.567 ms 24.625 ms]

by avoiding the creation of the intermediate Vec.

But of course, downstream code would need to make the same changes to benefit if we go for the IntoIterator-based API.

urschrei commented 1 year ago

OK, we're currently still neck-and-neck on the new bulk load benchmark, but with lower memory usage in #117.

urschrei commented 1 year ago

Here's the full suite: #117 vs master

     Running benches/benchmarks.rs (target/release/deps/benchmarks-f2d29554c7e4f7dd)
Bulk load baseline      time:   [172.08 µs 173.89 µs 176.19 µs]
                        change: [+6.0470% +6.9826% +7.8936%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

rstar and spade benchmarks/rstar sequential
                        time:   [1.8192 ms 2.0567 ms 2.3794 ms]
                        change: [+17.971% +27.835% +41.962%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 11 outliers among 100 measurements (11.00%)
  5 (5.00%) high mild
  6 (6.00%) high severe

Bulk load complex geom  time:   [119.35 ms 120.23 ms 121.23 ms]
                        change: [-82.387% -82.243% -82.102%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 3 outliers among 100 measurements (3.00%)
  2 (2.00%) high mild
  1 (1.00%) high severe

bulk load quality       time:   [169.08 µs 170.87 µs 173.31 µs]
                        change: [-15.622% -8.0801% -1.2220%] (p = 0.03 < 0.05)
                        Performance has improved.
Found 4 outliers among 100 measurements (4.00%)
  1 (1.00%) high mild
  3 (3.00%) high severe

sequential load quality time:   [219.94 µs 221.89 µs 223.86 µs]
                        change: [-6.5737% -0.3851% +4.7198%] (p = 0.91 > 0.05)
                        No change in performance detected.
Found 1 outliers among 100 measurements (1.00%)
  1 (1.00%) high mild

locate_at_point (successful)
                        time:   [341.55 ns 346.39 ns 353.21 ns]
                        change: [-2.8073% -1.4548% -0.1566%] (p = 0.03 < 0.05)
                        Change within noise threshold.
Found 4 outliers among 100 measurements (4.00%)
  2 (2.00%) high mild
  2 (2.00%) high severe

locate_at_point (unsuccessful)
                        time:   [450.39 ns 471.72 ns 506.93 ns]
                        change: [+3.1001% +6.1859% +9.5738%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 12 outliers among 100 measurements (12.00%)
  5 (5.00%) high mild
  7 (7.00%) high severe
adamreichold commented 1 year ago

Here's the full suite: https://github.com/georust/rstar/pull/117 vs master

Not sure if it explains all of the regression, but I think "Bulk load baseline" would also benefit from replacing points.clone() by points.iter().cloned(). For anything but "Bulk load complex geom", I do not understand how the changes from #117 would improve or regress performance...

rmanoka commented 1 year ago

I prefer this PR to #117 because it will also ensure memoizing during queries and not just bulk insertion. Isn't that a better approach in general for the r-tree? I suppose complex geoms in an r-tree are a fairly typical use-case too. For instance, we do this in the geo crate.

I wonder if we can add associated types to the RTreeObject trait to handle the memory problem. So we'd add a cache type that defaults to the envelope type, but for point types we could just make it ().

Update: Looks like the associated-types approach is not very ergonomic. For instance, associated-type defaults are still unstable, so it would be a large breaking change to do this.
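Roughly the shape I had in mind, with made-up names; the lack of stable associated-type defaults is what makes it breaking, since every existing implementor would have to add the extra type:

```rust
trait EnvelopeCache {
    // No stable `type Cached = Self::Envelope;` default exists, so every
    // implementor must spell this out -- a breaking change.
    type Cached;
    fn cache_envelope(&self) -> Self::Cached;
}

struct Polygon(Vec<[f64; 2]>);
struct Point2([f64; 2]);

impl EnvelopeCache for Polygon {
    // Complex geometry: cache the computed bounding-box corners.
    type Cached = ([f64; 2], [f64; 2]);
    fn cache_envelope(&self) -> Self::Cached {
        self.0.iter().fold(
            ([f64::INFINITY; 2], [f64::NEG_INFINITY; 2]),
            |(mut lo, mut hi), p| {
                for d in 0..2 {
                    lo[d] = lo[d].min(p[d]);
                    hi[d] = hi[d].max(p[d]);
                }
                (lo, hi)
            },
        )
    }
}

impl EnvelopeCache for Point2 {
    // A point is its own envelope: nothing worth caching.
    type Cached = ();
    fn cache_envelope(&self) -> Self::Cached {}
}
```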

rmanoka commented 1 year ago

Come to think of it, I also wouldn't mind simply informing the user very clearly that the r-tree is not even looking at anything other than the envelope of the geometry, and that it is probably useless to store complex geoms in the r-tree except as just data. They can use GeomWithData<AABB, GeomType>, where it is clear that the actual geometry is just a payload and the r-tree just handles the envelope.
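A sketch of that usage, with the crate's Rectangle primitive standing in for the envelope-only geometry (the suggestion above names AABB; Rectangle is used here just to keep the example self-contained):

```rust
use rstar::primitives::{GeomWithData, Rectangle};
use rstar::{RTree, AABB};

fn main() {
    // Index only the bounding rectangle; the "real" geometry (here just a
    // label standing in for a complex geom) rides along as plain payload.
    let a = GeomWithData::new(
        Rectangle::from_corners([0.0, 0.0], [2.0, 2.0]),
        "polygon A",
    );
    let b = GeomWithData::new(
        Rectangle::from_corners([5.0, 5.0], [9.0, 7.0]),
        "polygon B",
    );
    let tree = RTree::bulk_load(vec![a, b]);

    // Queries run purely against the stored envelopes.
    let query = AABB::from_corners([1.0, 1.0], [3.0, 3.0]);
    for hit in tree.locate_in_envelope_intersecting(&query) {
        println!("{}", hit.data); // prints "polygon A"
    }
}
```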

adamreichold commented 1 year ago

I prefer this PR to https://github.com/georust/rstar/pull/117 because it will also ensure memoizing during queries and not just bulk insertion. Isn't that a better approach in general for the r-tree?

Also thought about this, but I think the important envelopes (those of the parents) are already cached during querying in the current design. The problem really is bulk insertion, where they are not available, as we do not know which elements will become parents or leaves yet.

If the user really wants the envelope to be cached all the time, they can already do that inside their type implementing RTreeObject. Bulk insertion without caching was just a much larger performance foot gun than querying, due to the above.

Come to think of it, I also wouldn't mind simply informing the user very clearly that the r-tree is not even looking at anything other than the envelope of the geometry, and that it is probably useless to store complex geoms in the r-tree except as just data. They can use GeomWithData<AABB, GeomType>, where it is clear that the actual geometry is just a payload and the r-tree just handles the envelope.

I think this is slightly too simplified: for parents we really look only at envelopes (as they obviously have no other geometry of their own), and there it is also cached. But for the leaves, some operations do look at the individual geometry, like the nearest neighbour search via the PointDistance trait, e.g.

https://github.com/georust/rstar/blob/101773a596f5e6827466772f8cce883d8067fe2a/rstar/src/algorithm/nearest_neighbor.rs#L66-L69

If we really want envelopes to be cached all the time, but without any memory overhead if it is not necessary, I think we could also change the RTreeObject trait from

fn envelope(&self) -> Self::Envelope;

to

fn envelope(&self) -> &Self::Envelope;

which is basically impossible to implement without caching, but trivial to implement if the geometry type basically is AABB.
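A sketch of what that signature change would force on implementors (hypothetical trait variant, not rstar's current API):

```rust
// Hypothetical by-reference variant of the trait; rstar's actual
// RTreeObject returns the envelope by value.
trait RTreeObjectByRef {
    type Envelope;
    fn envelope(&self) -> &Self::Envelope;
}

// A complex geometry is forced to store (i.e. cache) its envelope...
struct CachedPolygon {
    exterior: Vec<[f64; 2]>,
    bbox: ([f64; 2], [f64; 2]), // computed once at construction
}

impl RTreeObjectByRef for CachedPolygon {
    type Envelope = ([f64; 2], [f64; 2]);
    fn envelope(&self) -> &Self::Envelope {
        &self.bbox
    }
}

// ...while a type that basically is an AABB just hands out itself.
struct Aabb {
    lower: [f64; 2],
    upper: [f64; 2],
}

impl RTreeObjectByRef for Aabb {
    type Envelope = Aabb;
    fn envelope(&self) -> &Self::Envelope {
        self
    }
}
```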

But as written above, I find the current trade-off (caching only parent envelopes and leaving leaf envelopes to the implementor) preferable if we fix the bulk insertion performance foot gun, i.e. I would argue for #117.

rmanoka commented 1 year ago

Note that both the PRs add a 5-10% regression to the points case. I'm thinking we should help with ergonomic wrappers like GeomWithData that cache explicitly, rather than incur unnecessary extra regression for the most typical cases.

For a larger redesign, I quite like the idea of returning a reference:

fn envelope(&self) -> &Self::Envelope;

Definitely worth evaluating this more (separately).

adamreichold commented 1 year ago

So something like

struct WithCachedEnvelope<O: RTreeObject> {
  object: O,
  envelope: O::Envelope,
}

impl<O: RTreeObject> WithCachedEnvelope<O> {
  fn new(object: O) -> Self {
    let envelope = object.envelope();

    Self { object, envelope }
  }
}

impl<O: RTreeObject> RTreeObject for WithCachedEnvelope<O>
where
  O::Envelope: Clone,
{
  type Envelope = O::Envelope;

  fn envelope(&self) -> Self::Envelope {
    self.envelope.clone()
  }
}

? I think this would be nicely composable and would support such a change as well.

I am somewhat surprised by the statement

for the most typical cases.

though. Wouldn't one usually use the simpler k-d trees for point-like data and turn to R* trees only for more complicated geometry?

urschrei commented 1 year ago

I am somewhat surprised by the statement

for the most typical cases.

though. Wouldn't one usually use the simpler k-d trees for point-like data and turn to R* trees only for more complicated geometry?

My experience (which may not be universal) is that in GIS / most OGC-adjacent geo applications there isn't much distinction made between types of spatial index; there's just the spatial index library that the ecosystem provides, plus PostGIS (which I believe allows more granular control over indices, but in general isn't customised much).

adamreichold commented 1 year ago

My experience (which may not be universal) is that in GIS / most OGC-adjacent geo applications there isn't much distinction made between types of spatial index; there's just the spatial index library that the ecosystem provides, plus PostGIS (which I believe allows more granular control over indices, but in general isn't customised much).

Thank you for the explanation, this does explain my surprise: my initial contact with spatial indexes was in simulations, where I think it is not untypical to use different kinds of indexes for different kinds of geometry and/or different computational tasks. I guess that, being part of the georust organisation, providing a robust default choice is indeed a high-priority requirement for this crate.

urschrei commented 1 year ago

Yes, but I'm also conscious of the fact that our use case isn't a monopoly or even necessarily the most important. We maintain the crate, but I don't want to privilege our use cases over others, so I think we're trying to figure out how to accommodate everyone while balancing the relatively small amount of dev resources we have.

adamreichold commented 1 year ago

Closing since we went for #118 instead.

@urschrei Could you delete the branch if you do not need it anymore?