Open jasonwhite opened 11 months ago
I think that using unsigned numbers would mean that we would have to audit all algorithms to not produce underflow at zero and thereby produce nonsensical results (or panic in debug builds). And even this check would most likely imply the introduction of new failure modes into the API or at least document new possibilities of panics.
Personally, I would recommend to use a signed type in the tree and convert your coordinates when interface with the rest of your code. If you are confident, you can also effectively circumvent the requirement without changes to the rstar
source by doing something like:
impl Signed for MyFancyNumberType {
fn abs(&self) -> Self { unreachable!() }
fn abs_sub(&self, other: &Self) -> Self { unreachable!() }
fn signum(&self) -> Self { unreachable!() }
fn is_positive(&self) -> bool { unreachable!() }
fn is_negative(&self) -> bool { unreachable!() }
}
I think that using unsigned numbers would mean that we would have to audit all algorithms to not produce underflow at zero and thereby produce nonsensical results (or panic in debug builds).
I went down this rabbit hole yesterday. So far, I've found that the PointExt::distance_2
calculation can underflow in its sub
calculation. I was able to partially work around that by implementing it this way instead:
pub trait PointExt: Point {
fn abs_diff(&self, other: &Self) -> Self {
self.component_wise(other, |l, r| {
if l < r {
r - l
} else {
l - r
}
})
}
fn distance_2(&self, other: &Self) -> Self::Scalar {
self.abs_diff(other).length_2()
}
}
In an optimized build, abs_diff
boils down to about 4 x86-64 instructions by using a conditional move. Not great nor necessary for signed numbers. Stabilization of specialization in Rust would be nice to avoid this overhead.
However, PointExt::length_2()
can now overflow when used with large integers, because it is implemented like this:
fn length_2(&self) -> Self::Scalar {
self.fold(Zero::zero(), |acc, cur| cur * cur + acc)
}
I think it would be correct-ish to implement this in terms of saturating_mul
and saturating_add
, but unfortunately num_traits::{SaturatingMul, SaturatingAdd}
are not implemented for f32
or f64
as I believe floats are already saturating at "infinity".
It might be good to use more saturating/wrapping operations in these calculations, as it would extend the range of signed numbers that can be used with this library.
Personally, I would recommend to use a signed type in the tree and convert your coordinates when interface with the rest of your code.
I think this is what I'll end up doing (or implement my own specialized version of R-tree). For my project, I will likely have a rather huge spatial index that would benefit from being memory mapped, something that this library cannot provide yet, so I might go down the latter path.
I tried converting my u64
s to i64
s using:
fn u64_to_i64(x: u64) -> i64 {
// Simply flip the sign bit. This is equivalent to subtracting 2^63.
(x ^ (i64::MIN as u64)) as i64
}
But I still run into overflow problems. My rectangles can be very large, or even take up the entire i64
/ u64
range.
First, I ran into an overflow in <AABB as Envelope>::center()
, which is calculated like this:
|x, y| (x + y) / two
Overflows can be avoided by calculating it like this instead:
|x, y| x / two + y / two
(It's just one additional shr
instruction.)
Next, I am still running into the length_2
overflow mentioned above. Not sure how to solve that one without doing saturating arithmetic operations. Even if I calculated the square root, it'd still overflow before getting there.
Thanks @jasonwhite for the in-depth changeset required for this; reg. the distance_2
changes: could you tell us how much the benchmarks regress after this change? About length_2
, I think it's fair if the data-structure works correctly on geoms with points whose pairwise distance^2 are within the scalar range.
I am for merging a PR to support unsigned types as long it's a not a huge regression.
I'll likely end up going with a quadtree instead. My use-case doesn't require the need to find the nearest neighbors. I only care about querying for which rectangles overlap with a given input.
Feel free to close this issue, but I suspect others may be interested in having support for unsigned numbers (provided that they are small enough to avoid overflow).
FYI, I'm not able to run the benchmarks because of a problem compiling the geo
crate.
error[E0275]: overflow evaluating the requirement `[closure@/home/jasonwhite/.cargo/registry/src/index.crates.io-6f17d22bba15001f/geo-0.24.1/src/algorithm/map_coords.rs:855:69: 855:72]: Fn<(geo_types::Coord<T>,)>`
|
= help: consider increasing the recursion limit by adding a `#![recursion_limit = "256"]` attribute to your crate (`geo`)
= note: required for `&[closure@/home/jasonwhite/.cargo/registry/src/index.crates.io-6f17d22bba15001f/geo-0.24.1/src/algorithm/map_coords.rs:855:69: 855:72]` to implement `Fn<(geo_types::Coord<T>,)>`
= note: 128 redundant requirements hidden
= note: required for `&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&&[closure@/home/jasonwhite/.cargo/registry/src/index.crates.io-6f17d22bba15001f/geo-0.24.1/src/algorithm/map_coords.rs:855:69: 855:72]` to implement `Fn<(geo_types::Coord<T>,)>`
I'm not able to run the benchmarks
I've just now opened https://github.com/georust/rstar/pull/131 which should address that error.
reg. the distance_2 changes: could you tell us how much the benchmarks regress after this change?
If fear the change is unpalatable in the general case. let d = x - y; d * d
is two vectorizable instructions whereas let d = if x > y { x - y } else { y - x}; d * d
is 11 instruction containing a branch if the compiler decides against conditional moves.
I think that generally, this operation significantly profits from using signed types. As an alternative, we could have a helper trait like SquaredDiff: Num + PartialOrd
which uses the unsigned-safe implementation as a default for is overridden for concrete types like i64
and f64
which do not need that? However, it does seem strange for that to live inside of rstar
instead of e.g. num-traits
.
I'm a bit confused why overflow should occur at all. I took a long time to painfully implement unsigned ints. Now the unsigned ints are overflowing. Why does RStar subtract from unsigned ints at all?
Error:
attempt to subtract with overflow
stack backtrace:
0: rust_begin_unwind
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/std/src/panicking.rs:645:5
1: core::panicking::panic_fmt
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/panicking.rs:72:14
2: core::panicking::panic
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/panicking.rs:144:5
3: <usize as core::ops::arith::Sub>::sub
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/ops/arith.rs:212:45
4: <xl::unsigned_rstar::RStarInt as core::ops::arith::Sub>::sub
at ./src/unsigned_rstar.rs:10:97
5: rstar::point::PointExt::sub::{{closure}}
at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/point.rs:259:43
6: rstar::point::PointExt::component_wise::{{closure}}
at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/point.rs:197:28
7: <[S; N] as rstar::point::Point>::generate::{{closure}}
at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/point.rs:320:23
8: core::ops::try_trait::NeverShortCircuit<T>::wrap_mut_1::{{closure}}
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/ops/try_trait.rs:385:36
9: <core::iter::adapters::map::Map<I,F> as core::iter::traits::unchecked_iterator::UncheckedIterator>::next_unchecked
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/iter/adapters/map.rs:203:9
10: core::array::try_from_trusted_iterator::next::{{closure}}
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/mod.rs:791:22
11: core::array::try_from_fn_erased
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/mod.rs:822:20
12: core::array::try_from_fn
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/mod.rs:104:11
13: core::array::try_from_trusted_iterator
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/mod.rs:795:5
14: core::array::<impl [T; N]>::try_map::{{closure}}
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/mod.rs:538:39
15: core::array::drain::drain_array_with
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/drain.rs:26:5
16: core::array::<impl [T; N]>::try_map
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/mod.rs:538:9
17: core::array::<impl [T; N]>::map
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/array/mod.rs:501:14
18: <[S; N] as rstar::point::Point>::generate
at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/point.rs:319:9
19: rstar::point::PointExt::component_wise
at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/point.rs:197:9
20: rstar::point::PointExt::sub
at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/point.rs:259:9
21: rstar::algorithm::rstar::get_nodes_for_reinsertion::{{closure}}
at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/algorithm/rstar.rs:341:9
22: alloc::slice::<impl [T]>::sort_by::{{closure}}
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/alloc/src/slice.rs:267:34
23: core::slice::sort::insert_tail
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/slice/sort.rs:52:12
24: core::slice::sort::insertion_sort_shift_left
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/slice/sort.rs:163:13
25: core::slice::sort::merge_sort
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/core/src/slice/sort.rs:1059:13
26: alloc::slice::stable_sort
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/alloc/src/slice.rs:887:5
27: alloc::slice::<impl [T]>::sort_by
at /rustc/e51e98dde6a60637b6a71b8105245b629ac3fe77/library/alloc/src/slice.rs:267:9
28: rstar::algorithm::rstar::get_nodes_for_reinsertion
at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/algorithm/rstar.rs:338:5
29: rstar::algorithm::rstar::resolve_overflow
at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/algorithm/rstar.rs:244:37
30: rstar::algorithm::rstar::recursive_insert
at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/algorithm/rstar.rs:130:16
31: <rstar::algorithm::rstar::RStarInsertionStrategy as rstar::params::InsertionStrategy>::insert
at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/algorithm/rstar.rs:43:21
32: rstar::rtree::RTree<T,Params>::insert
at /Users/username/.cargo/registry/src/index.crates.io-6f17d22bba15001f/rstar-0.11.0/src/rtree.rs:821:9
33: xl::helper::RTreeFastValue<_,V>::insert
at ./src/helper.rs:297:9
Code:
use std::ops::Deref;
#[derive(Clone, Copy, PartialEq, PartialOrd, Eq, Ord, Debug, num_derive::Zero, num_derive::One, num_derive::NumOps, num_derive::Num)]
pub struct RStarInt(pub usize);
impl Deref for RStarInt {
type Target = usize;
fn deref(&self) -> &Self::Target {
&self.0
}
}
impl std::ops::Neg for RStarInt {
type Output = RStarInt;
fn neg(self) -> Self {
unreachable!()
}
}
impl num_traits::Signed for RStarInt {
fn abs(&self) -> Self {
*self
}
fn abs_sub(&self, other: &Self) -> Self { unreachable!() }
fn signum(&self) -> Self {
if self.0 == 0 {
RStarInt(0)
} else {
RStarInt(1)
}
}
fn is_positive(&self) -> bool {
true
}
fn is_negative(&self) -> bool {
false
}
}
impl num_traits::Bounded for RStarInt {
fn min_value() -> Self {
RStarInt(0)
}
fn max_value() -> Self {
RStarInt(usize::MAX)
}
}
impl std::hash::Hash for RStarInt {
fn hash<H: std::hash::Hasher>(&self, state: &mut H) {
self.0.hash(state);
}
}
Usually for computing distances. Which basically applies here as well, i.e. your bracktrace comes from
node.children.sort_by(|l, r| {
let l_center = l.envelope().center();
let r_center = r.envelope().center();
l_center
.sub(¢er)
.length_2()
.partial_cmp(&(r_center.sub(¢er)).length_2())
.unwrap()
});
which implements the R* insertion strategy by splitting a node along its center by sorting its children based on lexicographical distance to it.
While evaluating this crate to see if I can use it for a project, I bumped into the issue that
RTreeNum
requiresSigned
. I was hoping to use this data structure with unsigned integers (not floats). I'm not aware of any fundamental reason why an R-tree can't be used with unsigned numbers. I was able to remove the requirement locally and all the tests still pass. Is there something I'm missing or is it just a historical artifact? Thanks!