georust / rstar

R*-tree spatial index for the Rust ecosystem
https://docs.rs/rstar
Apache License 2.0
384 stars 67 forks source link

Defend against numerical instability when computing number of clusters #166

Closed adamreichold closed 2 months ago

adamreichold commented 2 months ago

Apparently, it can happen that n.log(k) yields j+eps even though k^j=n exactly. After the call to ceil, this ends up with depth being one too large so that the number of clusters can also end up as one in which case the bulk-loading algorithm enters infinite recursion eventually overflowing the stack.

This change defends against this by ensuring that we always build at least two clusters per recursion step thereby actually reducing the number of objects the next steps has to consider.

adamreichold commented 2 months ago

I have no idea how you found this, but: great!

Every seventh or so rebuild of the current per-segment spatial indexes at https://md.umwelt.info led to a stack overflow starting a week ago. We first thought it was due to Rayon's work stealing, background threads and a continuously growing number of documents. But debugging turned out that this is due to 216.log(6) == 3.000 .. 04 triggering a number of clusters of one. So basically, bad luck...

adamreichold commented 2 months ago

So basically, bad luck...

Actually, now that I think of it, the trigger was probably an upgrade of the base OS to Ubuntu 24.04 which includes a newer glibc and thereby most likely updated transcendental function approximations like log.