georust / rstar

R*-tree spatial index for the Rust ecosystem
https://docs.rs/rstar
Apache License 2.0
386 stars 67 forks source link

feat: dynamic rather than static parameters #105

Open msalib opened 1 year ago

msalib commented 1 year ago

This change produces a 3%-15% improvement on benchmarks.

Along the way, I've moved some checks into Params::new or into a static assert (in the case of the point dimension greater than 1 check).

Although I made this PR to provide more flexibility to clients and although it has improved benchmarks, I think this change has made the API simpler as well. Params is just a single struct with a constructor. You don't have to deal with weird type magic. RTree<T> has a single generic parameter, T. In earlier versions of this PR, I made the insertion strategy another generic parameter for RTree with a default value, but then dropped it because it cluttered up the code significantly; this crate is almost 5 years old and has only ever had one insertion strategy, so I didn't see the benefit in adding another generic parameter to support genericity that we haven't needed and are unlikely to need in the future.

Tests and benchmarks pass but I still need to clean up the docs.

Andlon commented 1 year ago

Some comments from an rstar user who's not so familiar with the rstar code base itself:

From a user's perspective, I think this is probably a good idea. It reduces "needless" generics and therefore overall complexity of the API, and it seems easier to work with and enables changing settings at runtime.

What's the reason for the improvement in the benchmarks? I'd have expected static parameters not to be slower than runtime parameters...

I'm concerned about the Params::new(usize, usize, usize) signature, as it's easy to get the order of parameters mixed up. What about a builder-like pattern instead? Perhaps without a separate builder? This has the added benefit that you can add new parameters in the future without a breaking change as long as you provide a default. Example:

let params = Params::default() //could also be Params::new()
    .with_max_size(my_max_size)
    .with_reinsertion_count(2);
// we omit .with_min_size which leaves it the default
Andlon commented 1 year ago

Oh, and, perhaps you consider not implementing Copy for Params, since you might want to add non-copy settings in the future. The amount of times you need to explicitly .clone() such a struct should in any case be very small as I expect you take it by value in the construction and only reference from then on (?).

kylebarron commented 1 year ago

this crate is almost 5 years old and has only ever had one insertion strategy, so I didn't see the benefit in adding another generic parameter to support genericity that we haven't needed and are unlikely to need in the future.

I haven't looked at the code changes yet; does this PR remove the insertion strategy altogether? I had been thinking I might try to dabble with an implementation of STR insertion strategy for static trees for data partitioning.

rmanoka commented 1 year ago

This PR currently addresses two concerns: (1) supporting dynamic params; (2) simplifying the RTree type generics. I think the former is definitely a useful addition, but am not convinced about the latter. May I suggest the following alternative to only support (1) and discuss (2) in a separate PR?

Reg. dynamic params, the current trait does indeed not support it, and it was probably designed to statically verify the fact that the sizes in the params do not change. Instead, we could change it to:

pub trait RTreeParams: Send + Sync {
    fn min_size(&self) -> usize;
    fn max_size(&self) -> usize;
    fn reinsertion_count(&self) -> usize;

    type DefaultInsertionStrategy: InsertionStrategy;
}

We can then implement this trait on the Params struct defined in the PR. Now of course, the static guarantee is lost, and we need to emphasize that the min_size, etc. should not change across calls on the same object. This anyway can't be done unless we use interior mutability or just produce some random numbers in the function call.

Or we could make the function into something like fn min_size(&self) -> &usize to ensure it is a stored reference that is not being computed per call.

This way, we maintain the current generics and we can decide on dropping that independently.

@msalib would you be able to try this in this PR?