davidmoten / rtree

Immutable in-memory R-tree and R*-tree implementations in Java with reactive api
Apache License 2.0
1.09k stars 210 forks source link

Is there a way to have sampled results over a geographical search ? #90

Open CyprienGottstein opened 5 years ago

CyprienGottstein commented 5 years ago

Hello,

As said in the title, I would like to know if you have sampled results over a geographical (circle in my case) search.

When I perform geographical search, I have cases where i will have 500k to potentially millions of objets. I believe I don't need all of those to do my stuff, a sample should do the trick.

For this to work, the sample needs to be distributed (uniformly in an utopic universe, semi-uniformly in a realistic one) over the space covered by the circle search.

Is this possible ? I didn't dug too deep into the backpressure code and such before coming here. Since it looks a bit complex I'd like to know if its even possible before foolishly loosing time :)

Thank you very much for you work, very nice job.

Best regards.

davidmoten commented 5 years ago

Interesting question. So the R-tree is is an index based on rectangles. You can infer the tree structure by looking at the R-tree split diagram in README.md. Every non-leaf node has up to maxChildren children. To sample 1000 from a non-leaf node you might concatenate samples of 1000/numChildren from each child, obtained recursively. If you want to play with the effectiveness of this then use https://github.com/davidmoten/rtree2 which is a simplification of rtree (without a reactive api).

davidmoten commented 5 years ago

Note that you can traverse the internal structure via the RTree.root() method which returns a Node. Every Node is either a NonLeaf or a Leaf and every node has an mbr() (minimum bounding rectangle). Thus you can make your own experiments still using public API.

plokhotnyuk commented 5 years ago

@CyprienGottstein a couple years ago we have used this library for the big index (up to 1M) of geo circles. The main idea and formulas are described here.

P.S. I suppose that you can use a spherical model of the Earth with the Mean radius and Haversine formula allow to get ±0.3% accuracy in calculation of distances comparing with Vincenty’s formulae on an oblate spheroid model. But, please doube-check if it is acceptable for you...