Closed urschrei closed 6 years ago
Thanks for the input! It's great to hear that spade has found usage in another project! Historically, I've created the r-tree first, then noticed that a Delaunay Triangulation fits my own purposes way better and focused on the development of the triangulation. That's why I have not yet tried to fully optimize the R-tree. However, R-Trees are intrinsically slow compared to many other algorithms as they often require many random memory accesses, thus I'm not sure on how fast I can make them.
Here are some ideas that might help you in your scenario:
RTree::insert_list(Vec<Line>)
could probably be sped up significantly, I'll have to think through that. If I come up with a good idea, I'll let you know. If you're still interested in helping out, this might be a good opportunity.I've tried out the z-order approach which was surprisingly easy thanks to the morton crate. It does increase performance significantly, but only for a very large number of points. In my test cases, only after inserting about ~50000 points it turned out to be faster. So, to address your problem, assuming that you have around 9000k line segments that are inserted into the tree, the presorted insertion might not yield any speedup. But since I want to implement it anyway, you can easily give it a try once it is there. Additionally, all my tests and optimizations so far were designed to improve the insertion of points. There might be some optimizations that only apply when inserting lines, I'll keep this issue updated once I know more.
Thank you for working on this! We actually have two specific use cases (one current, one will land when I finish up the work and will be far more widely used) for R* trees in rust-geo:
PartialEq
, for SimpleEdge
, I should be able to remove some extra calls to remove edges, which will definitely give some speedup. The edges that are initially inserted are sorted (they're polygon ring segments), but I wasn't aware of z-order sorting.Polygon A
into a tree, finding the closest segment for all points in Polygon B
, then doing the opposite, keeping the smallest point-segment distance of the two runs as the minimum polygon distance. It's a brute-force approach, but the R* tree cuts down the processing time a lot. I know Rbush uses a combination of OMT and Floyd-Rivest selection for point insertion, but I don't know whether those approaches are useful for segments.I should probably try to run some detailed benchmarks against Rbush, and also libspatialindex (which also has a bulk-insertion mode), but the last time I checked, both were around an order of magnitude faster (excepting the possibility that my Rust code was really inefficient in some other way, but I was pretty careful)
I've tried to compare spade with libspatialindex, so far spade seemed to outperform libspatialindex. However, maybe I'm not using libspatialindex correctly, or its parameters are not set up ideally. Maybe my test dataset (randomly generated, sometimes intersecting lines) was not chosen carefully enough, so this comparison should be taken with a grain of salt. I didn't manage to create a comparison with RBush yet, but given that they use specialised algorithms for insertion I believe it should be faster. I didn't read the papers about these algorithms yet, that'll be the next step.
I've implemented the OMT-thingy, you'll find it on the master branch.
Try it out with RTree::bulk_load(vec_with_data)
;
This is only bulk-loading, bulk insertion (= insert many elements into a non empty tree) is still an issue. Still, you should see a significant speedup in your second scenario.
Amazing, I'll try it out over the weekend. Thank you!
I need to break down the exact speedup, but using bulk_load
reduces the time on my benchmark by around 55%. This is great!
I have compared the performance of spade with rbush (running locally on the same machine). Spade ran sequential insertions about 1.4 times faster than rbush, bulk loading ran about 2.7 times faster. Just make sure that you have rust's optimizations enabled, they make a huge difference: The optimizations speed up the sequential insertion about 150 times, bulk load is improved about 190 times. I think I'm done optimizing sequential insertion and bulk loading for now, I'll create a new issue for implementing bulks insertion - the available algorithms seem to be either suboptimal or more complex and can be covered there more in detail.
I'm wondering whether you've benchmarked insertion performance when populating the R* Tree; We're currently using Spade in a couple of places in rust-geo (it's working perfectly, thank you!), and our primary applications involve one-time bulk-inserting line segments into the tree. It's hard to know without comparing to other libraries, but performance seems slower than expected; I see times around 90 ms for inserting segments built from ~9k points. If you have any ideas for improving this, or ideas for improving insertion performance in the library, I'd be interested in helping out with development.