georust / rstar

R*-tree spatial index for the Rust ecosystem
https://docs.rs/rstar
Apache License 2.0
395 stars 67 forks source link

Support for Large Persisted Trees #50

Open anlumo opened 4 years ago

anlumo commented 4 years ago

Hello,

My use case involves a data structure that can be 1GB or more in size and has to be persisted in permanent storage (aka a database). Since it's millions of small objects, using bulk insert is probably not a good idea with O(n*log n) complexity.

I could just serialize the whole tree into a single blob and write that to disk. However, besides doubling the memory requirements, every single change would also make it necessary to serialize everything again.

A typical solution for B-trees is to store the non-leaf nodes as separate objects on disk and load them on demand. This should also be possible with R*trees, and ParentNode even implements Serialize and Deserialize. However, I don't see a way to construct an RTree object from this, and this would also need accessors for adding, removing and modifying nodes (and it even has to be async since it involves I/O).

Is there a chance that this feature might be implemented in this crate?

rmanoka commented 3 years ago

The RTree struct implements serde traits when "serde" feature is enabled. I suppose you're suggesting a more fine-grained control over which RTreeNode to serialize, and to be able to assemble / modify the tree dynamically? While not impossible, this sounds complicated as many guarantee / invariants would have to be verified or assumed on the pieces to be put together.

For starters, can you refer us to a B-Tree or another RTree library that allows such fine-grained modification of the subtrees? The requirements sounds more like a filesystem data-structure, which tends to be a very specialized implementation.

You may also want to explore PostGIS if you haven't; it is more of a database solution for spatial indexing, and seems more tuned to solve your use-case.

anlumo commented 3 years ago

Thank you for the response! I think you understood my idea correctly, and yes, sqlite is an example of this kind of implementation for a B-Tree.

Unfortunately, we cannot ship PostgreSQL with our desktop application, but thank you for the pointer.