Open boydjohnson opened 2 months ago
We definitely would consider a PR. Also we can easily release a new version on crates.io.
Did you manage to also reorder nodes/way based on the index to increase data locality?
I would assume that an index for nodes is "for free" / implicit as long as their are sorted by (some) space filling curve (z-order, hilbert, etc).
Ways and relations require extra data (but there's less of them to go around).
I wasn't able to reorder nodes/ways due to the lack of sort functionality on ExternalVector. I keep the ZOrderIndexEntries in an in-memory vector and sort by index at the end. I can do a US state on my laptop, but know users might want to run it on a planet file.
I asked ChatGPT to make an ascii diagram of how the ZOrderIndexEntry works.
+-----------------------------------------------------------+
| ZOrderIndexEntry |
|-----------------------------------------------------------|
| z_order (64 bits) | tag (8 bits) | idx (64 bits) |
+-----------------------------------------------------------+
|
v
+------------------------------------------------+
| GeoType (tag) |
|------------------------------------------------|
| Node = 0 | Way = 1 | Relation = 2 |
+------------------------------------------------+
| | |
v v v
+----------------+ +----------------+ +----------------+
| Nodes Slice | | Ways Slice | | Relations Slice|
+----------------+ +----------------+ +----------------+
| idx -> Node[ ] | | idx -> Way[ ] | | idx -> Relation[ ] |
+----------------+ +----------------+ +----------------+
Here's a code snippet of using osmflat
to query an archive.
let curve = XZ2SFC::wgs84(Z_ORDER_RESOLUTION);
let ranges = curve.ranges(x_min, y_min, x_max, y_max, None);
let spatial = self.osm.spatial_index();
let ids = self.osm.ids().unwrap();
ranges
.iter()
.map(|i| (i.lower(), i.upper()))
.flat_map(|v| {
let element = spatial.z_order_index().partition_point(|e| e.z_order() < v.0);
spatial.z_order_index()[element..]
.iter()
.take_while(move |e| e.z_order() <= v.1)
.filter_map(|entry| match entry.tag() {
osmflat::GeoType::Node => {
let item = &self.osm.nodes()[entry.value() as usize];
let id = &ids.nodes()[entry.value() as usize];
So I had a thought. Your idea of ordering the nodes, ways, relations each in index order is a good one. It would allow for less data to be stored. To get all the nodes, ways, relations at one spatial index would require 3 binary searches.
The data that wouldn't have to be stored would be (8 bits for the GeoType, 64 bits for the index of the node/way/relation) ~100GB on a planet file.
Then all that needs to be stored would be an optional spatial_index on each Node, Way, and Relation, 64 bits.
To allow for reordering nodes, ways, relations I would use RocksDB during the compilation process, since RocksDB allows iterating all key values by a sorted key. This would allow for the data to never be held in memory, but still be sorted.
That sounds like a good idea. You might not even need those extra 64 bit for nodes, as the spatial index of a node is (most likely) just the morton code of its coordinates, and could be computed on-the-fly?
Yeah, That makes sense. Since we have the latitude and longitude, we can calculate the index on the fly.
I thought to add a spatial index to osmflat-rs, and have it partially implemented here (https://github.com/boxdot/osmflat-rs/compare/master...boydjohnson:osmflat-rs:feature/geospatial-archive). It is implemented for Nodes and Ways, but not relations.
The index is this implementation (https://docs.rs/space-time/latest/space_time/xzorder/xz2_sfc/struct.XZ2SFC.html).
It allows for indexing Nodes, Ways, and Relations with the same curve, based on bounding box.
Aside from asking if you would consider a PR, I was wondering if there would be crates.io releases in the future for osmflat and osmflatc?
Best regards.