ellenhp / airmail

Lightweight geocoder in pure Rust
https://airmail.rs/
Apache License 2.0

Overhaul indexing, ideally switching to OSMExpress or something as an intermediate format #7

Closed by ellenhp 5 months ago

ellenhp commented 5 months ago

The intermediate format I use right now thrashes SSDs and isn't even fast, so I'll probably either build something on top of redb or build bindings for OSMExpress.

bdon commented 5 months ago

It wouldn't be too bad to reimplement an OSMExpress-like storage engine on top of redb if there's an S2 library out there, and you could also design from the start for compression and parallel query execution, which OSMExpress lacks as-is.

ellenhp commented 5 months ago

For now I'm just looking to replace the intermediate format I use between .osm.pbf and the final tantivy index, so I think I'm fine with using local storage for that part. It's a one-time thing and doesn't scale horizontally. I just need something that supports fast random access to resolve way/relation dependencies during indexing.
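To illustrate what "fast random access to resolve way/relation dependencies" means here: a way only stores node IDs, so producing a point to index requires looking up each referenced node's coordinates by ID. The following is a minimal in-memory sketch of that lookup, not airmail's actual code. The names `LatLon`, `Way`, `NodeStore`, `resolve_way` and `centroid` are hypothetical, and a sorted `Vec` with binary search stands in for whatever on-disk store (redb, OSMExpress, etc.) would be used in practice.

```rust
/// Hypothetical, simplified coordinate type (not airmail's data model).
#[derive(Debug, Clone, Copy, PartialEq)]
struct LatLon {
    lat: f64,
    lon: f64,
}

/// Hypothetical way: an ordered list of node IDs, as in OSM data.
struct Way {
    node_refs: Vec<i64>,
}

/// Node locations kept sorted by ID so each lookup is a binary search.
/// Assumption: a real intermediate format would keep this on disk.
struct NodeStore {
    entries: Vec<(i64, LatLon)>,
}

impl NodeStore {
    /// Build the store, sorting entries by node ID.
    fn new(mut entries: Vec<(i64, LatLon)>) -> Self {
        entries.sort_by_key(|(id, _)| *id);
        NodeStore { entries }
    }

    /// Random-access lookup of a node's location by ID.
    fn get(&self, id: i64) -> Option<LatLon> {
        self.entries
            .binary_search_by_key(&id, |(node_id, _)| *node_id)
            .ok()
            .map(|idx| self.entries[idx].1)
    }
}

/// Resolve every node reference of a way.
/// Returns None if any referenced node is missing from the store.
fn resolve_way(store: &NodeStore, way: &Way) -> Option<Vec<LatLon>> {
    way.node_refs.iter().map(|id| store.get(*id)).collect()
}

/// Plain average of the points, as a stand-in for a representative
/// location to index. Returns None for an empty slice.
fn centroid(points: &[LatLon]) -> Option<LatLon> {
    if points.is_empty() {
        return None;
    }
    let n = points.len() as f64;
    let lat = points.iter().map(|p| p.lat).sum::<f64>() / n;
    let lon = points.iter().map(|p| p.lon).sum::<f64>() / n;
    Some(LatLon { lat, lon })
}
```

Relations would be resolved the same way, one level up: look up each member way by ID, then resolve that way's nodes. With billions of nodes in a planet file, the cost of each `get` dominates, which is why the lookup structure matters more than the rest of the pipeline.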

ellenhp commented 5 months ago

osmflat works really well for this locally, but it does require about 160 GB of memory to expand the planet file. I'll probably try to get a cron job set up to upload those artifacts to R2 to make index generation a little more accessible. Leaving this open until I get it merged.

ellenhp commented 5 months ago

Merged in #10

bdon commented 4 months ago

@jake-low just wrote this which might be interesting for airmail? https://lib.rs/crates/osmx

ellenhp commented 4 months ago

@bdon Thanks for the tip! Jake is an old coworker so we met up at a coffee shop over the weekend and caught up. I ended up switching to osmx-rs for indexing, just pushed that change. :)