ellenhp / airmail

Lightweight geocoder in pure Rust
https://airmail.rs/
Apache License 2.0
292 stars, 3 forks

New parser #8

Closed (ellenhp closed 5 months ago)

ellenhp commented 5 months ago

Airmail needs a new parser. It's been really tricky to get the current one working acceptably for English, and I only barely speak one other language out of the dozens that I'd ultimately like to support, so I can't do tuning for every language.

There aren't really any good options here. A more complete port of the Pelias parser could work, but I want to investigate using a bidirectional LSTM too. I've looked into porting libpostal, but that would be a monumental amount of effort, and I think the model size is prohibitive for an application as lightweight as Airmail. I don't need 99.45% accuracy given how incomplete the indexed data is, so I'm wondering if I can make something myself using the libpostal training data. If I were to go that route, I expect I'd also need to port the libpostal normalization and numex code.

ellenhp commented 5 months ago

I decided to try out the Photon approach locally (Photon does no explicit query parsing, afaik), and it works extremely well with a bit of tuning. Closing this; I might reopen if I change my mind on whether a parser is necessary. For now, it seems like it isn't.
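
For readers wondering what "no explicit query parsing" can look like in practice, here is a toy sketch. It is not Photon's or Airmail's actual code, and `Place`, `make_place`, `score`, and `best_match` are hypothetical names made up for illustration. It assumes each place's fields are flattened into a single bag of tokens, and that the query is matched against that bag without ever labeling a token as a house number, street, or city.

```rust
use std::collections::HashSet;

/// Hypothetical indexed place: every field is flattened into one bag of tokens.
struct Place {
    name: &'static str,
    tokens: HashSet<String>,
}

/// Lowercase the text and split on anything that is not alphanumeric.
fn tokenize(text: &str) -> Vec<String> {
    text.to_lowercase()
        .split(|c: char| !c.is_alphanumeric())
        .filter(|t| !t.is_empty())
        .map(str::to_string)
        .collect()
}

/// Build a place from its raw fields (name, street, city, ...).
fn make_place(name: &'static str, fields: &[&str]) -> Place {
    let tokens = fields.iter().flat_map(|f| tokenize(f)).collect();
    Place { name, tokens }
}

/// Score = number of query tokens found anywhere in the place.
/// Nothing here decides which token is a street or a city.
fn score(query: &str, place: &Place) -> usize {
    let query_tokens = tokenize(query);
    query_tokens
        .iter()
        .filter(|t| place.tokens.contains(*t))
        .count()
}

/// Return the highest-scoring place, or None if no token matches at all.
fn best_match<'a>(query: &str, places: &'a [Place]) -> Option<&'a Place> {
    places
        .iter()
        .map(|p| (score(query, p), p))
        .filter(|(s, _)| *s > 0)
        .max_by_key(|(s, _)| *s)
        .map(|(_, p)| p)
}
```

Calling `best_match("seattle space needle", &places)` on a small list built with `make_place` picks whichever place shares the most tokens with the query. A real search index would add per-field weighting, fuzzy matching, and ranking on top of this, which is presumably where the "bit of tuning" goes.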