ellenhp / airmail

Lightweight geocoder in pure Rust
https://airmail.rs/
Apache License 2.0
312 stars 3 forks

ReadME #18

Closed ahmedosman2001 closed 3 months ago

ahmedosman2001 commented 6 months ago

Thank you for this project. It's remarkable for its affordability.

Could you please provide documentation on how to install it locally or on Fly.io, which you mentioned in your blog post?

cjacky475 commented 6 months ago

Yes, I would be interested too. I was not sure how all this works. @ellenhp, you mentioned you run both this engine and the object storage on Fly.io. How exactly do they communicate, and how do I set it up? This is such a great project for reducing geocoding costs. Please provide some sort of documentation or tutorial, something like a Getting Started guide. Thank you!

ellenhp commented 6 months ago

Thanks for the interest! Yes, I'm hoping to tackle documentation very soon. My plan is to

  1. Improve international address geocoding support first by implementing a new parser and tweaking the schema a bit to match. A new BERT model I'm working on right now should be much improved and might actually break SOTA with the help of the full libpostal training dataset.
  2. Integrate with the Pelias testing suite to quantify geocoding performance.
  3. Document everything and prepare for the first versioned release.

My hope was to be able to claim that Airmail, despite its tiny footprint, is only X% behind Pelias in geocoding performance for the initial release, but if people want to test now I can prioritize documentation. There's not much to do while the model is training and I have a bunch of free time this weekend so I'll probably start on that. :)

ahmedosman2001 commented 6 months ago

These improvements are also important. Thank you. If you anticipate significant changes to the documentation following these improvements, I can wait for the documentation. However, if the changes will be minimal, it would be preferable to have the documentation first. Regardless, we'll be pleased to have the documentation whenever it's ready.

ahmedosman2001 commented 6 months ago

Hello @ellenhp, could you please direct me to where I can find the 'spatial_custom' docker image? After hours, I managed to install and compile the code successfully, but the program exits with an error indicating it cannot locate the docker image. If everything works, I will submit a pull request for the documentation. By the way, I searched for 'spatial_custom' among public docker images but couldn't find it.

Running `target/release/airmail_import_osm --wof-db whosonfirst-data-admin-gb-latest.spatial.db --index ./index --docker-socket /run/user/1000/podman/podman.sock --osmx england-latest.osmx`
chcon: can't apply partial context to unlabeled file 'whosonfirst-data-admin-gb-latest.spatial.db'
Creating container `airmail-pip-service-0`
thread 'main' panicked at /path/to/airmail/airmail_indexer/src/lib.rs:284:18:
Failed to start spatial server container.: DockerResponseServerError { status_code: 404, message: "no such image: docker.io/library/spatial_custom: image not known" }
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

ellenhp commented 6 months ago

That's here: https://github.com/ellenhp/spatial

I had to fork it from upstream due to a bug/missing feature to return all names in all languages for a whosonfirst administrative area. PR here, but I'm not certain if it'll be accepted so I'll probably set up some CI and push my forked container to ghcr. https://github.com/pelias/spatial/pull/84

Just clone my fork and docker build -t spatial_custom . (from memory, could be wrong on this syntax) and re-run the airmail import. :)
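A rough sketch of those two steps, for reference. The image tag `spatial_custom` is the name the importer reports in its error message; the checkout directory `spatial`, the helper names, and the `DRY_RUN` switch are illustrative assumptions, and the build has not been verified against the fork's Dockerfile. By default the script only prints the commands.

```shell
# Sketch only: clone the fork and build it under the tag the importer looks for.
# The checkout directory `spatial` and the DRY_RUN switch are illustrative.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = actually run them

run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

build_spatial_image() {
  # Clone into ./spatial, then use that directory as the docker build context.
  run git clone https://github.com/ellenhp/spatial spatial &&
  run docker build -t spatial_custom spatial
}

build_spatial_image
```

Running it with `DRY_RUN=0` executes the commands instead of printing them; after that the airmail import can be re-run.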

ellenhp commented 6 months ago

There may be some rough edges relating to the indexing process but from memory I think you're pretty far along if you're running into that failure. If you do manage to build an index and stuff please let me know what your impression is of the process, which parts were hardest, and what your impression of the API is. Everything about the project is subject to change because I haven't attached a version number to a release yet, but the upside of that is we can reshape it to be more ergonomic pretty easily, so let me know :)

ellenhp commented 6 months ago

Another thing to be aware of is that you should enable the remote_index feature if you plan on accessing the index over the network. This enables the quickwit feature on tantivy, which in turn enables the sstable feature, which massively improves performance. You can do this by adding --features remote_index to your cargo run commands for building the index and running airmail_service. Also note that INDEXING.md is outdated: you'll need to provide an osmx extract instead of osmflat.
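As a sketch of what that looks like in practice: the flag names below are taken from the commands quoted elsewhere in this thread, while the file paths, variable names, helper functions, and the `DRY_RUN` switch are placeholders and assumptions. By default the script only prints the two commands.

```shell
# Sketch: the indexing and serving invocations with remote_index enabled.
# Paths are placeholders; nothing is executed unless DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
WOF_DB="${WOF_DB:-./whosonfirst-data-admin-gb-latest.spatial.db}"
OSMX="${OSMX:-./united-kingdom-latest.osm.osmx}"
INDEX_DIR="${INDEX_DIR:-./index}"
DOCKER_SOCKET="${DOCKER_SOCKET:-/var/run/docker.sock}"

run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

build_index() {
  # Everything after the bare `--` is passed to the binary, not to cargo.
  run cargo run --release --bin airmail_import_osm --features remote_index -- \
    --wof-db "$WOF_DB" --index "$INDEX_DIR" \
    --docker-socket "$DOCKER_SOCKET" --osmx "$OSMX"
}

serve_index() {
  run cargo run --release --bin airmail_service --features remote_index -- \
    --index "$INDEX_DIR"
}

build_index
serve_index
```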

cjacky475 commented 6 months ago

@ellenhp, is this going to work only for geocoding and not reverse geocoding (find approximate place/city name from lat/lng coordinates)? Thanks.

ellenhp commented 6 months ago

I may add reverse geocoding later but for the time being it's forward only. I recommend the placeholder project by the Pelias team for coarse reverse geocoding (in other words, only administrative areas, not reverse geocoding to addresses).

edit: Sorry I mixed up placeholder and the spatial server. This is the one that I'd recommend for coarse reverse geocoding. https://github.com/pelias/spatial

Airmail uses it internally to get high-quality information about which administrative areas a POI is in, since that information is often missing from OSM.

ahmedosman2001 commented 6 months ago

@ellenhp Thank you for the info. I managed to build your custom spatial image. Now when I run the program it seems to run without errors, but there's this warning that prevents indexing from working: [2024-04-02T15:36:21Z WARN airmail_import_osm::openstreetmap] Error from callback: sending on a disconnected channel

Here is the full log:

/airmail$ RUST_LOG=info cargo run --release --bin airmail_import_osm --features remote_index -- --wof-db /airmail/whosonfirst-data-admin-gb-latest.spatial.db --index /airmail/index --docker-socket /var/run/docker.sock --osmx /airmail/united-kingdom-latest.osm.osmx 
    Finished release [optimized + debuginfo] target(s) in 1.78s
     Running `target/release/airmail_import_osm --wof-db /airmail/whosonfirst-data-admin-gb-latest.spatial.db --index /airmail/index --docker-socket /var/run/docker.sock --osmx /united-kingdom-latest.osm.osmx`
[2024-04-02T15:36:20Z INFO  tantivy::indexer::segment_updater] save metas
chcon: can't apply partial context to unlabeled file '/airmail/whosonfirst-data-admin-gb-latest.spatial.db'
Container `airmail-pip-service-0` is already running.
[airmail_import_osm/src/openstreetmap.rs:104:5] osmx_path = "/airmail/united-kingdom-latest.osm.osmx"
Processing nodes
[2024-04-02T15:36:21Z INFO  tantivy::indexer::index_writer] Preparing commit
[2024-04-02T15:36:21Z INFO  tantivy::indexer::index_writer] Prepared commit 0
[2024-04-02T15:36:21Z INFO  tantivy::indexer::prepared_commit] committing 0
[2024-04-02T15:36:21Z INFO  tantivy::indexer::segment_updater] save metas
[2024-04-02T15:36:21Z INFO  tantivy::indexer::segment_updater] Running garbage collection
[2024-04-02T15:36:21Z INFO  tantivy::directory::managed_directory] Garbage collect
Waiting for tasks to finish.
[2024-04-02T15:36:21Z WARN  airmail_import_osm::openstreetmap] Error from callback: sending on a disconnected channel
[2024-04-02T15:36:21Z WARN  airmail_import_osm::openstreetmap] Error from callback: sending on a disconnected channel

After the run, I started the API like this: RUST_LOG=info cargo run --release --bin airmail_service -- --index ./index and used the endpoint http://localhost:3000/search?q=london. However, it returns nothing for addresses in the UK.
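For reference, the same request can be issued from the command line. This is a minimal sketch: the host, port, and /search?q= path come from the URL above, while the helper names and the `DRY_RUN` switch are illustrative. No assumption is made about the response format, so the body is printed as-is. By default the script only prints the curl command.

```shell
# Sketch: send a forward-geocoding query to a locally running airmail_service.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = actually send it
AIRMAIL_URL="${AIRMAIL_URL:-http://localhost:3000/search}"

run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

search_airmail() {
  # -G appends the data to the URL as a query string;
  # --data-urlencode escapes spaces and other special characters in the query.
  run curl -sS -G --data-urlencode "q=$1" "$AIRMAIL_URL"
}

search_airmail "london"
```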

ahmedosman2001 commented 6 months ago

> There may be some rough edges relating to the indexing process but from memory I think you're pretty far along if you're running into that failure. If you do manage to build an index and stuff please let me know what your impression is of the process, which parts were hardest, and what your impression of the API is. Everything about the project is subject to change because I haven't attached a version number to a release yet, but the upside of that is we can reshape it to be more ergonomic pretty easily, so let me know :)

So far, my main challenges revolve around the operating system and third-party programs. For example, the build process (e.g. cargo build) couldn't complete on Ubuntu 22.04; I had to switch to Debian, where the build was successful. Additionally, I had difficulties with osmx: the prebuilt binary didn't work for me, so I had to build it from source (e.g., using 'make'). Other issues stemmed from my limited experience with Rust and Docker. I'll share my API experience with you once I manage to run it.

This is the error I get when I try to build it on Ubuntu 22.04 (it works fine on Debian GNU/Linux 12 (bookworm)):

   Compiling tantivy-common v0.6.0
error[E0432]: unresolved import `zstd_sys::ZSTD_cParameter::ZSTD_c_experimentalParam6`
   --> /.cargo/registry/src/index.crates.io-6f17d22bba15001f/zstd-safe-6.0.6/src/lib.rs:609:13
    |
609 |             ZSTD_c_experimentalParam6 as ZSTD_c_experimentalParam1,
    |             -------------------------^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    |             |
    |             no `ZSTD_c_experimentalParam6` in `ZSTD_cParameter`
    |             help: a similar name exists in the module: `ZSTD_c_experimentalParam1`

For more information about this error, try `rustc --explain E0432`.
error: could not compile `zstd-safe` (lib) due to 1 previous error
warning: build failed, waiting for other jobs to finish...

ellenhp commented 6 months ago

It's really frustrating that you had to switch operating systems to get it to build. ZSTD is one of only a couple of non-Rust dependencies of this project, which is usually where build issues show up. I put some time on my calendar after work to look into that.

cjacky475 commented 5 months ago

Hi, @ellenhp, any news on the documentation and how to get started? I'm really looking forward to this. Cheers.

ellenhp commented 5 months ago

Hi, it makes me really happy to hear people are enthusiastic about this, but I just want to be transparent that I'm pretty low on capacity lately. Another project has captured my interest but as that excitement wanes I'll try to come back to this! I'm pretty happy with where Airmail is at as a project. It's usable for a lot of use-cases and way easier on the wallet than anything else out there, even if you run it against a local index for <100ms response times. It definitely needs some docs though.

ellenhp commented 5 months ago

I've made a little progress as of last night. I wrote up all the prerequisites and now I need to do the indexing commands and the serving commands, then go through the docs myself and do a build, then push and PR. I also need to look into what's going on with the zstd dependency. I think it only gets pulled in because of tantivy, so you might be able to find a solution by looking through the tantivy GitHub issues in the meantime.

mckelveygreg commented 5 months ago

I was able to get it to build in docker, but I needed to add --locked to the install command; otherwise the build broke because newer versions of dependencies were installed 🤷

RUN cargo install --path ./airmail_service --locked
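Outside of a Dockerfile, the same install step could look roughly like this. The install command is the one quoted above; the lockfile check (which assumes Cargo.lock sits in the directory the command is run from), the helper names, and the `DRY_RUN` switch are illustrative additions.

```shell
# Sketch: install airmail_service with dependency versions pinned by Cargo.lock.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the command, 0 = actually run it

run() {
  echo "+ $*"
  if [ "$DRY_RUN" = "0" ]; then "$@"; fi
}

install_airmail_service() {
  # --locked tells cargo to use the exact versions recorded in Cargo.lock
  # instead of resolving newer compatible ones.
  if [ ! -f Cargo.lock ]; then
    echo "Cargo.lock not found; run this from the repository root" >&2
    return 1
  fi
  run cargo install --path ./airmail_service --locked
}
```

Calling `DRY_RUN=0 install_airmail_service` from the checkout performs the install; with the default setting it only prints the command.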

After looking at all of the options out there, this project is really getting me excited! We need an offline geocoder to run alongside a fully offline version of our entire platform, so something compact like this fits our use case perfectly!

Happy to provide feedback and help debug 🎉

cjacky475 commented 4 months ago

@ellenhp, sorry for pinging again, I just really wanted to see the progress. Maybe we could at least get what is done so far? I believe a basic 'how to get started' shouldn't need to be that extensive. Just a minimal guide on how to launch is fine; with time and others' help it can be improved. Thanks and sorry again for pinging, I'm just really excited for this.

ellenhp commented 3 months ago

@cjacky475 @ahmedosman2001

Let me know if you have any issues with the docs. For Ahmed specifically, I think I fixed the zstd issue you had by bumping the version of tantivy we're using. If you have any more problems like that feel free to open an issue. If any of those issues come up in the future you should be able to use the Option 2 I listed in the build docs to hopefully bypass them.

ellenhp commented 3 months ago

@mckelveygreg Let me know if you have any feedback as well. You mentioned you got it building but this should give you everything you need to create an index and serve requests. Hopefully. :)

ahmedosman2001 commented 1 month ago

Thank you @ellenhp, I quickly tested it and I'm no longer seeing the errors I was facing before.