a-b-street / abstreet

Transportation planning and traffic simulation software for creating cities friendlier to walking, biking, and public transit
https://a-b-street.github.io/docs/
Apache License 2.0
7.76k stars 345 forks source link

Cleanup after the self-serve city import GUI #523

Open matkoniecz opened 3 years ago

matkoniecz commented 3 years ago

https://a-b-street.github.io/docs/howto/new_city.html is a bit tricky and I have seen some people from OSM interested by it, but scared away by command line stuff.

I think that JOSM-like download interface (that would also reject too large areas) would be great.

dabreegster commented 3 years ago

I've heard some requests for something like this. I think it's pretty feasible to add an in-game UI for it.

1) First open the browser to geojson.io and tell them to draw there 2) Somehow copy back the results (the tricky step) 3) Use https://github.com/a-b-street/abstreet/blob/master/importer/src/bin/pick_geofabrik.rs to find the osm.pbf best matching the boundary 4) Run osmconvert to clip 5) Run the one-shot importer automatically

Step 2 is a bit awkward because we'd need to support copy/paste (across 4 platforms) -- there are crates for it and I think we used to use one of them, so maybe not bad. The single-line text entry box in abst today sucks and I really don't want to make a multi-line one, but we could maybe just have a button saying "grab from system clipboard" and display the result and not allow editing in the UI.

Step 3 is more what we would do for a real permanent import; it might be overkill or too large a download for a quick demo. I'm not aware of any OSM download service that'll extract the city-sized areas people will likely draw. https://protomaps.com/bundles is the closest, but we need a .osm ultimately; I don't think the bundle has enough.

Step 4 means bundling the osmconvert binary for the correct platform. Or... if that's a hassle, I don't actually think it's so hard to implement the equivalent of osmconvert large_map.osm -B=clipping.poly --complete-ways -o=smaller_map.osm in Rust with the OSM parser we've already got. Argh, I guess the complication is that we don't handle .pbf. There are crates like https://crates.io/crates/osmpbf we could use.

I think this is sounding like an awesome hackathon project. Tentative plan is to do this in two weeks at https://democracylab.org/index/?section=AboutEvent&id=sthacktricksday2021

matkoniecz commented 3 years ago

Step 2 is a bit awkward because we'd need to support copy/paste (across 4 platforms) -- there are crates for it and I think we used to use one of them, so maybe not bad. The single-line text entry box in abst today sucks and I really don't want to make a multi-line one, but we could maybe just have a button saying "grab from system clipboard" and display the result and not allow editing in the UI.

What about showing map within abstreet with bbox selection done there? Exactly like it is done by JOSM?

matkoniecz commented 3 years ago

I'm not aware of any OSM download service that'll extract the city-sized areas people will likely draw

Limit it to a part of the city (say, like Kraków city center bundled in AB Street by default) and use Overpass API? May be not feasible if used on a large scale, but I am not expecting it done on massive scale.

michaelkirk commented 3 years ago

To make sure I'm understanding, it sounds like what you are describing is predicated on having a zoomable slippy map of the world in abstreet?

dabreegster commented 3 years ago

What about showing map within abstreet with bbox selection done there? Exactly like it is done by JOSM? it sounds like what you are describing is predicated on having a zoomable slippy map of the world in abstreet?

This sounds hard to implement, so that's why I think we can do step 1 and 2 and just open up the browser to an existing interface that can select boundaries, and tell people to copy-paste the GeoJSON blob back. It'd be cool to embed it directly and make this even easier, but I think even with a browser step, this'd be a huge step in usability.

use Overpass API?

Maybe worth a try, but this is probably not what it's intended for. We need to select all nodes, ways, relations intersecting that bbox. We could try to get clever and filter just for certain types, but that list is kind of huge and probably hard to maintain.

matkoniecz commented 3 years ago

To make sure I'm understanding, it sounds like what you are describing is predicated on having a zoomable slippy map of the world in abstreet?

Yes. No idea how hard it would be and is it reasonable. Main factor is, I guess, "can we use OSMF tile servers or other free tier of tile hosting".

Maybe worth a try, but this is probably not what it's intended for. We need to select all nodes, ways, relations intersecting that bbox.

I do it for some projects, downloading entire cities is within Overpass API range. My guess is that size limitation would be AB performance, not Overpass download capability. And for now any area runnable in AB would be downloadable with Overpass.

(no promises here, and natural=coastline is a trap here - Overpass data needs augmenting with https://osmdata.openstreetmap.de/data/water-polygons.html or some tricky processing to get seas/oceans rendered)

But if you are worried that it may be misuse of Overpass - look at their usage policy (I remember reading it - but maybe it changed since that time)

We could try to get clever and filter just for certain types

For some cases I exclude buildings, but AB is using them anyway. I would not try filtering.

AB is using most of data, throwing out unused scraps would be asking for trouble with minor benefits.

dabreegster commented 3 years ago

Sounds like the Overpass idea is worth trying, then. Closer to the hackathon, I'll break this down into parallelizable tasks, one of which could be a little tool that tries to effectively assemble a .osm from Overpass.

matkoniecz commented 3 years ago

little tool that tries to effectively assemble a .osm from Overpass.

If it can be downloaded from Overpass (I am betting that any area that can be loaded and run sanely will be) you can simply download .osm file ([out:xml]).

dabreegster commented 3 years ago

Dumping more related tasks for the hackathon:

matkoniecz commented 3 years ago

I just tested whatever what I was saying matches reality. I just downloaded 600MB+ .osm file representing city of Vienna with an Overpass query.

I used following script that used (very unimpressive) library code from https://github.com/matkoniecz/osm_bot_abstraction_layer/blob/master/osm_bot_abstraction_layer/overpass_downloader.py (basically, logging and retrying + inefficient file saving).

from osm_bot_abstraction_layer.overpass_downloader import download_overpass_query

vienna_download = """
  [timeout:250];
  (
    nwr(48.10,16.26,48.26,16.51);
  );
out body;
>;
out skel qt;
"""
download_overpass_query(vienna_download, '/tmp/vienna.osm')

Disclaimer: I am not maintainer of Overpass service, not sure how significant use AB Street would generate, not sure how significant use is OK.

View of data in JOSM (note inclusion of also object partially included in queried area, though ones without even node there can be missed):

(note: JOSM needed also metadata like version of object, so this was fetched with

[timeout:250];
(
    nwr(48.10,16.26,48.26,16.51);
);
out meta
>;
out meta qt;

)

screen06

dabreegster commented 3 years ago

Thank you for confirming Overpass is a viable option! It looks extremely straightforward for us to use it too.

I am not maintainer of Overpass service, not sure how significant use AB Street would generate, not sure how significant use is OK.

I'll double-check their policy before publicizing this tool, and check with the owner. I suspect they have rate limiting on their end for individual users, but we could also send an "A/B Street" user agent if they want to throttle across all users.

matkoniecz commented 3 years ago

I suspect they have rate limiting on their end for individual users

Yes. See http://overpass-api.de/api/status Though I am not sure if they have user-agent limiting or just request to self-limit. But I expect that AB Street:

Alfred-Mountfield commented 3 years ago

Another possibility for OSM data is GeoFabrik. A relatively new but really quite powerful Python library uses it for automatic data acquisition https://pypi.org/project/pyrosm/

It downloads .osm files in Protocol Buffer Binary Format and I've been using it in an automated pipeline to grab data for the whole of London (which surely is one the upper-end of areas you'd be able to handle).

The library is open-source so it might provide a nice example of how to do it: https://github.com/HTenkanen/pyrosm/blob/master/pyrosm/data/geofabrik.py

Not sure if you could directly call the library to handle it or if you'd have to reimplement the logic in Rust but it seems like it'd be a good alternative and it might have less throttling as they're direct file downloads rather than custom queries

matkoniecz commented 3 years ago

One more thing to consider for usecase of use for OSM mapping validation: how long mappers needs to wait after update to rerun import and see changes.

For Overpass it is enough to wait 5-10 minutes (right now replication lag is 2 minutes).

dabreegster commented 3 years ago

Alright, the hackathon is this Sauturday. Anybody reading this issue, feel free to join for the day! Rust experience useful, but we can also spend the time mentoring -- it's more important at the hackathons to grow the community than to finish the tasks.

Design decisions

1) Only support this running natively, not on the web. We can revisit this later, but it adds lots of complications. 2) Execute the importer and all other tools as subprocesses from the game crate. Don't link in the code as a library. It would really bloat the build, which we've been trying to pare down. 3) In parallel, try using Overpass and Geofabrik to get our .osm input, as discussed in the rest of this issue. We can pivot to just focus on one based on how things are going.

Tasks

These can all be done simultaneously and in any order, except for the last one.

Then glue everything together in the UI, and modify the release scripts to include whatever new binaries are needed.

Bonus tasks

If we have time/people:

dabreegster commented 3 years ago

Just got cargo run --bin one_step_import -- bedfordshire.geojson to work! Lots of smoothing out needed, but that wasn't hard so far. Quick lunch break, then I'll wire up a UI.

dabreegster commented 3 years ago

End-to-end import from the UI now "works". Lots of hacks along the way. First up, going to work on a sane way to run a command and display live STDOUT/STDERR in widgetry.

dabreegster commented 3 years ago

screencast Has some rough edges, but after about 6 hours of work, it works. :D

dabreegster commented 3 years ago

Likely done for the weekend. This'll be included in this week's release, but it's not ready for general use / testing yet. I'll finish this off next week. The remaining tasks, in no real order:

Very notably, none of this works on the web. I think it could be done in a few different ways, but still leaving that as future work.

dabreegster commented 3 years ago

This is now done/released. Repurposing this bug to track cleanup tasks that came up along the way.

michaelkirk commented 3 years ago

Finally giving the new import screen a wirl.

I'm proposing some minor UI changes here: https://github.com/a-b-street/abstreet/pull/584/files

When running from my development build, I initially saw this screen:

Screen Shot 2021-03-23 at 10 55 44 AM

From the application log I was able to deduce I need to do a cargo build --release to compile the import binary, then the import process started to work. Probably fine for dev builds, but wanted to check - is this binary included in the release so that everything "just works"?

The import process is still running - I've swapped out because clip_osm is 16.5GB and climbing. It'd be neat to make that less of a hog.

dabreegster commented 3 years ago

From the application log I was able to deduce I need to do

Good point, this error could be more clear what you need to do. I guess I was assuming most people aren't building from source...

is this binary included in the release so that everything "just works"?

Yes, it is! I tested with the Linux .zip and it seems fine.

I've swapped out because clip_osm is 16.5GB and climbing

Turns out reimplementing this little piece of osmconvert wasn't as straightforward as I expected. OSM files (including pbf, afaict) always list all nodes, then ways, then relations. The clip tool currently works like this:

1) Read and cache ALL nodes, no matter how far outside of the boundary they are 2) For each way, check if ANY of its nodes are inside the boundary. Keep it if so 3) For each relation, check if ANY of its children (nodes, ways, or other relations) have been saved. Save it if so. 4) Crawl all ways (whoops just realized this should handle relations too) and build a list of nodes we actually use 5) Write all of the saved nodes, ways, and relations

That first step is the memory hog. Filtering the nodes inside the boundary in step 1 doesn't work, because then a way crossing the boundary refers to nodes we haven't saved, so the final output doesn't have the node just outside the boundary, which messes up clipping and removes all border intersections.

From skimming the osmconvert source, they do lots of file rewinding. I'm still not sure all of the performance tricks they use, but I think we could adapt our approach to avoid the memory issue.

1) Read all nodes, only caching the ones in bounds 2) Read all ways, only caching the ones with at least one saved node 3) Same for relations 4) Build a set of all nodes that all of our saved ways and relations refer to 5) Go back and re-read the file from the start. Save all nodes in the above set 6) Write all 3

matkoniecz commented 3 years ago

Same for relations

Note that you should have to all ways referred to by relations, what may include ways not cached in step 2.

So you will need to cache additional ways, what may trigger need to cache additional nodes (see for example multipolygon on edge of area).

You may also skip some relations, for example load only multipolygons and public transport routes (anything else?) to avoid loading entire boundary of country or entire waterway relations.

michaelkirk commented 3 years ago

So you will need to cache additional ways, what may trigger need to cache additional nodes (see for example multipolygon on edge of area).

That's a good point. I think that's a bug with the current implementation in abstreet too. Not sure how often it bites us.

matkoniecz commented 3 years ago

Likely not a big issue, as it will affect only complex buildings and eyecandy that is both on map edge and placed in an unfortunate way.

Roads, footway should not be affected as pedestrian areas are not supported anyway from what I remember.

BTW, nested relations are possible - but unlikely to be used for anything AB related. Maybe some bus routes?

matkoniecz commented 3 years ago

Is it possible to add it also to http://abstreet.s3-website.us-east-2.amazonaws.com/0.2.50/osm_viewer.html ?

dabreegster commented 3 years ago

Is it possible to add it also to http://abstreet.s3-website.us-east-2.amazonaws.com/0.2.50/osm_viewer.html ?

It's already there -- all of the tools share the same map loading UI. Click the map name in the top-left, then scroll down. This map loading UI is becoming overwhelming now that we have lots of countries, and Yuwen had a design somewhere for reorganizing it and making the search function more prominent -- I'll see if I can dig it up and prototype it.

Screenshot from 2021-07-12 12-38-37 Screenshot from 2021-07-12 12-38-47

matkoniecz commented 3 years ago

On reread it seems that I missed

This only works in the downloaded version, not on web.

part

dabreegster commented 3 years ago

Ah right. It would technically be possible to make all of this work on the web, but to avoid exploding the initial .wasm size to download with the extra importer dependencies, we'd have to carve out a separate WASM app to separately fetch and execute. And maybe trying to import all in a browser is too intense, and at this point, it'd be simpler to just run a small API server somewhere with reasonable rate-limiting.

Not planning on tackling that soon.

But since you've reminded me of this issue, I'm actually going to take a shot at using Overpass instead of geofabrik + clipping, since I never actually tried that.

dabreegster commented 3 years ago

I switched the default to Overpass -- it was super easy to wire up, and is way faster than downloading and clipping a potentially massive PBF. This also maybe opens the door for some kind of "refresh my current map with recent data" feature.

https://twitter.com/CarlinoDustin/status/1414699376887287833

matkoniecz commented 3 years ago

Ooo, thanks!

One thing that seems missing is a clear handling of Overpass refusal to serve data (say someone trying load entire USA in one go or repeated downloads resulting in exhausting quota)

429, 503, 504 error codes indicate rate limiting - so probably a clear message should be shown on them explaining that Overpass has refused to serve a query rather than generic one

matkoniecz commented 3 years ago

BTW, has Overpass got exposed to web? From looking at diff it seems that it is not changed but it is likely that I missed relevant diff.

dabreegster commented 3 years ago

One thing that seems missing is a clear handling of Overpass refusal to serve data

Indeed. I can try testing this later to see how the error shows up. I think https://github.com/a-b-street/abstreet/blob/f35538b460146cc637123c987cb96e181b8982da/abstio/src/download.rs#L10 should plumb back the HTTP response code, and https://github.com/a-b-street/abstreet/blob/f35538b460146cc637123c987cb96e181b8982da/importer/src/bin/one_step_import.rs#L73 could check the error code and surface a better message. Might have to rearrange the error handling to plumb back the response code properly.

BTW, has Overpass got exposed to web?

No, nothing has changed here. https://github.com/a-b-street/abstreet/blob/master/importer/src/bin/one_step_import.rs isn't web-friendly. Everything could be done in memory in the browser -- downloading the osm.xml, running the importer, producing the binary map file, feeding it back to abstreet. But bundling all of the code to do that in the abstreet.wasm file would really bloat it; I'd want to keep it separate. And if it was separate, we'd have to asynchronously load and execute the extra WASM, then find some way (maybe the browser's user session storage) to plumb back the binary map file. All of that feels like it's getting complicated. The alternative is running a little server somewhere that does all of this and returns the URL of a hosted binary map.