dabreegster commented 2 years ago

By early November, it would be "really nice" if the LTN tool could be run easily anywhere in the UK. The import process today is not that bad on native, but it could be smoother. It doesn't work on web, and that's more the blocker -- I want to tell someone to open their browser, click a URL, and have something working in... under 30 seconds ideally?

Related to #326.

Idea 1: make the Overpass import work on web

aka, continue #523

Today on native, we ask people to draw a boundary with geojson.io, then run cli one-step-import, which pulls from Overpass and does everything. In principle, this could all work on web too. The biggest friction is bringing in more dependencies (convert_osm, import_streets, importer, etc) into the main binary, after a previous effort to tease them out.

We could maybe do something funny, like open a new browser tab pointed at a new page, load a different WASM binary, do the import, and stash the resulting serialized map in local storage or something. Except we've already been hitting size limits with that -- 5MB or so? That definitely won't work.

Idea 2: FlatGeobuf

I could import all of the UK (from one giant .pbf) and turn it into one massive RawMap. Then we could convert it to a FlatGeobuf (fgb) file and stick that in S3. At runtime, someone requests a bbox of the area they want, and we efficiently read a minimal part of the fgb file, convert it back to RawMap, then finish the conversion into Map.

I had a quick look through RawMap and StreetNetwork. All of the important stuff is indexable in a FGB -- roads, intersections, areas, buildings.

Open questions:

How horrifically slow would it be for me to import the entire UK on my own machine? The process is parallelized on totally different inputs right now, so processing one big file is probably a scary idea. (Also, we only read osm.xml today -- for perf, we almost definitely would want to switch to osm.pbf)
How fast would the dynamic import process be for someone in a web browser? Could maybe try to estimate this by trying out the US census population reader in practice today, and adding up time to do the second half of map conversions.
How would the coordinate projection work? Right now, every map turns into Mercator (?) using the boundary. We natively store everything as distances in meters from a reference point. It works fine for city-sized areas, but for all of the UK, the distortion is probably crazy. And it would be an absolutely massive and complex change to change all of the code everywhere to work off of LonLat and do different math.
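
To put a rough number on the distortion worry, here is a minimal sketch of a local projection to meters around a reference point. It assumes a spherical Earth and an equirectangular approximation, which is not necessarily what the importer does today; `to_local_meters`, `east_west_stretch`, and `EARTH_RADIUS_M` are made-up names for illustration.

```rust
/// Mean Earth radius in meters (spherical approximation, an assumption of this sketch).
const EARTH_RADIUS_M: f64 = 6_371_000.0;

/// Project (lon, lat) in degrees to (x, y) meters relative to a reference point.
/// Hypothetical helper using an equirectangular approximation.
fn to_local_meters(lon: f64, lat: f64, ref_lon: f64, ref_lat: f64) -> (f64, f64) {
    let x = (lon - ref_lon).to_radians() * ref_lat.to_radians().cos() * EARTH_RADIUS_M;
    let y = (lat - ref_lat).to_radians() * EARTH_RADIUS_M;
    (x, y)
}

/// Ratio of projected to true east-west distance at `lat`, when the projection
/// is centered on `ref_lat`. A value of 1.0 means no east-west distortion.
fn east_west_stretch(lat: f64, ref_lat: f64) -> f64 {
    // True east-west distances shrink with cos(lat); the projection uses cos(ref_lat).
    ref_lat.to_radians().cos() / lat.to_radians().cos()
}
```

With these assumptions, a point a tenth of a degree of latitude away from the reference is stretched by a fraction of a percent, while a point four degrees away (say latitude 58 with a reference at 54) is stretched by roughly 11% east-west. That is the gap between "fine for a city" and "not fine for all of the UK".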

CC @BudgieInWA for thoughts about osm2streets and @michaelkirk for prior experience with the US census fgb.

Idea 3: Tiling

Chop up the UK into fixed square tiles of some size, and just import all of them as regular maps. If someone gets lucky and the area they care about is firmly within a tile, then they're all set. We could also consider gluing together adjacent tiles... maybe that would mean storing them just as RawMaps (which are smaller and simpler -- no pathfinding, sequential IDs, or size-heavy lanes or turns) and then having a way to merge two adjacent ones together. Aside from the roads/buildings/areas crossing the boundary, this seems logically straightforward. As long as the total area isn't too big, the two Mercator projections glued together should be OK.
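
As a sketch of the lookup side of tiling, the snippet below maps a point to a fixed tile and lists the tiles a requested bbox touches. The tile scheme (squares of `tile_deg` degrees, indexed by column and row) and the names `TileId`, `tile_for`, and `tiles_covering` are assumptions for illustration only; tiles defined in degrees are not square in meters.

```rust
/// Identifier of a fixed-size tile, `tile_deg` degrees on each side (hypothetical scheme).
#[derive(Debug, Clone, Copy, PartialEq, Eq, Hash)]
struct TileId {
    col: i32,
    row: i32,
}

/// The tile containing a single point.
fn tile_for(lon: f64, lat: f64, tile_deg: f64) -> TileId {
    TileId {
        col: (lon / tile_deg).floor() as i32,
        row: (lat / tile_deg).floor() as i32,
    }
}

/// Every tile touched by a bbox given as (min_lon, min_lat, max_lon, max_lat).
fn tiles_covering(bbox: (f64, f64, f64, f64), tile_deg: f64) -> Vec<TileId> {
    let lo = tile_for(bbox.0, bbox.1, tile_deg);
    let hi = tile_for(bbox.2, bbox.3, tile_deg);
    let mut out = Vec::new();
    for row in lo.row..=hi.row {
        for col in lo.col..=hi.col {
            out.push(TileId { col, row });
        }
    }
    out
}
```

If `tiles_covering` returns a single tile, the "lucky" case applies and that map can be loaded directly. More than one result is the case where adjacent RawMaps would need to be merged.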

Common problems

All of these implementations share some issues.

To start with, how do users pick the area they want? We could continue to ask people to draw the area they want with geojson.io, and maybe even just limit to rectangles (or transform their polygon into a bbox). It'd be kind of neat to have something else built-in, but very high effort to read and display existing vector/raster tiles.
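
Turning a drawn polygon into a bbox is the cheap part. A minimal sketch, assuming the polygon's outer ring has already been parsed out of the GeoJSON into (lon, lat) pairs; `bbox_of` is a hypothetical helper, not an existing function in the codebase:

```rust
/// Bounding box of a ring of (lon, lat) points, as (min_lon, min_lat, max_lon, max_lat).
/// Returns None for an empty ring.
fn bbox_of(ring: &[(f64, f64)]) -> Option<(f64, f64, f64, f64)> {
    let (first, rest) = ring.split_first()?;
    let mut bbox = (first.0, first.1, first.0, first.1);
    for &(lon, lat) in rest {
        bbox.0 = bbox.0.min(lon);
        bbox.1 = bbox.1.min(lat);
        bbox.2 = bbox.2.max(lon);
        bbox.3 = bbox.3.max(lat);
    }
    Some(bbox)
}
```

The bbox always contains the drawn polygon, so it may pull in more data than the user asked for, especially for long diagonal shapes.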

Today, lots of code, UIs, and save files assume a map name -- country, city, specific map name. For these custom areas or tiles, maybe we just have to use the coordinates of the 4 corners and update all the code to understand this special case. Over time, save files need to be stored and indexed differently to possibly apply to many different people's drawn boundaries. (This kind of works today within a country+city.)

How would switching apps work? Natively, we could cache the imported area (as we do now in zz/oneshot) and just open it again. On the web, I guess we hit the local storage problem -- we'd need a way to pass 50-100MB files around locally. Or just pay the cost and import again.

dabreegster commented 2 years ago

Idea 4: Server-side

When someone wants to import a new area, they hit an API and wait for a response (by repeating the request via polling). The backend does something clever to dedupe requests and then just runs the existing importer (reading from Overpass, or maybe a pre-cached copy of all of Geofabrik). It sticks the result in S3, but has some kind of LRU caching to manage size.
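
A rough sketch of the caching half of this, assuming areas are identified by a string key and an imported map is just a byte blob. `ImportCache` and `get_or_import` are invented names; a real backend would track bytes stored in S3 rather than an entry count, and would also need to dedupe requests that are still in flight, which this sketch does not model.

```rust
use std::collections::{HashMap, VecDeque};

/// Tiny in-memory LRU cache of imported maps, keyed by area name (hypothetical).
struct ImportCache {
    capacity: usize,
    maps: HashMap<String, Vec<u8>>,
    /// Front = least recently used, back = most recently used.
    order: VecDeque<String>,
    /// How many times the importer actually ran.
    imports_run: usize,
}

impl ImportCache {
    fn new(capacity: usize) -> Self {
        ImportCache {
            capacity,
            maps: HashMap::new(),
            order: VecDeque::new(),
            imports_run: 0,
        }
    }

    /// Return the cached map for `area`, running `import` only on a cache miss.
    fn get_or_import(&mut self, area: &str, import: impl FnOnce(&str) -> Vec<u8>) -> &[u8] {
        if self.maps.contains_key(area) {
            // Cache hit: drop the old position so the key can be re-added as most recent.
            self.order.retain(|k| k.as_str() != area);
        } else {
            // Cache miss: evict the least recently used entry if the cache is full.
            if self.maps.len() >= self.capacity {
                if let Some(oldest) = self.order.pop_front() {
                    self.maps.remove(&oldest);
                }
            }
            self.imports_run += 1;
            let data = import(area);
            self.maps.insert(area.to_string(), data);
        }
        self.order.push_back(area.to_string());
        &self.maps[area]
    }
}
```

Repeated requests for the same area only pay the import cost once, and areas nobody has asked for recently fall out of the cache first.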

This requires deploying and operating a backend, whereas all the other ideas put the burden on the client. But in practice, it's probably not that bad, without strong uptime or latency guarantees.

How is this different than me just pre-importing a load of places? It's lazy, based on actual demand. Every time I update the map format and have to re-import hundreds of maps, it's slow. Ideally I'd whittle down to a much smaller set of "special maps" that I use for preventing regressions in importing code, or because somebody is actively using them that I know about.

dabreegster commented 2 years ago

How would the coordinate projection work?

@bdon pointed me towards UTM or looking into coordinate systems designed for the UK. Maybe we could convert everything into those coordinates and continue treating everything as Cartesian

dabreegster commented 2 years ago

@Robinlovelace pointed me to https://epsg.io/27700 for the UK. OK, so the coordinate problem in idea 2 is solvable! Maybe the next test needs to be how quick it is to read thousands of features from FGB, or if any kind of batching is needed

michaelkirk commented 2 years ago

Using FGB sounds interesting for sure, but I couldn't confidently say it'd work out, because I don't think they've had a ton of use - especially the rust and typescript drivers.

We're using an FGB to serve census polygons for abstreet, but I've never personally used it for mixed geometry collections — e.g. a bunch of road line strings and a bunch of polygons living in the same collection. Allegedly it works, but I can't vouch for it.

Another thing to consider is data access patterns. With FGB you currently only have a seek-forward stream, so you couldn't revisit past entities without buffering them into memory or streaming through the entire thing again. Would that work for your use case?

I think the cost of traversing the index to get your bbox would likely be acceptable because, realistically, the processing to transform the map data into whatever you need for the LTN app will likely dwarf it.

Which brings me to revisit your option 4 (something server side). I'm not really sure what happens between "get the map data" and "use the app": is there substantial server side processing that you could do that would make the time-to-start-using-the-app happen faster for users, or is this really only about getting the client the right slice of the data?

dabreegster commented 2 years ago

With FGB you currently only have a seek-forward stream, so you couldn't revisit past entities without buffering them into memory or streaming through the entire thing again

That should be fine... the features will be fed into the FGB writer in an arbitrary order, but I'm assuming part of the writing + index creation arranges things somehow. So that when we do select_bbox and then iterate through the results, we only ever seek through once. We might get features in some arbitrary order (like road line-strings and building polygons interspersed even), but that doesn't matter.
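
A sketch of why interleaved ordering is harmless: the consumer can sort features into buckets in one forward pass and never needs to seek back. `Feature` here is a stand-in for whatever a select_bbox call would yield, not the real FGB reader types, and `Collected` and `collect_once` are made-up names.

```rust
/// Hypothetical stand-in for features returned by a bbox query, in arbitrary order.
enum Feature {
    Road { id: u64 },
    Building { id: u64 },
    Area { id: u64 },
}

/// Features grouped by kind, ready for the rest of the import.
#[derive(Default)]
struct Collected {
    roads: Vec<u64>,
    buildings: Vec<u64>,
    areas: Vec<u64>,
}

/// Consume the stream exactly once, bucketing features as they arrive.
fn collect_once(stream: impl Iterator<Item = Feature>) -> Collected {
    let mut out = Collected::default();
    for feature in stream {
        match feature {
            Feature::Road { id } => out.roads.push(id),
            Feature::Building { id } => out.buildings.push(id),
            Feature::Area { id } => out.areas.push(id),
        }
    }
    out
}
```

The cost is that everything inside the bbox ends up buffered in memory, which is the same situation as holding a RawMap today.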

is there substantial server side processing that you could do that would make the time-to-start-using-the-app happen faster for users

I just did a few quick tests in release mode on native. nl/groningen/huge takes 31s to go from RawMap to Map (which is the step that would happen after reading everything from the FGB). Of that, matching buildings to the nearest road is the slowest step, at 18s -- and that's with full parallelization, which will be unavailable on web until we figure out web workers. Preparing the pathfinding CHs is the next slow bit at about 3s, and we could actually skip that for importing to the LTN tool, because it doesn't use the CHs anyway. Another smaller area is gb/leeds/north -- 15s for RawMap to Map, again with building matching at 8s being the bottleneck.

So you raise a great point -- as it works now, shoving the RawMap into a FGB and making the client do the final conversion would still be expensive. Matching buildings to the nearest road is only needed for one mode in the LTN tool (planning a route), and that UI is a little awkward right now anyway -- you have to click a building as a trip endpoint, which isn't intuitive or easy at low zooms. We could instead just snap a waypoint to the nearest position on each side of a road and get basically the same effect (or even better UX), much more cheaply. I'd have to think through how to structure this "hack" -- we'd basically be making a building's sidewalk_pos and driveway_geom optional and dealing with that consequence everywhere else in the codebase.
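
For a sense of how cheap the snapping alternative is, here is a brute-force sketch that snaps a clicked point to the nearest position on any road center-line. It assumes roads are plain polylines in projected meters and ignores the "each side of the road" detail; `Pt`, `closest_on_segment`, and `snap_to_roads` are illustrative names, not existing code.

```rust
/// A point in projected coordinates (meters).
type Pt = (f64, f64);

/// Closest point to `p` on the segment from `a` to `b`.
fn closest_on_segment(p: Pt, a: Pt, b: Pt) -> Pt {
    let (dx, dy) = (b.0 - a.0, b.1 - a.1);
    let len2 = dx * dx + dy * dy;
    if len2 == 0.0 {
        // Degenerate segment: both endpoints coincide.
        return a;
    }
    // Fraction along the segment of the perpendicular projection, clamped to the ends.
    let t = (((p.0 - a.0) * dx + (p.1 - a.1) * dy) / len2).clamp(0.0, 1.0);
    (a.0 + t * dx, a.1 + t * dy)
}

/// Snap `p` to the nearest position on any road. Returns the road index and snapped point.
fn snap_to_roads(p: Pt, roads: &[Vec<Pt>]) -> Option<(usize, Pt)> {
    let mut best: Option<(usize, Pt, f64)> = None;
    for (idx, road) in roads.iter().enumerate() {
        for seg in road.windows(2) {
            let q = closest_on_segment(p, seg[0], seg[1]);
            let d2 = (q.0 - p.0).powi(2) + (q.1 - p.1).powi(2);
            if best.map_or(true, |(_, _, best_d2)| d2 < best_d2) {
                best = Some((idx, q, d2));
            }
        }
    }
    best.map(|(idx, q, _)| (idx, q))
}
```

This runs once per click rather than once per building at import time, so even the brute-force loop is likely fine; a spatial index would be the obvious next step if it is not.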

To note, I don't want to try to put the final Map file into an FGB:

1) the file size would be much greater -- we balloon roads into individual lanes, each with their own geometry
2) Map uses sequential IDs for everything, so we can't just clip an area and punch holes in the ID space
3) the pathfinding contraction hierarchy can't be subsetted, we'd have to just rebuild

So I think RawMap as an FGB is still a viable option, but it would take some work, and the performance for clients is still a bit unknown -- but the FGB reading part should not be too hard to test. The bigger issue is still how to make one gigantic RawMap on my laptop (or a big VM somewhere, but that would make my development workflow a headache). Maybe all the internal importing steps can be parallelized enough and the initial OSM extraction fits in memory. But if not, idea 2 kind of breaks down anyway. Maybe I'll just give that a shot first then and see.

dabreegster commented 2 years ago

Haha, importing a 21GB uncompressed england-latest.osm.xml fails outright because an earlier refactor to be web-friendly reads the entire file into memory first, so memory allocation of 22439708241 bytes failed. Would also need to first fix that!
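
The general shape of the fix is to stream the input instead of reading it all up front. A minimal sketch of that idea, using naive line matching rather than a real XML parser purely to show that memory use stays flat; `count_ways` is a hypothetical function and not how the importer actually parses OSM:

```rust
use std::io::{BufRead, BufReader, Read};

/// Count opening `<way` elements by streaming the input line by line.
/// Only one line is held in memory at a time, regardless of input size.
fn count_ways<R: Read>(input: R) -> std::io::Result<usize> {
    let reader = BufReader::new(input);
    let mut ways = 0;
    for line in reader.lines() {
        // Naive match on the start of the line; a real importer would use a streaming XML parser.
        if line?.trim_start().starts_with("<way") {
            ways += 1;
        }
    }
    Ok(ways)
}
```

On native, the same function can be fed a `std::fs::File`. On web there is no filesystem to stream from, which is presumably why the earlier refactor took the whole input as bytes.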

Robinlovelace commented 2 years ago

Sounds like a mega import!

BudgieInWA commented 2 years ago

We could maybe do something funny ... load a different WASM binary, do the import ...

We would want to load the JS/WASM dependency in the window and call it directly. Append a script tag to the dom or use dynamic import or something.

This is a lucrative goal, because this is the common JS usecase for osm2streets I think.

<script ... src="unpkg.com/osm2streets"></script>
<script ...>
const streets = await window.osm2streets.fetchArea("Fremantle"); // uses Nominatim like overpass-turbo {{geocodeArea:...}}
...
</script>

BudgieInWA commented 2 years ago

Tiling

Chop up the UK into fixed square tiles of some size, and just import all of them as regular maps. We could also consider gluing together adjacent tiles...

This is a very lucrative goal I think, because once the tiling is incorporated, you can work at whatever scale you need to, if you spend the time computing tiles.

When a single short way is modified in JOSM one or two smallish tiles could be recalculated most of the time.
We can hook up raster/vector tiles for backward compat integration into slippy maps everywhere!

Calculate tiles with enough buffer around the edges that the influence of relevant osm geometry is considered.

dabreegster commented 2 years ago

We would want to load the JS/WASM dependency in the window and call it directly. Append a script tag to the dom or use dynamic import or something.

:O I didn't realize browsers could do that, but it makes perfect sense! Then there's no hassle to "pass back" the data from this computation.

OK all of the ideas are promising. In the very short term, I might go for the 1st, since it has the easiest implementation and would be more widely useful. Then I might try a bit of each of the other ideas and flesh out more tradeoffs

kchomacau commented 2 years ago

Haha, importing a 21GB uncompressed england-latest.osm.xml fails outright because an earlier refactor to be web-friendly reads the entire file into memory first, so memory allocation of 22439708241 bytes failed. Would also need to first fix that!

When I import montlake.bin with cargo run --bin game --release, I also encountered memory allocation of 7017276123103129189 bytes failed (6EB...)

dabreegster commented 2 years ago

When I import montlake.bin with cargo run --bin game --release, I also encountered memory allocation of 7017276123103129189 bytes failed

(That's unrelated to this issue.) 62d289b0a92146e6fb50840843ac3db5c46bd943 last night changed the binary format, so if you did git pull, you also need to do cargo run --release --bin updater -- download. See https://a-b-street.github.io/docs//tech/dev/index.html#downloading-more-cities. I'm guessing the raw_map file became out of date

dabreegster commented 2 years ago

https://github.com/a-b-street/abstreet/tree/wasm_importer is in progress for "Idea 1: make the Overpass import work on web". There are some weird JS tricks in here (like we can't dynamically import from wasm-bindgen, and we can't even use inline_js as a workaround), but so far, no blockers. We can just define a new JS function that does the dynamic import, and later call it. Lots of refactoring needed to make convert_osm work without a filesystem, though

dabreegster commented 1 year ago

Reviving this for AI:UK -- I need to be able to spin up the LTN tool anywhere in the UK relatively quickly for live demo purposes.

I've been trying imports of massive transport authority regions as one experiment. Importing huge areas itself is slow for some reasons I'm working on, but could be done beforehand. Problem is, loading them, rendering road labels, everything in the LTN app also gets slower. So... not looking like a good option.

To avoid hitting the network, we could use an england-wide cached pbf and (now) use osmium to clip. I could pre-download all geofabrik slices in UK to speed things up. But TODO, also try running a local Overpass server with just the UK as input.

Another approach I'm considering is to import a bunch of smaller places. I've tried this before as tiles, but the resulting boundaries are too arbitrarily weird. Maybe all of the smaller boundaries from ATIP would be a good candidate.

Robinlovelace commented 1 year ago

Maybe all of the smaller boundaries from ATIP would be a good candidate.

I think local authority district boundaries, which include boroughs of London, would be good for this. Good luck with this mission!

dabreegster commented 1 year ago

Run pueue parallel 1 first (or some other low limit), due to osmium.

for x in ../abstreet-to-atip/aiuk_boundaries/*; do
  NAME=$(basename -s .geojson "$x")
  pueue add --escape ./target/release/cli one-step-import --geojson-path "$x" --map-name "$NAME" --use-geofabrik --use-osmium --skip-ch
done

dabreegster commented 1 year ago

(There are somehow newlines in the filenames; I think basename is weird)

dabreegster commented 1 year ago

About 380 files, 12GB output, each map loads fast enough. This'll work fine for AIUK. I'll also prepare a clickable geojson map to avoid asking people if they know the LAD of their area!

Robinlovelace commented 1 year ago

Let me know if you need more compute power at some point!

a-b-street / abstreet

Import all of the UK #1009