Performance idea: squeezing down file size

dabreegster commented 3 years ago

For the bike network tool, ultimately I'd like to work on and present some vision at a large scale -- like all of North or South Seattle (50-60MB right now, or 20 gzipped) or the whole area (270MB, or 100 compressed). The Map files are gigantic, but I have some ideas for paring this down and creating something smaller, containing less detail than what's needed for traffic simulation...

dabreegster commented 3 years ago

Current stats on huge_seattle:

[2021-09-02T01:41:13Z INFO  map_model::map] Total map size: 280,097,936 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 1 pathfinder: 111,774,757 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 217,197 buildings: 62,902,080 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 36,170 intersections: 48,813,308 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 50,299 roads: 28,934,342 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 245,033 lanes: 19,128,378 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 3,095 parking lots: 2,627,031 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 3,946 areas: 1,852,995 bytes
[2021-09-02T01:41:13Z INFO  map_model::map] - 436 zones: 17,520 bytes

and north_seattle:

[2021-09-02T01:42:00Z INFO  map_model::map] Total map size: 54,725,302 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 1 pathfinder: 20,412,835 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 52,069 buildings: 15,263,206 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 6,457 intersections: 8,918,260 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 9,047 roads: 5,251,766 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 45,872 lanes: 3,482,380 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 603 parking lots: 545,807 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 436 areas: 204,548 bytes
[2021-09-02T01:42:00Z INFO  map_model::map] - 68 zones: 2,900 bytes

Some fruit practically scrapes the ground. The contraction hierarchies are huge, but they're also probably not needed at all for this tool. We're not calculating lots of routes all the time. In fact, in the routing tool, I want to add a bunch of customization parameters for avoiding hills and busy roads -- and those're incompatible with CHs anyway, due to how slow rebuilding them would be. And large CHs slow down map edits. Dijkstra will serve us just fine. (We may want the CHs back for the travel demand model + mode shift analysis, but now sure how that'll work yet, and I don't think it's a main part of this tool.)

dabreegster commented 3 years ago

I spoke quickly. It'll take more work to squish down pathfinding. Just switching to CreateEngine::Dijkstra drops from 111MB to 44MB, which is substantial, but the result is still large. We don't need 6 graphs for the bike tool -- we just need the bike one.

Making edits will takes 10.5s in huge_seattle; even without updating the CH node ordering, calculating the graphs and edge weights is quite slow in such a huge map. https://github.com/a-b-street/abstreet/blob/1c756befa85be87991526851f7012638b87ea964/map_model/src/pathfind/vehicles.rs#L184 looks simple, but death by a thousand paper-clips is catching up... overall this method fetches and re-fetches lanes, intersections, turns, etc many times and winds up being nontrivial.

dabreegster commented 3 years ago

Better... with just the bike Dijkstra graph, huge_seattle is down from 270MB originally to 170. The pathfinder is now only about 5MB, not 111MB. make_input_graph could still use performance tuning -- that would help editing in the main game too, not just this tool.

dabreegster commented 3 years ago

With just the bike graph, recalculating pathfinding down from 10s to 3ish. But that's still absurd!

easbar commented 3 years ago

overall this method fetches and re-fetches lanes, intersections, turns, etc many times and winds up being nontrivial.

Not sure if you already decided against using CHs here, but maybe here it would help to be able to just update the weights of an input_graph instead of 'building' the entire graph again. Not sure though, because this would just mean we replace a list of (from, to, weight)s with a list of just weights.

dabreegster commented 3 years ago

Not sure if you already decided against using CHs here, but maybe here it would help to be able to just update the weights of an input_graph instead of 'building' the entire graph again

Luckily, #748 pretty much resolved this problem -- there was lots of redundant work happening when calculating the vehicle input graphs. Now this process is fast enough. I'm not sure updating the graph vs building from scratch would make much of a difference; the expensive thing is calculating each edge's weight.

Also, this issue is about a newer side-project to create a bike network planning tool (previewable at abstreet.s3-website.us-east-2.amazonaws.com/0.2.58/abstreet.html?--ungap&system/us/seattle/maps/udistrict.bin). There's no traffic simulation, so the usage patterns for pathfinding are very different. I'm still exploring the idea, but possibly the user will have a few sliders where they can tune parameters that make bike routes avoid hills or avoid stressful roads. Only one or two routes would be calculated using these new weights. So probably, just Dijkstra's makes much more sense here. But that's not known for sure -- I might discretize the routing preferences into just a few options, after doing some of the parameter tuning myself and finding values that produce subjectively good routes in some example areas I know well.

dabreegster commented 3 years ago

Pathfinding, including when editing the map, is now quite fast, even on the huge map. (One of the last perf bugs was actually unrelated to pathfinding, 9803558885d86984d7d06eb544e8334b71494bba).

Back to the original goal -- how much can we squeeze down huge_seattle? Some measurements running natively (it'll be slower on the web).

As a baseline, 256MB filesize and about 9.5s on my fast machine to initially load.

Buildings

After stripping out some of the CHs that aren't used for the bike tool and removing all buildings, 100MB and only 4.6s! Only 34MB when gzipped! There really are lots of buildings in the huge map. What do we really need them for in the bike net tool? Display (to give context to the map), routing (right now start and end points are buildings, but we could maybe relax that to just snap to intersections), and mode shift (the input path requests are in terms of buildings, which then map over to road positions). One idea I've been toying with is to "summarize" buildings in one block. Originally we have Screenshot from 2021-10-07 09-35-32 But we could "compress" this to one building and just record the total number of commercial and residential units here. This would be particularly relevant for super-dense places like Sao Paulo: Routing might be odd with one mega-building per block, because the building will snap to one of the 4 surrounding roads kind of by chance, and this'll force all the routes to look odd. So maybe routing by intersection endpoints would need to happen anyway.

Not sure this is doable in the short-term, but it's an idea to hold onto.

Lanes

Of the 4.5s to load the map in the previous section, 1.5s is building the quadtree! The number of lanes dominates easily -- about 5x the number of roads. We reorganized lanes to exist and be identified just as children of roads kind of recently, and it's much nicer code. But that change hasn't been done in the DrawMap layer yet -- we have separate DrawRoad and DrawLane objects. It would make way more sense to nest lanes under roads here too. When the quadtree sees a road in view, we can lazily expand to the few lanes it contains.

A quick test of what omitting the lanes from the quadtree would be like... yeah, 3.1s! The slow steps are then just deserializing the 100MB file and building the massive GeomBatch of unzoomed roads and intersections.

This is totally worth doing in the short term.

dabreegster commented 3 years ago

Turns

Let's get even crazier. Of the 100MB from the last section, intersections take half that space. Turns are nested underneath intersections now. Do we really need to save them? We recalculate them when we edit roads anyway, so... what if we didn't stash them in the file and just lazily calculated them?

60MB file, 2.5s to load. 20MB gzipped. Wow!

Optimizing for initial loading experience is important because of first impressions, but of course that has to be balanced with actually using the tool. How quickly would we need to fill out turns, and would it all happen at once (thus incurring a slow step shortly after initial load)? Really need to just try it out to know for sure, but I think the first time the user tries to route or quick-sketch, we'll need everything.

Lanes

There's similar redundancy with roads and lanes. We have the road center-line; we could lazily fill in lane center-lines from that, instead of serializing.

a-b-street / abstreet