a-b-street / abstreet

Transportation planning and traffic simulation software for creating cities friendlier to walking, biking, and public transit
https://a-b-street.github.io/docs/
Apache License 2.0

Support O(100) cities #326

Open dabreegster opened 4 years ago

dabreegster commented 4 years ago

We've been adding a few other cities slowly and with considerable effort. What would it take to maintain a few hundred?

Downloading optional content

Storing the maps

Map config

Maintaining the maps

michaelkirk commented 4 years ago

Do the released versions include the bundled maps? Or is it expected that the user/application will download maps separately?

Just wondering if we need to leave archived versions of the maps around for old app versions.

dabreegster commented 4 years ago

Do the released versions include the bundled maps?

Yes, but only for the few "curated" maps, aka Seattle and maybe one or two more. To make the initial install experience even quicker, it could be worth removing a few maps from that too.

Just wondering if we need to leave archived versions of the maps around for old app versions.

I'm hesitant to keep more than a few old versions around, just for storage/price reasons

matkoniecz commented 4 years ago

What would it take to maintain a few hundred?

Also, many cities would require new features to work as expected: trams in Kraków, aerialways used as major public transport in some places, movable bridges in others, congestion pricing in London, and so on.

I would expect that adding 100 cities would require adding, say, 25 major features, each taking as much work as the tram support triggered by the Kraków map. The alternative would be to avoid cities with special features, or to have them in a half-broken state.

Map config

Is it planned to add just city centers, or multiple regions like Seattle? In either case, some tool to easily create boundaries would be nice (dragging nodes over a map, displayed on a website).

Traffic model

Every single new map will also reveal blatant problems in the traffic data model. (Currently any map will do this, but even if it gets improved to work decently on some maps, any new map will still reveal a hilarious mismatch with reality.)

dabreegster commented 4 years ago

I'll mention that I'm not planning to prioritize this work anytime soon; I just wanted to write down some of the ideas.

Alternative would be to avoid cities with special features or have them in a half-broken state.

In the short term, the goal is just to get them started in a partly broken state. Ideally more people would become interested in the project and help implement the new features.

Is it planned to add just city centers or multiple regions as Seattle?

Just city centers to start, or maybe the entire region, as defined by the bbike.org extract. But I'd like all cities to be split into multiple regions like Seattle, and I think that's best done by somebody familiar with the place. geojson.io lets you draw multiple polygons and already has a full world map, so I think it'll suffice for now. There is "internal dev tools > edit a polygon" in the game, but it requires starting with the larger region, and is more useful for fine-tuning boundaries after they're initially drawn.

Every single new map will also reveal blatant problems in the traffic data model.

In the proletariat robot travel demand, you mean? Definitely. This is an opportunity to find lots of problems with it quickly, which will hopefully shape its development better.

natrius commented 3 years ago

What about using Nextcloud instead of Dropbox for the files? There are hosted versions around, both free and paid. You can look at the options here: https://nextcloud.com/providers/

matkoniecz commented 3 years ago

There are free hosted versions around

"2GB of free storage" is not too useful, in general free file hosting is useful only when you are extremely unwilling to pay or want to text files or something similarly lightweight

natrius commented 3 years ago

Dropbox is 2GB when free, 2,000GB when paying 10 euros. With https://cloud.tab.digital it's 8GB for free and 128GB for 5 euros per month. But that was not the point; there are multiple providers, and it is possible to choose. I was just suggesting it because of the "Dropbox daemon crashes constantly when uploading lots of new files" problem. It may be worth at least trying it.

Maybe Seafile would be better as well, since just file sync is needed and nothing else from the feature set of Nextcloud. Here is a provider: https://luckycloud.de/en/preise-cloud-speicher-und-funktionen

I'm suggesting options, and it seems to me you're nitpicking by comparing a paid Dropbox product with one specific free product from a Nextcloud provider?

EDIT: Just to clarify to @dabreegster: cloud.tab is using Nextcloud. Nextcloud is a service you can host on your own server. If you need an account for a short test, I'm willing to create one on my server, but I guess a free one on cloud.tab is less hassle :D Seafile is a service of its own, also possible to host on your own server, or you can use a provider doing it for you, like luckycloud. :) Thanks for your answer.

dabreegster commented 3 years ago

Thanks for the pointer to nextcloud, cloudtab, seafile, etc! I'll take a closer look when I start working on this. Price isn't a strong factor, as long as it's reasonable. Biggest priorities are to sync files easily and be able to construct URLs without having to keep an extra mapping. The project uses Dropbox now simply because I already had an account for other reasons. :P

dabreegster commented 3 years ago

Starting to play around with the process for generating loads of maps. I want to try out https://taskfile.dev and some others for job management, but at the moment, GNU parallel is working fine to import lots of files 4 at a time, with separate log files: for x in ~/bbike_extracts/*; do echo "./import.sh --oneshot=$x --skip_ch > $(basename $x).log 2>&1"; done | parallel --bar -j4
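
Spelled out as a script, the same approach looks roughly like this. It's a sketch based on the one-liner above: the ~/bbike_extracts directory, import.sh, and its --oneshot/--skip_ch flags are taken from that command, and the job count of 4 is arbitrary.

```bash
#!/usr/bin/env bash
# Import every bbike extract, 4 at a time, with a separate log per extract.
set -euo pipefail

for x in ~/bbike_extracts/*; do
  # Each echoed line is one shell command for GNU parallel to execute.
  echo "./import.sh --oneshot=$x --skip_ch > $(basename "$x").log 2>&1"
done | parallel --bar -j4
```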

dabreegster commented 3 years ago

Some of this work is coming together (mainly motivated by having more maps for OSM Connect 2020), but I still don't know how to organize things. Thinking through this again...

What does the end state look like?

So where do files need to wind up?

Storage size concerns:

Maintaining all of the maps:

michaelkirk commented 3 years ago

Storage size concerns:

A note - if you wanted just a bit more wiggle room, the map files tend to compress to about 1/3 their original size.

-rw-rw-r--  1 mkirk  staff    43M Sep 21 14:21 ballard.bin
-rw-r--r--  1 mkirk  staff    16M Oct 27 16:05 ballard.bin.gz
-rw-rw-r--  1 mkirk  staff    23M Sep 21 14:30 downtown.bin
-rw-r--r--  1 mkirk  staff   8.0M Oct 27 16:05 downtown.bin.gz
-rw-r--r--  1 mkirk  staff   247M Sep 21 14:26 huge_seattle.bin
-rw-r--r--  1 mkirk  staff    90M Oct 27 16:05 huge_seattle.bin.gz
-rw-rw-r--  1 mkirk  staff    20M Sep 21 14:26 lakeslice.bin
-rw-r--r--  1 mkirk  staff   7.1M Oct 27 16:05 lakeslice.bin.gz
-rw-r--r--  1 mkirk  staff    57M Sep 22 14:56 los_angeles_midwest.bin
-rw-r--r--  1 mkirk  staff    21M Oct 27 16:05 los_angeles_midwest.bin.gz
-rw-rw-r--  1 mkirk  staff   3.5M Sep 21 14:27 montlake.bin
-rw-r--r--  1 mkirk  staff   1.2M Oct 27 16:05 montlake.bin.gz
-rw-rw-r--  1 mkirk  staff    53M Sep 21 14:37 south_seattle.bin
-rw-r--r--  1 mkirk  staff    19M Oct 27 16:05 south_seattle.bin.gz
-rw-rw-r--  1 mkirk  staff   9.5M Sep 21 14:27 udistrict.bin
-rw-r--r--  1 mkirk  staff   3.3M Oct 27 16:05 udistrict.bin.gz
-rw-rw-r--  1 mkirk  staff    47M Sep 21 14:28 west_seattle.bin
-rw-r--r--  1 mkirk  staff    17M Oct 27 16:05 west_seattle.bin.gz
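
For anyone reproducing that comparison, a minimal sketch, run from whatever directory holds the .bin maps (gzip -k needs GNU gzip 1.6+ to keep the originals):

```bash
# Compress each map next to the original, then compare sizes.
for f in *.bin; do
  gzip -kf -9 "$f"           # writes $f.gz, keeps $f
done
ls -lh *.bin *.bin.gz         # per-file before/after
du -ch *.bin | tail -n1       # total uncompressed
du -ch *.bin.gz | tail -n1    # total compressed, roughly 1/3 the size
```
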
dabreegster commented 3 years ago

A note - if you wanted just a bit more wiggle room, the map files tend to compress to about 1/3 their original size.

Good point! The files are currently stored compressed in Dropbox, and the updater manages the transformation. But the S3 files for web aren't stored compressed. I will try some experiments to do on-the-fly decompression when the web client loads a file.
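
One low-effort way to get that effect is to store the gzipped bytes and set the Content-Encoding header on S3, so browsers and most HTTP clients decompress transparently on download. A sketch follows; the bucket name matches the demo URL later in this thread, but the key and local path are placeholders, and this isn't necessarily how the project ended up wiring it:

```bash
# Upload a pre-gzipped map so HTTP clients decompress it on the fly.
# Bucket, key, and local path are illustrative placeholders.
gzip -k -9 data/system/maps/montlake.bin
aws s3 cp data/system/maps/montlake.bin.gz \
  s3://abstreet/dev/data/system/maps/montlake.bin \
  --content-encoding gzip \
  --content-type application/octet-stream
```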

dabreegster commented 3 years ago

http://abstreet.s3-website.us-east-2.amazonaws.com/osm_demo/ is live with 123 maps. Gzipping on S3 is a huge help; 3GB instead of 8. A few simple next steps:

Then that paves the way for the native version to allow downloading extra cities.

dabreegster commented 3 years ago

I think it's time to revisit directory structure.

1) A flat list of data/system/maps fails as soon as two regions have "downtown" or something like that
2) I'm not sure how much hierarchy the UI or filesystem should expose for sorting cities -- na/usa/wa/seattle/downtown? europe/uk/leeds/center? It'll also be a question of extracting this hierarchy from OSM or somewhere else, although maybe there can be a little bit of manual mapping that happens.
3) The updater tool has a vaguely structured mapping from files to city, to figure out where optional data belongs. Similar to the data/input/$city structure, I think it may be time to revisit the concept of data/system/$city for organizing maps, scenarios, prebaked results, etc.
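
For a concrete picture of option 2 combined with the data/system/$city idea in option 3, one possible layout, using regions mentioned in this thread (folder names are illustrative only):

```
data/system/
  us/
    seattle/
      maps/downtown.bin
      scenarios/
      prebaked_results/
  gb/
    leeds/
      maps/center.bin
```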

dabreegster commented 3 years ago

Wound up surging to >30 cities as part of the actdev work. Feels like it's time to add another layer of namespacing to map names -- two-letter country codes. Here's a quick list of stuff I need to account for...

dabreegster commented 3 years ago

Alright, the grand renaming is done. I'll make the city picker nicer tomorrowish.

dabreegster commented 3 years ago

Some recent hardware failures have spurred me to think about moving the map importing process into the cloud again. data/regen.sh on my now dead machine took at least an hour, but the process could at least be parallelized by city. What would the development workflow look like?

1) Locally, work on new changes to the map importer. Run the importer and test locally, as usual.
2) When it's time to regenerate the world, package up the importer binary with the local changes somehow -- maybe in a git branch, or a temporary docker image that directly copies in the local Linux binary.
3) Tell some cloud service to go run one job per configured city.
4) The input for that per-city job is at least the clipped .osm file, so probably it needs to run the updater first for just the city it's working on.
5) The output should go in S3, in some temporary named version that can later be renamed to the dev version.
6) That cloud service has some kind of web or CLI UI to track job progress and view STDOUT/STDERR for jobs.
7) After the jobs are all done, the developer runs another script to pull down only some of the changed files -- Seattle and the few other cities that have screenshot testing or prebaked results. Manually run those other tools, producing a bit more data.
8) Once everything's confirmed, need to merge all of the S3 directories into one nice dev version. Also need to produce a merged data/MANIFEST.txt file and commit it somehow.
9) Push the git commit. Done!
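
For steps 2 and 3, a rough sketch of what the Docker + AWS Batch wiring could look like. Every name here (the image tag, ECR repo, job queue, job definition, and the importer's command-line flags) is a placeholder for illustration, not actual project configuration:

```bash
# Step 2: bake the locally built Linux importer binary into a throwaway image.
REPO=123456789012.dkr.ecr.us-east-2.amazonaws.com/abstreet-importer  # placeholder
docker build -t "$REPO:dev" .
docker push "$REPO:dev"

# Step 3: submit one AWS Batch job per configured city.
for city in us/seattle gb/leeds gb/london; do
  aws batch submit-job \
    --job-name "import-${city//\//-}" \
    --job-queue abstreet-import-queue \
    --job-definition abstreet-importer \
    --container-overrides "{\"command\": [\"./importer\", \"--city=$city\"]}"
done
```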

As a sort of interim solution for step 8, I can download all the changed files and produce the manifest locally; my downstream is fast, but upstream is still bad. A better solution long-term is probably to split the manifest file into per-city and have a better abstraction in the code for reading/merging all of them.

From a cursory glance, AWS Batch looks like a reasonable fit for the cloud service, since it can run Docker images, does some output redirection and logging by default, has a web UI, and at least has configuration for balancing speed/cost.

dabreegster commented 2 years ago

Most recent mass reimport was painful operationally. I have a small improvement I want to try:

1) Use the existing per-city jobs in regenerate_everything, but physically run a separate process per city
2) Use https://github.com/Nukesor/pueue to run cities in parallel and get nice split logs and overall tracking. In that way, it dodges some of the issues of #262
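
A minimal sketch of that setup, assuming a hypothetical per-city entry point (the ./importer command and city names are placeholders; parallel, add, status, and log are standard pueue subcommands):

```bash
# Queue one import process per city and let pueue schedule them.
pueue parallel 4                           # at most 4 cities importing at once
for city in us/seattle gb/leeds gb/london; do
  pueue add -- ./importer --regen "$city"  # placeholder per-city command
done
pueue status                               # overall tracking by status
pueue log 0                                # per-task (per-city) log output
```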

dabreegster commented 2 years ago

pueue works well enough, but I was hoping for a nicer summary counting jobs by status. Either way, a few minutes of work means I can now melt my laptop super fast: in 8 minutes of parallelization, I can fully regenerate 70 of 88 cities. A considerable win for my workflow. There are also per-city logs, and one city failing won't break everything else.

dabreegster commented 2 years ago

Now there are two long tails!

[Screenshot from 2022-02-10 16-13-34]

Parallelizing Seattle is hard because of weird dependencies between huge_seattle and the rest, and because that map has to be kept in memory. But as a possible next step, London could be parallelized by each borough map.