#154 introduced the "proletariat robot" method for generating input to the traffic simulation. A more general approach covers activities beyond going home and to work. Yesterday at the Amazon Hack for Good event, a small team took a first stab at this. This issue organizes the remaining work. The code lives here.

[ ] Automate the downloading and processing of US census data from https://www.census.gov/geographies/mapping-files/time-series/geo/kml-cartographic-boundary-files.html and http://census.ire.org/data/bulkdata.html
[ ] When US census data isn't available, guess number of residents using the OSM-tags-and-building-area approach from the P.R. model
[ ] Research other standard census sources outside of the US
[ ] Find and adapt a set of person "profiles" like college student, primary school student, worker, caretaker, etc, along with their typical schedule of activities
[ ] Continue improving the mapping between OSM amenities and activities
[ ] Figure out how to compare the results from our approach with something well-tuned like the Soundcast model. Since the number of people living in each building should roughly match up because we're using similar land use / census data, maybe quantify how "different" two generated people are in their schedule.
[ ] Make a UI to tune parameters easily. Players would choose "auto-generated from census" from the traffic dropdown, get some reasonable default, but then have an edit button where they can understand how the model works, and control different parts of it interactively.
[ ] Add a step to the pipeline to generate more off-map trips, particularly between two ends of a highway passing through the map.
[ ] Write up the approach clearly, then communicate it to other groups building activity models, and get feedback on our approach.
[ ] Clean up the prolet robot generator in favor of the generalization

The prolet robot model idea is described here. The generalized pipeline works like this:

1) For a map of lower Manhattan, find census data that says how many people live in each census tract
2) Find all residential buildings inside each tract. Generate individual people, randomly assigning homes and demographics based on the aggregate stats
3) Classify each person as a student, worker, caretaker, etc
4) Randomly generate a schedule for each profile. (Wake up at 7, go get breakfast for 30 mins, go to school for 5 hours, …)
5) Pick specific universities / cafes / etc for each activity

Which cafe will somebody go to?
Will they drive, bike, walk, or take public transit?
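To make the schedule-generation step of the pipeline more concrete, here is a minimal sketch. The profile names, activities, time windows and durations in `PROFILES` are made-up assumptions for illustration only; they do not come from any census source or from the existing code.

```python
import random

# Hypothetical profile templates:
# (activity, earliest start hour, latest start hour, duration in hours).
# All numbers are illustrative assumptions.
PROFILES = {
    "student": [("breakfast", 7.0, 8.0, 0.5), ("school", 8.5, 9.5, 5.0)],
    "worker": [("breakfast", 6.5, 8.0, 0.5), ("work", 8.0, 10.0, 8.0)],
}


def generate_schedule(profile, rng):
    """Return a list of (activity, start_hour, end_hour) that never overlap."""
    schedule = []
    clock = 0.0  # end time of the previous activity
    for activity, earliest, latest, duration in PROFILES[profile]:
        # Pick a random start in the window, but never before the previous activity ends.
        start = max(clock, rng.uniform(earliest, latest))
        schedule.append((activity, start, start + duration))
        clock = start + duration
    return schedule


rng = random.Random(42)
for activity, start, end in generate_schedule("student", rng):
    print(f"{activity}: {start:.2f} -> {end:.2f}")
```

Picking a specific cafe or school for each activity, and a mode for each trip, would be a separate step applied to the output of `generate_schedule`.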

Foreword: apologies for the extensive comment. I've had a cursory look through some of the other issues; if some parts of this would be better copy-pasted over to another one, let me know.

This is a GitHub Gist of the Jupyter Notebook for the process that's explained below.

This issue seems a little inactive, so I assume other ones have taken priority at the moment. Nevertheless, @dabreegster and I spoke on a Reddit thread a while ago about generating sensible destinations efficiently, input models and the like, and he suggested I drop in with what I have.

I've been working on an epidemiological model which integrates mobility patterns. In order to do so I need to generate an input model, and there seems to be a fair amount of overlap with trying to design input maps (and activity patterns) for AB-Street.

A lot of it shouldn't be hard to recreate directly in Rust/language of choice, but as usual the Python ecosystem makes some things really easy. I don't know how much (if any) of it would be useful but there are some basic steps that might be an idea for generating populations and activity patterns and hopefully it can inspire some discussion here.

There are four main steps:

Data Acquisition

OSM Data

This is something you guys are all too familiar with. The notebook uses Pyrosm for automated retrieval of data rather than the Overpass API, and it's quite a nice utility. I've talked about it on a different thread, but I don't know of something equally nice in Rust yet. A nice feature is that it lets a user specify a place by name in plain English, something like "New York" or "Westminster, London", rather than needing a bounding geometry (although those are supported as well). From that module, I query OSM for residential buildings, and for what I deem to be viable workplaces.
- Residential Buildings This relies on the following filter: "building": ["residential", "apartments", "flats", "house", "yes"] but I think that needs substantial tweaking. Not sure if you have a better system in place. I am considering trying to find an existing dataset (hopefully API) for building footprints as in my experience so far, there are an awful lot of missing buildings in less-populated areas. Even in an area like Greater Manchester (second-most-populous) in the UK.
- Office Buildings / Locations This is the one I'm least certain about currently, I use a mixture of pois with {"shop": True,"amenity": True}, data with {"office": True}, and buildings with {"building": ["office", "offices"]}. This returns a hodge-podge of data, but my thoughts are that basically all shops, and most amenities, are workplaces, and that should be joined with the other classifications of 'office' that exist in OSM in various forms. There's probably some overlap, and you get a mixture of datapoints and geometries back (which I deal with by just using centroids instead of geometry data at all, which might be a problem for AB street) which is unfortunate. The biggest problem though I'll get back to further down.
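As a rough illustration of what the two filters above select, here is a minimal pure-Python sketch that applies the same tag rules to a single OSM element. It does not use Pyrosm; `classify` is a hypothetical helper, and checking the workplace rules before the residential ones (so that a `building=yes` with a `shop` tag counts as a workplace) is an assumption about how to resolve the overlap, not something the notebook is known to do.

```python
# Tag values taken from the filters described above.
RESIDENTIAL_BUILDING_VALUES = {"residential", "apartments", "flats", "house", "yes"}
OFFICE_BUILDING_VALUES = {"office", "offices"}


def classify(tags):
    """Classify one OSM element given its tags (dict of str -> str).

    Returns "workplace", "residential" or None.
    """
    # Any shop, amenity or office tag counts as a workplace, whatever its value.
    if any(key in tags for key in ("shop", "amenity", "office")):
        return "workplace"
    building = tags.get("building")
    if building in OFFICE_BUILDING_VALUES:
        return "workplace"
    if building in RESIDENTIAL_BUILDING_VALUES:
        return "residential"
    return None


print(classify({"building": "house"}))
print(classify({"building": "yes", "shop": "bakery"}))
```

Because `building=yes` is a catch-all in OSM, treating it as residential will also pull in sheds, garages and untagged commercial buildings, which is one reason the filter likely needs tweaking.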
Population Data

I use WorldPop to get a rasterised (i.e. pixelated image) dataset of estimated population count in tiles of roughly 100m x 100m across the earth. This means that you don't have to faff about with census data and try to estimate population in areas. Currently I do this by downloading the file directly, but as it states in the Jupyter notebook, I believe this could be automated through their API.

This could be used in conjunction with input parameters, like unemployment rate, average age, etc. meaning the census data acquisition problem could be left to later, or even just left to the user, allowing them to tweak params and see the end results.

Cleanup

Coming back to some of the problems I mentioned, OSM data is classically muddy.

You need to make sure that all incoming datasets are in equivalent projections. Reprojecting population count rasters will mess up the total, so try to work in whatever co-ordinate system your population counts come from.
According to workplace size estimates from the ONS in the UK, I've actually severely overestimated the number of workplaces in my dataset. This needs further exploration, but if you're interested in getting realistically sized workplaces (in terms of employees) then getting the right set of places in OSM is going to prove finicky. I don't have a solution for this currently, besides just not allocating most workplaces (you could do the opposite and allocate all workplaces and end up with smaller workplaces than data suggests, but I digress).
As I mentioned, the workplaces from OpenStreetMap come in a variety of geometries, some are just tagged as points, some are tagged to appropriate buildings. If you're just trying to generate traffic OD pairs, then this might not actually be a problem. I ignore building footprints in my actual use-case so I convert all of the geometries to singular points.
Residential building size estimates are mostly useless in my experience, I completely ignore any OSM data that's provided such as number of floors etc. I try and use the classification of 'apartments' or 'flats' but mostly build building sizes off population counts at that tile. (Missing buildings will really screw this, you'll get huge skyscrapers in populous tiles with missing building data)

Household Allocation

This boils down to a relatively simple process. The implementation in the notebook is very inefficient (especially because of how it uses some of Python; it would be way faster with a different implementation, and faster still in another language). But I was still able to run it for a large portion of London with about 7.5 million people.

Find all residential buildings within a radius of the centre of each tile/pixel in the population raster. (Be careful with projections: distance doesn't work well with (lat, lon).)

for each tile with population count k:
   for _ in range(k):
      create a person
      find a residence with capacity within that radius
      if there are no residences with capacity, upgrade the residence type (e.g. from a house to a flat)
      allocate them
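The pseudocode above could be sketched in runnable form roughly as follows. This is a simplified illustration, not the notebook's implementation: it assumes coordinates are already projected to metres, and the residence types, the capacities in `CAPACITY`, and the choice to report people as unhoused once nothing can be upgraded further are all hypothetical.

```python
import math

# Hypothetical residence types, smallest first, and their capacities.
TYPES = ["house", "flats"]
CAPACITY = {"house": 4, "flats": 40}


def allocate_households(tiles, residences, radius):
    """Assign people to residences tile by tile.

    tiles: list of (x, y, population), coordinates in projected metres.
    residences: list of dicts with 'x', 'y', 'type'; each gains an 'occupants' list.
    Returns the number of people that could not be housed.
    """
    unhoused = 0
    next_id = 0
    for r in residences:
        r.setdefault("occupants", [])
    for tx, ty, population in tiles:
        nearby = [r for r in residences
                  if math.hypot(r["x"] - tx, r["y"] - ty) <= radius]
        for _ in range(population):
            person = next_id
            next_id += 1
            # First residence in range that still has room.
            home = next((r for r in nearby
                         if len(r["occupants"]) < CAPACITY[r["type"]]), None)
            if home is None:
                # Everything nearby is full: upgrade the first residence
                # that is not already the largest type.
                home = next((r for r in nearby if r["type"] != TYPES[-1]), None)
                if home is None:
                    unhoused += 1
                    continue
                home["type"] = TYPES[TYPES.index(home["type"]) + 1]
            home["occupants"].append(person)
    return unhoused


homes = [{"x": 10, "y": 0, "type": "house"}]
print(allocate_households([(0, 0, 6)], homes, radius=50), homes[0]["type"])
```

The linear scan over all residences per tile is the slow part; a spatial index or a grid lookup would be the obvious replacement for large areas.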

Destination Allocation (Currently only Workplaces)

This is a trimmed version of my current allocation process. The full version uses a Public Transit graph during selection, so that commuting distances take the graph into consideration instead of just being picked from a spatial radius; that part has been taken out here. Unlike the Household Allocation process, this is very efficient, running in seconds in most cases, although changing the bucket size can improve accuracy at a cost in speed. The result is just Origin-Destination pairs for people. It doesn't save the routing methods, so it seems like a good fit for AB-Street, where you can then utilise your contraction hierarchies and routing, etc.

If the public transit thing is of interest, I can make another notebook in a separate gist outlining that, but it's not as automated, as acquiring the graph is quite a unique process. I hope that this can be automated with GTFS feeds later on. In principle it works by generating an all-pairs collection using Dijkstra's algorithm with a cutoff. It could work with a road network, so it might still be useful to this project, but the all-pairs approach would become pretty expensive quickly.
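For reference, an all-pairs search with a cutoff can be sketched with the standard library alone. This is a generic illustration of the technique, not the code from the unpublished transit notebook; the adjacency-list graph format and the function names are assumptions.

```python
import heapq


def dijkstra_with_cutoff(graph, source, cutoff):
    """Shortest-path costs from source, ignoring anything beyond cutoff.

    graph: dict node -> list of (neighbour, non-negative edge cost).
    Returns dict node -> cost for every node reachable within cutoff.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, node = heapq.heappop(queue)
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, edge_cost in graph.get(node, []):
            new_cost = cost + edge_cost
            if new_cost <= cutoff and new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                heapq.heappush(queue, (new_cost, neighbour))
    return best


def all_pairs_with_cutoff(graph, cutoff):
    """Run the cutoff search from every node in the graph."""
    return {node: dijkstra_with_cutoff(graph, node, cutoff) for node in graph}


graph = {"a": [("b", 5), ("c", 20)], "b": [("c", 5)], "c": []}
print(all_pairs_with_cutoff(graph, cutoff=12))
```

The cutoff keeps each search local, but the total cost still grows with the number of nodes times the size of each neighbourhood, which is why this gets expensive on a full road network.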

In terms of procedure this works somewhat as follows

Spatially enumerate the region, grouping unemployed (i.e. as of yet unallocated) people within a tile/bucket into a list
Compute a 2D image of the same dimensions as the above, but instead of lists have the value be the length of each list, therefore telling you the number of unemployed people at each tile.
Create a function for a generator to efficiently query unemployed people in a radius around a point. Population sizes can grow too big for KD-Trees, which also don't deal well with insertion/deletion, and other similar approaches also grow too big. The generator will sample across the buckets based on their size (i.e. how many unemployed people are in each bucket) and pick a cache of samples to batch the work. Once the cache has run out, or if it starts missing, it'll regenerate the cache and recalculate the probabilities to account for allocated people.
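A minimal sketch of such a bucketed generator follows. It is an illustration under assumptions rather than the notebook's code: buckets are keyed by integer grid coordinates, the radius is measured in buckets, and the caller is expected to add each person it accepts to the shared `allocated` set. For brevity it rescans the nearby buckets whenever the cache is rebuilt, where a real implementation would keep running counts.

```python
import random


def keys_in_radius(cx, cy, r):
    """Grid keys of all buckets within r buckets of (cx, cy)."""
    return [(x, y)
            for x in range(cx - r, cx + r + 1)
            for y in range(cy - r, cy + r + 1)
            if (x - cx) ** 2 + (y - cy) ** 2 <= r * r]


def unemployed_sampler(buckets, nearby_keys, allocated, rng, cache_size=32):
    """Yield unallocated people from the given buckets.

    buckets: dict bucket key -> list of person ids.
    allocated: set of person ids that already have a job (updated by the caller).
    """
    while True:
        # Recompute bucket weights from the people still unallocated.
        remaining = {key: [p for p in buckets.get(key, []) if p not in allocated]
                     for key in nearby_keys}
        keys = [key for key in nearby_keys if remaining[key]]
        if not keys:
            return  # nobody left in range
        weights = [len(remaining[key]) for key in keys]
        # Batch the work: draw a whole cache of samples at once.
        cache = [rng.choice(remaining[key])
                 for key in rng.choices(keys, weights=weights, k=cache_size)]
        for person in cache:
            if person in allocated:
                break  # cache miss: rebuild the cache with fresh weights
            yield person


buckets = {(0, 0): [1, 2], (1, 0): [3], (5, 5): [4]}
allocated = set()
for person in unemployed_sampler(buckets, keys_in_radius(0, 0, 1), allocated, random.Random(0)):
    allocated.add(person)
print(sorted(allocated))
```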

for each workplace in shuffle(workplaces):
  create generators for selecting unemployed people in walking, cycling, and driving radii around the workplace location
  for each person in the workplace capacity to fill up:
    pick a transport option (i.e. walking cycling or driving):
      get an unemployed person for that option, pick a new one if it fails
    allocate that unemployed person to that workplace
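The loop above might look roughly like this in Python. This sketch swaps the bucketed generators for a naive distance filter to stay short, so it shows the control flow rather than the efficient version. The radii in `RADII` are illustrative guesses, and reading "pick a new one if it fails" as "fall back to another transport option" is an interpretation, not something the original states.

```python
import math
import random

# Hypothetical commuting radii in metres per mode; illustrative values only.
RADII = {"walk": 1_000, "bike": 5_000, "drive": 30_000}


def allocate_workers(people, workplaces, rng):
    """Fill workplaces with unemployed people found within a mode's radius.

    people: dict person id -> (x, y) in projected metres.
    workplaces: list of dicts with 'x', 'y', 'capacity'.
    Returns dict person id -> (workplace index, mode).
    """
    jobs = {}
    order = list(range(len(workplaces)))
    rng.shuffle(order)
    for w in order:
        wp = workplaces[w]
        for _ in range(wp["capacity"]):
            # Try the modes in random order, falling back if nobody is in range.
            for mode in rng.sample(list(RADII), k=len(RADII)):
                pool = [p for p, (x, y) in people.items()
                        if p not in jobs
                        and math.hypot(x - wp["x"], y - wp["y"]) <= RADII[mode]]
                if pool:
                    jobs[rng.choice(pool)] = (w, mode)
                    break
    return jobs


people = {1: (0, 0), 2: (100, 0), 3: (100_000, 0)}
print(allocate_workers(people, [{"x": 0, "y": 0, "capacity": 5}], random.Random(0)))
```

The output is a set of origin-destination pairs with a mode attached, which is the shape of result described above.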

This could very easily be applied to POIs and restaurants and such to generate pre-computed lists of agents' favourite destinations. Using the bucketed approach is actually very fast though, and I think in Rust with some vectorisation or change in implementation you could maybe even do this in real time (because only a small subset of agents will be selecting a new destination according to their schedule at any given time).

Wow, thank you so much for the write-up! We've indeed deprioritized this project for the moment, but when we pick it up again, there are some strong leads here -- especially WorldPop and the different paradigm for generating workers.

One of the reasons we've put this on hold is that there are other activity model generators that we've been able to import data from, like grid2demand and @Robinlovelace's R packages. I wonder if we might import data from your pipeline as well? Looking at the notebook, it looks like the output is a list of people with a home and work location and mode. We could create an abst scenario just by picking a reasonable departure time, and maybe adding a return trip back home.
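As a sketch of that conversion, the snippet below turns one person record into an outbound trip and a return trip. The record layout, the trip dictionaries, the departure window and the fixed time at work are all hypothetical; this is not A/B Street's scenario format, just an illustration of the idea.

```python
import random


def trips_for_person(person, rng, work_duration=8 * 3600):
    """Build a morning commute and a return trip for one person.

    person: dict with 'home', 'work' and 'mode'.
    Times are seconds after midnight. The 7am-9am departure window and the
    8 hour stay at work are illustrative assumptions.
    """
    depart = rng.uniform(7 * 3600, 9 * 3600)
    return [
        {"departure": depart, "origin": person["home"],
         "destination": person["work"], "mode": person["mode"]},
        {"departure": depart + work_duration, "origin": person["work"],
         "destination": person["home"], "mode": person["mode"]},
    ]


person = {"home": (51.50, -0.12), "work": (51.52, -0.10), "mode": "bike"}
for trip in trips_for_person(person, random.Random(0)):
    print(trip)
```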

The household allocation procedure you use seems more clever than ours: https://github.com/a-b-street/abstreet/blob/master/popdat/src/distribute_people.rs#L35 We find some polygons representing census tracts with the total number of people living there, then we find all residential buildings in OSM (using https://github.com/a-b-street/abstreet/blob/8e04cada2ed94543ac7daef9980b047cbd62c3be/map_model/src/make/buildings.rs#L154 to classify them, written by @matkoniecz). Then it's a matter of distributing the total population among the buildings. The big problem is that even if we guess a building has higher capacity, this randomization doesn't place more people in those buildings yet.
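One possible way to make higher-capacity buildings receive more people is to sample buildings with weights proportional to their guessed capacity. The sketch below illustrates that idea in Python; it is not what `distribute_people.rs` currently does, and the function name and input format are assumptions.

```python
import random
from collections import Counter


def distribute_people(total_people, capacities, rng):
    """Spread a tract's population over its buildings, weighted by capacity.

    capacities: dict building id -> guessed relative capacity (> 0).
    Returns Counter building id -> number of residents.
    """
    ids = list(capacities)
    weights = [capacities[b] for b in ids]
    # Each person independently picks a building with probability
    # proportional to that building's weight.
    return Counter(rng.choices(ids, weights=weights, k=total_people))


print(distribute_people(1000, {"house": 1, "tower": 50}, random.Random(0)))
```

Sampling with replacement like this never "fills up" a building, so the weights act as relative attractiveness rather than hard limits.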

> I wonder if we might import data from your pipeline as well?

Feel free to do whatever you want with that gist, I've added a comment underneath clarifying that it's under an MIT License so I don't care how it's used. I won't be hosting it anywhere I don't think so if you do want to use it (even if just to precompute datasets to make available or as a downloadable tool rather than something interactive for the user) then I'll leave you to decide how best to do so.

I think the household allocation makes a fair bit of sense. Depending on use-case, for something like AB-Street it should be more than good enough in my eyes. The main issue I have with it right now is missing OSM data causing unrealistic buildings, but for something where you're more interested in traffic density originating from those areas I don't see it being a big problem.

(That said, if you guys have any solutions or ideas on how to deal with missing building geometry data in OSM I'd be keen to hear. My guess is that for the US you won't see many problems but any users wanting to import their local European cities might struggle as we seem to have much worse data over here.)

> Feel free to do whatever you want with that gist

Thanks! When we pick up this work more seriously, we might be able to incorporate some ideas from it.

> how to deal with missing building geometry data in OSM

Ahh, I actually have something for you. Check out all these empty roads around Bailrigg: https://www.openstreetmap.org/#map=16/54.0591/-2.8161

Download this file and upload it to geojson.io to view: bailrigg_procgen.json.txt

We're doing simple population generation for https://github.com/cyipt/actdev and hit the missing house problem too. Simple procedural generation works as a stopgap. It finds all highway=residential road segments that have no buildings snapped to them (according to A/B Street's simple model). It walks along both sides of the road and tries to place a rectangle with sides between 6 and 14 meters about 10m back from the road. If it overlaps any other building, road, or water/park area, it skips it.
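The placement rule described above can be illustrated with a much-simplified Python sketch. It handles a single straight road segment in projected metres, uses axis-aligned bounding boxes for every overlap test, and measures the 10m setback from the road centreline to the house centre. Those simplifications, the 15m spacing and the function names are assumptions; the real logic lives in `generate_houses.rs` and works on A/B Street's map model.

```python
import math
import random


def boxes_overlap(a, b):
    """a, b: axis-aligned boxes as (min_x, min_y, max_x, max_y)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def generate_houses(start, end, obstacles, rng, setback=10.0, step=15.0):
    """Place houses along both sides of a straight road from start to end.

    obstacles: boxes for existing buildings, roads, water and park areas.
    The road must have non-zero length. Returns the boxes of the houses placed.
    """
    (x1, y1), (x2, y2) = start, end
    length = math.hypot(x2 - x1, y2 - y1)
    ux, uy = (x2 - x1) / length, (y2 - y1) / length  # unit vector along the road
    nx, ny = -uy, ux                                  # unit normal (left side)
    placed = []
    dist = step / 2
    while dist < length:
        for side in (1, -1):
            # Rectangle with sides between 6 and 14 metres.
            w, h = rng.uniform(6, 14), rng.uniform(6, 14)
            cx = x1 + ux * dist + side * nx * setback
            cy = y1 + uy * dist + side * ny * setback
            box = (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
            # Skip the house if it overlaps anything already there.
            if not any(boxes_overlap(box, other) for other in obstacles + placed):
                placed.append(box)
        dist += step
    return placed


print(len(generate_houses((0, 0), (60, 0), [], random.Random(0))))
```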

The code's at https://github.com/a-b-street/abstreet/blob/master/importer/src/bin/generate_houses.rs -- it's pretty simple, but repeating this logic from scratch on raw .osm might be tough, because you'd have to first solve problems like calculating wide roads and extracting park/water areas. If you had a ~small number of places where you want to run this, we could import them into A/B Street and run the tool. Or I could show you how to do that running locally.

That's a sweet process! Procedural gen might be a fine compromise, I'd been thinking about it so it's nice to see an existing implementation. My worry is it won't work well in non-population-dense areas (which isn't as much of a concern for AB-Street so makes sense). You don't want hundreds of houses along a country road for example.

I'm going to spend some time looking for building footprint datasets before going to that, I know Microsoft has released some for the whole of the US and Australia https://www.microsoft.com/en-us/maps/building-footprints

There are also ML models trained to get this stuff from satellite imagery but I haven't found an existing API as of yet, I'm suspecting it wouldn't be cheap to run.

Activity modelling #424