Expand bicycle and transit lookups

EwoutH commented 2 weeks ago

Bicycle and transit lookups currently use a very coarse resolution of only 20 mrdh65 areas. That's fine for the outer area, but the inner area needs a somewhat higher resolution to say anything meaningful about a city center. Preferably pc4.

The limiting factor is Google Maps API costs.

EwoutH commented 1 week ago

Let's describe the problem properly.

Currently, travel times have been collected from the Google Maps API between the population-weighted centroids of each mrdh65 region. This basically went as follows:

The mrdh65 regions with a centroid (red) in the city polygon (light grey) were marked.
pc4 (four-digit postal code) areas were assigned to each mrdh65 area. If a pc4 area overlapped more than one, it was assigned to the mrdh65 region with the largest overlap.
Weighted centroids (blue) were calculated, based on the centroid and population of each pc4 area in the mrdh65 region.
All mrdh65 regions with centroids in the city polygon were selected.
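
The assignment and weighting steps above can be sketched roughly as follows. This is a minimal illustration, not the code used in the repository: the function names, the overlap fractions and the coordinates are all made up, and it assumes centroids are given in a projected coordinate system where averaging x and y is meaningful.

```python
from typing import Dict, Iterable, Tuple


def assign_to_region(overlap_by_region: Dict[str, float]) -> str:
    """Pick the mrdh65 region that a pc4 area overlaps the most.

    `overlap_by_region` maps a (hypothetical) region id to the overlapping area.
    """
    return max(overlap_by_region, key=overlap_by_region.get)


def weighted_centroid(areas: Iterable[Tuple[float, float, float]]) -> Tuple[float, float]:
    """Population-weighted centroid of pc4 areas given as (x, y, population)."""
    areas = list(areas)
    total_pop = sum(pop for _, _, pop in areas)
    if total_pop <= 0:
        raise ValueError("total population must be positive")
    x = sum(cx * pop for cx, _, pop in areas) / total_pop
    y = sum(cy * pop for _, cy, pop in areas) / total_pop
    return x, y


# Made-up example values, for illustration only.
overlaps = {"region_a": 0.2, "region_b": 0.8}
print(assign_to_region(overlaps))    # region_b

pc4_areas = [(0.0, 0.0, 100.0), (10.0, 0.0, 300.0)]
print(weighted_centroid(pc4_areas))  # (7.5, 0.0)
```

The weighted centroid is pulled towards the more populous pc4 area, which is the point of weighting by population instead of using the geometric centroid.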

This resulted in 20 weighted centroids. Between each pair of them, the travel time was looked up with the Google Maps Distance Matrix API for both cycling and transit (on a weekday morning). That came to 19x20x2 = 760 lookups, costing around $4. The travel times can be seen below:

Now, it would be interesting to do one or both of the following:

Have a broader reach, to factor in traffic coming in from the city edges.
Have more resolution in the center, where cycling traffic might scale very differently.

There's a $200 monthly free budget available, which means 40,000 lookups can be done, or 20,000 per mode when covering both cycling and transit. A matrix of 141x141 should be possible, which:

could cover the full mrdh65 65x65 matrix (which doesn't make much sense for cycling)
could cover the 69 pc4 areas currently allocated to agents (in the 13 mrdh65 areas agents currently reside in).

The latter seems the best option, since it increases cycling and transit resolution, while not further expanding the scope of the project.
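
The budget arithmetic can be checked with a few lines. This is a rough sketch: the per-lookup price is simply derived from the $200 and 40,000 figures above rather than taken from a price list, and the function names are made up for illustration.

```python
import math

FREE_BUDGET_USD = 200.0     # monthly free budget mentioned above
LOOKUPS_IN_BUDGET = 40_000  # number of lookups that budget is assumed to buy
N_MODES = 2                 # cycling and transit


def n_lookups(n_areas: int, include_diagonal: bool = False) -> int:
    """Origin-destination lookups for an n x n matrix, for all modes."""
    pairs = n_areas * n_areas if include_diagonal else n_areas * (n_areas - 1)
    return pairs * N_MODES


def max_matrix_size() -> int:
    """Largest n for which a full n x n matrix per mode fits in the budget."""
    return math.isqrt(LOOKUPS_IN_BUDGET // N_MODES)


price_per_lookup = FREE_BUDGET_USD / LOOKUPS_IN_BUDGET  # derived, not an official price

print(n_lookups(20))                               # 760
print(f"${n_lookups(20) * price_per_lookup:.2f}")  # $3.80
print(max_matrix_size())                           # 141
print(n_lookups(69, include_diagonal=True))        # 9522
```

Even when counting the diagonal, a 69x69 pc4 matrix for both modes uses less than a quarter of the 40,000 lookups.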

To do this robustly, a pc4 lookup could be added, which is tried first, with a fallback to the existing mrdh65 lookup.
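
A minimal sketch of such a two-level lookup, assuming the travel times live in plain dictionaries keyed by (origin, destination, mode). The names `pc4_times`, `mrdh65_times` and `pc4_to_mrdh65`, the area codes and the travel times are hypothetical and only serve the illustration; the actual data structures in the model may differ.

```python
from typing import Dict, Optional, Tuple

Key = Tuple[int, int, str]  # (origin, destination, mode)


def lookup_travel_time(
    origin_pc4: int,
    dest_pc4: int,
    mode: str,
    pc4_times: Dict[Key, float],
    mrdh65_times: Dict[Key, float],
    pc4_to_mrdh65: Dict[int, int],
) -> Optional[float]:
    """Try the fine-grained pc4 table first, then fall back to mrdh65."""
    key = (origin_pc4, dest_pc4, mode)
    if key in pc4_times:
        return pc4_times[key]

    # Fallback: map both pc4 areas to their mrdh65 region.
    origin_region = pc4_to_mrdh65.get(origin_pc4)
    dest_region = pc4_to_mrdh65.get(dest_pc4)
    if origin_region is None or dest_region is None:
        return None  # no known region, so no travel time available
    return mrdh65_times.get((origin_region, dest_region, mode))


# Made-up example data (travel times in seconds).
pc4_times = {(3011, 3012, "bike"): 420.0}
mrdh65_times = {(1, 2, "bike"): 900.0}
pc4_to_mrdh65 = {3011: 1, 3012: 1, 3051: 2}

print(lookup_travel_time(3011, 3012, "bike", pc4_times, mrdh65_times, pc4_to_mrdh65))  # 420.0
print(lookup_travel_time(3011, 3051, "bike", pc4_times, mrdh65_times, pc4_to_mrdh65))  # 900.0
```

Returning `None` instead of raising keeps the caller in control of what happens when neither table has an entry.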

EwoutH commented 1 week ago

There was an indexing issue in the travel time lookups, which had earlier been patched over with some aggressive data filtering and an incorrect bugfix. That's now resolved (in b8995158235afab01854e7a9894e2d12ced3c836, aafc5c7e297c822e954587bbd6db5aa31eee2f1c and 483f797831c957895d38ff77ff747776341faf44), so we're now actually using 118 pc4 areas across 21 mrdh65 areas.

That's just under our theoretical limit of 141x141 lookups, but by correctly adding the other areas back in we already got a "relative" resolution increase.

A small shift towards bikes and transit can be noticed, which notably takes away a large part of the av share.

# Before fixing lookups + expanded area
Mode shares: ['car: 15.88%', 'bike: 72.96%', 'transit: 7.93%', 'av: 3.23%']
Hour 7: Mode shares: ['car: 17.08%', 'bike: 70.96%', 'transit: 8.09%', 'av: 3.87%']
Hour 8: Mode shares: ['car: 15.20%', 'bike: 74.10%', 'transit: 7.84%', 'av: 2.86%']

# After
Mode shares: ['car: 11.60%', 'bike: 78.81%', 'transit: 9.35%', 'av: 0.24%']
Hour 7: Mode shares: ['car: 13.05%', 'bike: 77.50%', 'transit: 9.17%', 'av: 0.27%']
Hour 8: Mode shares: ['car: 10.76%', 'bike: 79.57%', 'transit: 9.46%', 'av: 0.22%']
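
To make the overall shift easier to read, the percentage-point differences can be computed directly from the two overall "Mode shares" lines above. This is just arithmetic on the printed numbers; the dictionary layout and the function name are made up for this sketch.

```python
# Overall mode shares (in %) copied from the output above.
before = {"car": 15.88, "bike": 72.96, "transit": 7.93, "av": 3.23}
after = {"car": 11.60, "bike": 78.81, "transit": 9.35, "av": 0.24}


def share_shift(before: dict, after: dict) -> dict:
    """Change per mode in percentage points, rounded to two decimals."""
    return {mode: round(after[mode] - before[mode], 2) for mode in before}


for mode, delta in share_shift(before, after).items():
    print(f"{mode}: {delta:+.2f} pp")
# car: -4.28 pp
# bike: +5.85 pp
# transit: +1.42 pp
# av: -2.99 pp
```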

EwoutH commented 1 day ago

Some initial journey data:

[Figure: journeys_data]

Note how the "car" curves are very smooth, but the transit and bike curves aren't. This is due to the problem described in this issue: there isn't enough spatial resolution to properly estimate distance and travel time from the Google Maps API lookup tables.

So I was considering how to fix this with minimal effort. Here I plotted the travel speed for all connections from the Google Maps lookup tables:

[Figure: travel_speed]

Note how transit speeds are widely spread, while cycling speeds fall in a quite narrow range. This may allow simply assuming a fixed speed for cyclists, since the route doesn't seem to matter that much, and reusing the car distances we already have in the network.

So for now I see three options:

1. Calculate the average cycling speed in the city (according to Google) and divide the distances from the network by it to get travel times. This will enhance precision from node level to street level, which is more than enough.
   a. Note that small roads, bicycle paths and shortcuts aren't included in the road network for cars, so some trips will take disproportionately long.
2. Create a very detailed cycle network, compute the distances once and put those into lookup tables. Relatively trivial.
3. Use more Google lookups. Probably the way to go, only everything needs to be remapped from mrdh65 area codes to pc4 (4-digit postal code) ones. Needs to be done anyway. Double check API prices, test small and don't fuck up the big run.
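
For reference, the fixed-speed option could look roughly like this. It's a sketch under assumptions: the lookup rows are taken to be (distance in meters, duration in seconds) pairs, the average is computed as total distance over total time (a mean of per-connection speeds would be another reasonable choice), and the numbers are made up.

```python
from typing import Iterable, Tuple


def average_speed(connections: Iterable[Tuple[float, float]]) -> float:
    """Average speed in m/s over (distance_m, duration_s) pairs.

    Computed as total distance / total time, so longer connections weigh more.
    """
    connections = list(connections)
    total_distance = sum(distance for distance, _ in connections)
    total_time = sum(duration for _, duration in connections)
    if total_time <= 0:
        raise ValueError("total duration must be positive")
    return total_distance / total_time


def cycling_time(network_distance_m: float, speed_m_per_s: float) -> float:
    """Travel time in seconds for a network distance at a fixed speed."""
    return network_distance_m / speed_m_per_s


# Made-up lookup rows: 5.0 m/s and 3.75 m/s connections.
bike_lookups = [(2000.0, 400.0), (6000.0, 1600.0)]
speed = average_speed(bike_lookups)

print(speed)                        # 4.0
print(cycling_time(2000.0, speed))  # 500.0
```

Note that travel time is distance divided by speed, and that the distances would still come from the car network, so missing bicycle shortcuts would still inflate some trips.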

@quaquel I'm a bit tired, but I'm going to try to ask a coherent question about this tomorrow. Basically all other stuff is done model-wise; now it's experimental design and how to aggregate the data nicely. Mode choice is a bit oversimplified, but I'm quite happy about everything else. I will mail a detailed update with some proper questions tomorrow.

EwoutH commented 10 hours ago

An old issue where not all nodes had a distance value to each other really became a big factor when moving from mrdh65 to pc4 regions. This was fixed in https://github.com/EwoutH/urban-self-driving-effects/commit/d394ca15f8b733d5bacaacaf4d3f5f010aae6e03.

Edit: 301b48390e760f59c2530bc3d2460f31b7490a48 will further ease the migration from mrdh65 to pc4.

We're going with option 3.

EwoutH / urban-self-driving-effects

Expand bicycle and transit lookups #5