Travel time matrices for assigning activities to zones

Hussein-Mahfouz commented 7 months ago

The NTS data we are using only assigns individual activities to regions (e.g. "West Yorkshire", "North West"). We only have the home location (from the SPC) but we need to be able to determine feasible activity locations.

For example, to determine the location of an education facility, we use home location (SPC), mode of travel (NTS), reported travel time (NTS), and travel time matrices by mode to identify which zones the education facility could be in. (current function here)

I am using travel time matrices (at OA level) that I have calculated from another project, but it would be useful to have a pipeline to create these matrices for any study area.

dabreegster commented 7 months ago

Some questions...

What does the performance of the current travel time matrix approach look like, either for building the matrices or querying them? Does either feel like a limiting factor?
How detailed do modes of travel get -- just "cycling" or "cycling with e-bike so hills don't matter, and confident about stressful roads"? For PT, limits on money for tickets?
What's your source of network and destination data right now -- both OSM?

dabreegster commented 7 months ago

If you're using travel time matrices, there are about 180k OAs, so that's around 32 billion entries per mode. Very conservatively assuming 8 bytes per entry, that's around 241GB for one mode's matrix. Seems quite extreme and wasteful, given so many OAs don't interact.
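The arithmetic behind that estimate can be checked with a short sketch. The function name is made up for illustration, and it assumes a dense square matrix with one fixed-size value per origin-destination pair. Note that the 241 figure corresponds to GiB (2^30 bytes); in decimal GB it is closer to 259.

```python
def dense_matrix_size(n_zones: int, bytes_per_entry: int = 8):
    """Return (number of OD entries, size in GiB) for a dense n x n matrix."""
    entries = n_zones * n_zones
    size_gib = entries * bytes_per_entry / 2**30
    return entries, size_gib


# ~180k OAs nationally, 8 bytes per entry (both figures from the comment above)
entries, size_gib = dense_matrix_size(180_000)
print(f"{entries:,} entries, {size_gib:.0f} GiB")  # 32,400,000,000 entries, 241 GiB
```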

How high does travel time usually go -- not often over 2 hours, hopefully? Do you want to find all destinations within 2 hours of a start point, or stop when you find the closest one, or make some randomized decision about whether to keep searching as you encounter each one? Or since the travel time is from a survey, maybe ignore destinations closer and really insist on some that're about X minutes away?

Hussein-Mahfouz commented 7 months ago

What does the performance of the current travel time matrix approach look like, either for building the matrices or querying them? Does either feel like a limiting factor?

I am using r5r and building matrices for a specific city. In my case it was Leeds (~2600 OAs), and I was creating one matrix each for car, walk, and cycle, plus 6 matrices for bus (morning_wkday, afternoon_wkday, evening_wkday, night_wkday, morning_wkend, night_wkend). I'm doing this on my laptop. It is very fast for PT (maybe 30 seconds per matrix) but very slow for car trips (could take 30 minutes for the same matrix). I think the difference in performance is because r5 was built for PT routing. The routing engine takes another 30 seconds to start running.

The code for the routing wrappers is here and the code for running r5r is here

How detailed do modes of travel get -- just "cycling" or "cycling with e-bike so hills don't matter, and confident about stressful roads"? For PT, limits on money for tickets?

These are all the options you can pass (r5r::travel_time_matrix()). For hills, you can add an elevation file in the setup. For an e-bike, you can ignore the elevation and/or change the bike_speed parameter. For PT, you can add monetary limits through max_fare, but I haven't done that since you would need to add a fare_structure file to your GTFS feed. See this vignette for more details.

What's your source of network and destination data right now -- both OSM?

network: osm
destination zones: OA zones from census
destinations: OSM (through osmox) (some notes in #19)

Hussein-Mahfouz commented 7 months ago

If you're using travel time matrices, there are about 180k OAs, so that's around 32 billion entries per mode. Very conservatively assuming 8 bytes per entry, that's around 241GB for one mode's matrix. Seems quite extreme and wasteful, given so many OAs don't interact.

Yeah I'm definitely not running this on a national level. I'm currently constraining it to OAs within a specific city, and limiting the travel time to 2 hours.

How high does travel time usually go -- not often over 2 hours, hopefully?

I need to check the NTS to see the travel time distribution. One option could be a design decision to only include intracity trips, and limit the time to 2 hours

Do you want to find all destinations within 2 hours of a start point, or stop when you find the closest one, or make some randomized decision about whether to keep searching as you encounter each one? Or since the travel time is from a survey, maybe ignore destinations closer and really insist on some that're about X minutes away?

There are normally a bunch of different people in each OA, and each one will have a different travel distance from the NTS, so I don't think we could insist on a travel time in the routing phase. It makes sense to me to create the matrix, and for each individual, use the matrix to determine the zones they can reach given the specified travel time from the NTS. This is what I was doing in this function
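As a rough illustration of that per-individual lookup (not the linked function itself), here is a minimal Python sketch. It assumes the matrix is stored in long format as a dict keyed by (origin, destination, mode); the zone ids, the `feasible_zones` name, and the 20% tolerance band around the reported time are all assumptions made for the example.

```python
from typing import Dict, List, Tuple

# Long-format matrix: (origin_zone, destination_zone, mode) -> minutes.
TravelTimes = Dict[Tuple[str, str, str], float]


def feasible_zones(
    matrix: TravelTimes,
    home_zone: str,
    mode: str,
    reported_minutes: float,
    tolerance: float = 0.2,  # assumed band; not a value from the thread
) -> List[str]:
    """Zones whose travel time from home_zone by mode is within
    +/- tolerance of the travel time reported in the survey."""
    lower = reported_minutes * (1 - tolerance)
    upper = reported_minutes * (1 + tolerance)
    return sorted(
        dest
        for (orig, dest, m), minutes in matrix.items()
        if orig == home_zone and m == mode and lower <= minutes <= upper
    )


# Toy matrix with made-up zone ids
matrix = {
    ("A", "B", "walk"): 12.0,
    ("A", "C", "walk"): 25.0,
    ("A", "D", "walk"): 31.0,
    ("A", "C", "car"): 8.0,
}
print(feasible_zones(matrix, "A", "walk", reported_minutes=28))  # ['C', 'D']
```

Dropping the lower bound would turn this into "every zone reachable within the reported time", which is the other reading of the approach described above.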

Hussein-Mahfouz commented 2 months ago

@sgreenbury this is the workflow for creating travel time matrices:

1. Getting the data

OSM Road Network:
- Option 1: download manually through Geofabrik
- Option 2 (preferred): download using a script/cli (e.g. using pyrosm). Sam's message here is a good starting point for that. It would also be useful for POI data, as shown in the issue where the message is from
GTFS feeds:
- GTFS data for the UK can be found under timetables here. You need to create an account to download
Zoning layer: we are calculating travel times for an OD matrix. I normally use zone centroids as origins and destinations. We are currently using the OA21CD boundary layer for the UK. Ideally the code should be agnostic to the layer you provide it (OAs, MSOAs, custom zoning layer)

2. Preprocessing the data:

GTFS feed dates: a routing engine normally calculates travel times for a date and time that the user specifies. For public transport, if that date is not covered by the GTFS feed, you will get no results. You need to do the following:
- Explore the feed to see which dates it covers. If you have more than one feed (e.g. bus and rail), you need to find a date which overlaps for both of them. This script explores GTFS feeds and plots service patterns to compare overlap https://github.com/Hussein-Mahfouz/drt-potential/blob/main/code/compare_gtfs_overlap.R
- Edit the dates manually: If you have two feeds (e.g. bus and rail) and the dates do not overlap, you could edit the date range of one to match the other. Here is a script to do that: https://github.com/transportforcairo/wri-numo_access-analysis/blob/main/code/1_Preprocessing/0_edit_gtfs_calendar.R
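The date check can also be sketched in Python using only the standard library. This is not a port of the linked R scripts: it assumes each feed is a zip containing a `calendar.txt` with the standard `start_date` and `end_date` columns (YYYYMMDD), and it ignores `calendar_dates.txt`, so feeds that define service only through exceptions would need extra handling. The function names are made up for illustration.

```python
import csv
import io
import zipfile
from datetime import date, datetime
from typing import Optional, Tuple


def feed_date_range(feed) -> Tuple[date, date]:
    """Earliest start_date and latest end_date in a feed's calendar.txt.

    `feed` is a path or file-like object pointing at a GTFS zip.
    """
    with zipfile.ZipFile(feed) as zf:
        with zf.open("calendar.txt") as f:
            rows = list(csv.DictReader(io.TextIOWrapper(f, encoding="utf-8-sig")))
    starts = [datetime.strptime(r["start_date"], "%Y%m%d").date() for r in rows]
    ends = [datetime.strptime(r["end_date"], "%Y%m%d").date() for r in rows]
    return min(starts), max(ends)


def overlapping_range(*feeds) -> Optional[Tuple[date, date]]:
    """Date range covered by every feed, or None if there is no overlap."""
    ranges = [feed_date_range(f) for f in feeds]
    start = max(r[0] for r in ranges)
    end = min(r[1] for r in ranges)
    return (start, end) if start <= end else None


# Usage (file names are hypothetical):
# overlapping_range("bus_gtfs.zip", "rail_gtfs.zip")
```

Any departure date passed to the routing engine should fall inside the returned range; a `None` result is the case where the calendar of one feed has to be edited.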
Clipping GTFS feed to study area: if you have a feed for a whole country (like BODS data) but you only want to study a specific area, you should clip the feed.
- Implementation here: https://github.com/Hussein-Mahfouz/drt-potential/blob/main/code/clip_gtfs.R
- See the filter_ functions in the gtfstools package (https://ipeagit.github.io/gtfstools/reference/index.html) for the different options that can be useful
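To show the idea behind clipping, here is a simplified Python sketch. It is not the linked script and not gtfstools: it assumes rows from `stops.txt` and `stop_times.txt` have already been read into dicts (for example with `csv.DictReader`), clips to a bounding box rather than a study-area polygon, and keeps a trip only if at least two of its stops remain. The `clip_to_bbox` name and that two-stop rule are assumptions for the example.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple

Row = Dict[str, str]  # one CSV row, e.g. from csv.DictReader


def clip_to_bbox(
    stops: List[Row],
    stop_times: List[Row],
    bbox: Tuple[float, float, float, float],  # (min_lon, min_lat, max_lon, max_lat)
) -> Tuple[Set[str], Set[str], List[Row]]:
    """Return (kept stop_ids, kept trip_ids, kept stop_times rows)."""
    min_lon, min_lat, max_lon, max_lat = bbox
    kept_stops = {
        s["stop_id"]
        for s in stops
        if min_lon <= float(s["stop_lon"]) <= max_lon
        and min_lat <= float(s["stop_lat"]) <= max_lat
    }
    # Stop calls that happen inside the study area
    inside = [st for st in stop_times if st["stop_id"] in kept_stops]
    # A trip with fewer than two remaining calls cannot carry anyone anywhere
    calls = Counter(st["trip_id"] for st in inside)
    kept_trips = {trip for trip, n in calls.items() if n >= 2}
    kept_stop_times = [st for st in inside if st["trip_id"] in kept_trips]
    return kept_stops, kept_trips, kept_stop_times
```

A full clip would also filter `trips.txt`, `routes.txt`, and `shapes.txt` down to the kept ids and drop stops that no remaining trip serves.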
Convert data to GTFS: UK rail data is not available in GTFS format. If you want to include rail in the travel time calculations, you need to transform it
- Data source: https://data.atoc.org/how-to
- Script for conversion: https://github.com/Hussein-Mahfouz/drt-potential/blob/main/code/rail_to_gtfs.R
- Useful github issue on the topic: https://github.com/Hussein-Mahfouz/drt-potential/issues/7
Get GTFS in correct format for routing engine: This is not strictly necessary, but it is useful for dealing with uncertainty in routing engine travel time estimates. The start time you specify is arbitrary, and it can have a large impact for buses that run very infrequently: if I leave at 8:00 I just catch the bus, whereas if I leave at 8:15 I have to wait for an hour. See here: https://github.com/Hussein-Mahfouz/drt-potential/issues/11
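The sensitivity to start time can be shown with a toy example. The sketch below is not what the linked issue implements; it assumes a single hourly bus with a fixed 20-minute ride (both made up), and it summarises travel time over a departure window with the median, which is the general idea behind the `time_window` and `percentiles` arguments of r5r's travel_time_matrix().

```python
from bisect import bisect_left
from statistics import median
from typing import List, Optional


def travel_time(leave_at: int, bus_departures: List[int], ride_minutes: int) -> Optional[int]:
    """Wait for the next bus at or after leave_at, then ride.

    Times are minutes since midnight; bus_departures must be sorted.
    """
    i = bisect_left(bus_departures, leave_at)
    if i == len(bus_departures):
        return None  # no more buses today
    return bus_departures[i] - leave_at + ride_minutes


def median_over_window(start: int, window_minutes: int,
                       bus_departures: List[int], ride_minutes: int) -> Optional[float]:
    """Median travel time over every departure minute in the window."""
    times = [travel_time(t, bus_departures, ride_minutes)
             for t in range(start, start + window_minutes)]
    reachable = [t for t in times if t is not None]
    return median(reachable) if reachable else None


hourly = [8 * 60, 9 * 60]  # buses at 08:00 and 09:00 (made-up timetable)
print(travel_time(8 * 60, hourly, 20))             # 20: just catch the 08:00 bus
print(travel_time(8 * 60 + 15, hourly, 20))        # 65: wait 45 minutes, then ride
print(median_over_window(8 * 60, 60, hourly, 20))  # 49.5
```

A single departure time gives either 20 or 65 minutes for the same trip, whereas the windowed median gives one stable figure per OD pair.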

3. Routing:

Routing: I have done this in r5r (we may want to use r5py for better integration with the rest of the code). r5 can create a travel time matrix for a specified combination of modes (see the travel_time_matrix api). If you want to calculate for different mode scenarios, you need to run the function multiple times. I did the following:
- Step 1: Create a wrapper function so you can run the travel time matrix multiple times with different configurations, and store the results for each: https://github.com/Hussein-Mahfouz/drt-potential/blob/main/R/r5r_routing_wrappers.R
- Step 2: Have a script that runs the wrapper for each scenario: I specified scenarios (each a different mode combination) like this
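The wrapper-plus-scenarios pattern from the two steps above could look roughly like this in Python. It is a sketch, not a translation of the linked R code: `Scenario`, `run_scenarios`, the mode strings, and the dates are all hypothetical, and the routing call is passed in as a function so the same loop could sit on top of r5py or any other engine.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Callable, Dict, List, Tuple

Matrix = Dict[Tuple[str, str], float]  # (origin, destination) -> minutes


@dataclass(frozen=True)
class Scenario:
    name: str
    modes: Tuple[str, ...]
    departure: datetime  # must fall inside the GTFS feed's calendar


def run_scenarios(
    scenarios: List[Scenario],
    compute_matrix: Callable[[Scenario], Matrix],
) -> Dict[str, Matrix]:
    """Run the routing function once per scenario and store results by name."""
    return {s.name: compute_matrix(s) for s in scenarios}


# Placeholder scenarios; names follow the time-of-day labels used earlier
scenarios = [
    Scenario("car", ("CAR",), datetime(2023, 8, 14, 8, 0)),
    Scenario("walk", ("WALK",), datetime(2023, 8, 14, 8, 0)),
    Scenario("pt_morning_wkday", ("WALK", "TRANSIT"), datetime(2023, 8, 14, 8, 0)),
    Scenario("pt_morning_wkend", ("WALK", "TRANSIT"), datetime(2023, 8, 19, 8, 0)),
]
```

Keeping the scenario list as data means adding a new mode combination or time of day is one extra line rather than another copy of the routing call.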

Hussein-Mahfouz commented 1 month ago

maybe useful: https://github.com/arup-group/gtfs_skims/tree/main

Urban-Analytics-Technology-Platform / acbm

Travel time matrices for assigning activities to zones #20