Hussein-Mahfouz / drt-potential


Handling uncertainty in travel time calculations #11

Closed: Hussein-Mahfouz closed this issue 8 months ago

Hussein-Mahfouz commented 10 months ago

How should we address travel time uncertainty when calculating travel time matrices? For background, see:

The time_window parameter (in combination with the percentiles parameter) would be ideal, but it can only be used with frequency-based GTFS feeds. From the vignette:

Please keep in mind that the time_window only affects the results when the GTFS feeds contain a frequencies.txt table.

Solution using time_window parameter in r5r

One solution is to write a function that converts stop_times to frequencies, and use it to edit the GTFS feeds so that they become frequency-based feeds.

See my comment https://github.com/ipeaGIT/gtfstools/issues/69#issuecomment-1693191738 for a starting point on the function, and https://github.com/ipeaGIT/r5r/issues/282#issuecomment-1693179215 to understand how R5 handles the time_window argument when the GTFS feed has no frequencies.txt file.
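A rough sketch of the core step such a function would need, written in Python rather than R and assuming the trips have already been grouped into sets that share one stop pattern. The function name frequencies_row is hypothetical (it is not the function discussed in the gtfstools issue), and summarising each group with a single median headway is a simplifying assumption; a real converter would probably split the day into periods with different headways.

```python
from statistics import median
from typing import Optional


def to_seconds(hms: str) -> int:
    """Convert a GTFS time string (HH:MM:SS, hours may exceed 23) to seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def to_hms(seconds: int) -> str:
    """Convert seconds after midnight back to a GTFS HH:MM:SS string."""
    h, rem = divmod(seconds, 3600)
    m, s = divmod(rem, 60)
    return f"{h:02d}:{m:02d}:{s:02d}"


def frequencies_row(template_trip_id: str, first_departures: list) -> Optional[dict]:
    """Hypothetical helper: summarise the first-stop departures of trips that
    share one stop pattern as a single frequencies.txt row (median headway)."""
    secs = sorted(to_seconds(t) for t in first_departures)
    if len(secs) < 2:
        return None  # a single departure has no headway
    gaps = [later - earlier for earlier, later in zip(secs, secs[1:])]
    headway = int(median(gaps))
    return {
        "trip_id": template_trip_id,
        "start_time": to_hms(secs[0]),
        # end_time is when this headway stops applying; set one headway past
        # the last observed departure so that departure is still covered
        "end_time": to_hms(secs[-1] + headway),
        "headway_secs": headway,
    }


row = frequencies_row("trip_A", ["07:00:00", "07:15:00", "07:30:00", "07:45:00"])
print(row)
```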

Hacky manual solution

We can pass different departure times to the travel_time_matrix function (e.g. for 8:00 am, use 7:55, 8:00 and 8:05). This is a hacky way of recreating the time_window functionality, and it will definitely be a lot slower.
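A sketch of the manual approach in Python, assuming each departure time has already been run through travel_time_matrix separately and its results collected as a dict mapping OD pairs to minutes. The helper names, the date and the toy numbers are hypothetical, and taking the median across departure times is just one possible way to combine the runs.

```python
from datetime import datetime, timedelta
from statistics import median


def departure_times(centre, step_min=5, n_each_side=1):
    """Departure times spread evenly around `centre`; with the defaults,
    08:00 gives 07:55, 08:00 and 08:05."""
    return [centre + timedelta(minutes=step_min * k)
            for k in range(-n_each_side, n_each_side + 1)]


def combine_runs(runs):
    """Combine separate travel time matrix runs.

    runs maps each departure time to {(origin, destination): minutes}.
    An OD pair missing from a run (e.g. unreachable) simply contributes
    no value. Returns the median travel time per OD pair.
    """
    per_od = {}
    for matrix in runs.values():
        for od, minutes in matrix.items():
            per_od.setdefault(od, []).append(minutes)
    return {od: median(values) for od, values in per_od.items()}


# Toy results standing in for three separate travel_time_matrix() runs
deps = departure_times(datetime(2023, 9, 5, 8, 0))
runs = {deps[0]: {("A", "B"): 30},
        deps[1]: {("A", "B"): 25},
        deps[2]: {("A", "B"): 40}}
print(combine_runs(runs))  # {('A', 'B'): 30}
```

Each extra departure time means another full routing run, which is why this is slower than a single call with time_window.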

Hussein-Mahfouz commented 10 months ago

I am trying to create a frequencies.txt file so that the routing can use the time_window parameter.

I tried to use the get_route_frequency() function in tidytransit, but it depends on a direction_id column in the trips.txt file. This column is optional in the GTFS specification, and it is not present in BODS (Bus Open Data Service) data.

I tried to create the column by grouping trips by route_id and service_id, expecting two trips in each group that I could label 0 and 1, but it turns out that some routes have more than two trips:

[Image: route_id / service_id groups with more than two trips]
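For reference, the grouping logic described above can be sketched like this in Python, with trips as plain dicts; the function name and the example trips are hypothetical. Groups of exactly two trips get 0/1 labels, and every other group is returned separately, which is where the assumption breaks down.

```python
from collections import defaultdict


def naive_direction_ids(trips):
    """Group trips by (route_id, service_id) and label each pair 0/1.

    trips: dicts with trip_id, route_id and service_id keys.
    Returns (direction_ids, problem_groups): direction_ids maps trip_id to
    0 or 1 for groups of exactly two trips; problem_groups holds every
    group of any other size, where the 0/1 assumption does not hold.
    """
    groups = defaultdict(list)
    for trip in trips:
        groups[(trip["route_id"], trip["service_id"])].append(trip["trip_id"])

    direction_ids, problem_groups = {}, {}
    for key, trip_ids in groups.items():
        if len(trip_ids) == 2:
            # Arbitrary labelling: sorted order says nothing about real direction
            for direction, trip_id in enumerate(sorted(trip_ids)):
                direction_ids[trip_id] = direction
        else:
            problem_groups[key] = trip_ids
    return direction_ids, problem_groups


trips = [
    {"trip_id": "t1", "route_id": "r1", "service_id": "s1"},
    {"trip_id": "t2", "route_id": "r1", "service_id": "s1"},
    {"trip_id": "t3", "route_id": "r2", "service_id": "s1"},
    {"trip_id": "t4", "route_id": "r2", "service_id": "s1"},
    {"trip_id": "t5", "route_id": "r2", "service_id": "s1"},
]
direction_ids, problem_groups = naive_direction_ids(trips)
print(direction_ids)   # {'t1': 0, 't2': 1}
print(problem_groups)  # {('r2', 's1'): ['t3', 't4', 't5']}
```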

I tried to plot these trips to see how they are different. Here is a facet plot (by trip_id):

[Image: facet plot of the trips, one panel per trip_id]

It looks like two of them are the same (they even have the same stop sequence rather than running in opposite directions, which seems wrong to me). The other three are all different.

Based on these results, I think I should treat each trip separately if I calculate frequencies from stop_times, and ignore the route-level logic used in get_route_frequency(). This is more in line with the GTFS frequencies.txt file, which has the columns trip_id | start_time | end_time | headway_secs (plus an optional exact_times column).
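One hedged way to act on this without relying on route_id is to key trips on their exact stop sequence, so that runs serving the same stops in the same order end up together and runs in the opposite direction do not. This Python sketch is an assumption, not what the thread settled on, and the function name and example rows are hypothetical. Each resulting group could then be summarised as one frequencies.txt row (trip_id, start_time, end_time, headway_secs), using one of its trips as the template.

```python
from collections import defaultdict


def trips_by_stop_pattern(stop_times):
    """Group trip_ids by their exact ordered sequence of stop_ids.

    stop_times: dicts with trip_id, stop_id and stop_sequence keys
    (stop_sequence may be a string, as read from stop_times.txt).
    """
    stops_per_trip = defaultdict(list)
    for row in stop_times:
        stops_per_trip[row["trip_id"]].append((int(row["stop_sequence"]), row["stop_id"]))

    patterns = defaultdict(list)
    for trip_id, stops in stops_per_trip.items():
        # Sort by stop_sequence so row order in the file does not matter
        pattern = tuple(stop_id for _, stop_id in sorted(stops))
        patterns[pattern].append(trip_id)
    return dict(patterns)


stop_times = [
    {"trip_id": "t1", "stop_id": "A", "stop_sequence": "1"},
    {"trip_id": "t1", "stop_id": "B", "stop_sequence": "2"},
    {"trip_id": "t2", "stop_id": "B", "stop_sequence": "2"},
    {"trip_id": "t2", "stop_id": "A", "stop_sequence": "1"},
    {"trip_id": "t3", "stop_id": "B", "stop_sequence": "1"},
    {"trip_id": "t3", "stop_id": "A", "stop_sequence": "2"},
]
print(trips_by_stop_pattern(stop_times))
# {('A', 'B'): ['t1', 't2'], ('B', 'A'): ['t3']}
```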

Robinlovelace commented 10 months ago

GTFS datasets and official timetables are notoriously inaccurate in Leeds, and presumably beyond. No comments on this other than: great that you're considering this, and that there are already some implementations. One question: is system reliability/uncertainty measured? Not my area, but I'm fascinated to learn about methods and, eventually, results.

Hussein-Mahfouz commented 10 months ago

Reliability

I read a bit of the literature on system reliability, and listened to a nice episode about it with Niels van Oort. I've seen some analysis of actual vs. scheduled services; you could probably use the live bus location API to compare scheduled services with what actually ran. Reliability is a whole research area in itself, and I would prefer not to get into it for the first research question, as I am not up to date on the literature. Let me know if you have any thoughts on it, either for this research question or for later in the research.

Uncertainty

One thing that I think would be useful is the percentiles argument in r5r::travel_time_matrix(). From the documentation:

In this case, there isn’t a single estimate of travel time / accessibility, but a distribution of several estimates that reflect the travel time / accessibility uncertainties in the specified time window. To get our heads around so many estimates, we can use the percentiles parameter to specify the percentiles of the distribution we are interested in. For example, if we select the 25th travel time percentile and the results show that the travel time estimate between A and B is 15 minutes, this means that 25% of all trips taken between these points within the specified time window are shorter than 15 minutes.

It's a useful parameter for dealing with the uncertainty of matching a very specific departure time to fixed scheduled services. A high percentile (say, the 75th) could be used.
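To make the percentile interpretation concrete, here is a minimal Python sketch using the nearest-rank definition on a list of travel times, such as one per departure minute in the time window. The travel times are made-up illustration values, and R5 may compute percentiles slightly differently; note also that nearest-rank returns a value that at least p% of trips are shorter than or equal to.

```python
import math


def nearest_rank_percentile(values, p):
    """Nearest-rank percentile: the smallest observed value such that at
    least p% of the observations are less than or equal to it."""
    if not values or not 0 < p <= 100:
        raise ValueError("need at least one value and 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))  # 1-based rank
    return ordered[rank - 1]


# Illustrative travel times (minutes) for one OD pair across departure minutes
travel_times = [30, 15, 35, 22, 18, 32, 25, 20]
for p in (25, 50, 75):
    print(p, nearest_rank_percentile(travel_times, p))
# 25 18
# 50 22
# 75 30
```

A higher percentile such as the 75th gives a more conservative estimate: at most a quarter of the departures in the window take longer.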

Robinlovelace commented 10 months ago

The percentiles argument sounds like a reasonable and simple approach. 👍 to not getting too sidetracked as well.

Hussein-Mahfouz commented 8 months ago

stop_times_to_frequencies() is a difficult function to implement.

Hussein-Mahfouz commented 8 months ago

One important thing to note is that the time_window parameter in r5r DOES work with feeds that don't have a frequencies.txt file. Here are the results of using the expanded_travel_time_matrix() function with a 30-minute time_window:

[Image: expanded_travel_time_matrix() results with a 30-minute time_window]

For the same departure time, the results are the same for every draw_number. However, with time_window = 30 we get 30 different departure times for each OD pair, and each one has its own travel_time.
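Why a stop_times-only feed still produces a travel time per departure minute can be illustrated with a toy model: for each departure minute in the window, the total time is the wait for the next scheduled bus plus the in-vehicle time. This Python sketch ignores walking and transfers, and the schedule (buses at 08:10 and 08:25, a 12-minute ride) is hypothetical.

```python
def travel_time_by_departure(window_start, time_window, bus_departures, ride_min):
    """Toy model of a schedule-based (stop_times only) feed.

    For each whole-minute departure in [window_start, window_start + time_window),
    total time = wait for the next scheduled bus + in-vehicle time.
    All times are minutes after midnight. Departures with no later bus
    are left out.
    """
    result = {}
    for minute in range(window_start, window_start + time_window):
        later_buses = [d for d in bus_departures if d >= minute]
        if later_buses:
            result[minute] = (min(later_buses) - minute) + ride_min
    return result


# Hypothetical schedule: buses at 08:10 (490) and 08:25 (505), 12-minute ride
times = travel_time_by_departure(480, 30, [490, 505], 12)
print(times[480], times[490], times[491])  # 22 12 26
```

Departures after 08:25 (minutes 506 to 509) have no bus left in this toy schedule, so they are omitted.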

The percentiles argument also works, as shown here:

[Image: results using the percentiles argument]

The documentation says frequencies.txt is needed because it is what lets R5 simulate changes in the start time, which would make different draws for the same OD pair have different travel times. For our purposes this is not necessary.

What this means is that a stop_times_to_frequencies() function is not necessary for our purposes.
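A toy illustration of that difference, in Python: with a schedule-based feed, every draw for a given departure time sees the same timetable and returns the same travel time, while a frequency-based feed lets each draw place the buses at a random offset within the headway. The functions, the 15-minute headway and the 12-minute ride are hypothetical, and this is a simplified picture of the random draws rather than R5's actual implementation.

```python
import random


def draws_schedule_based(departure, bus_departures, ride_min, n_draws):
    """Fixed timetable: every draw sees the same next bus, so all
    draws for one departure time return the same travel time."""
    wait = min(d for d in bus_departures if d >= departure) - departure
    return [wait + ride_min] * n_draws


def draws_frequency_based(departure, headway, ride_min, n_draws, seed=1):
    """Headway-only service: each draw places the buses at a random offset
    within one headway, so the wait differs between draws."""
    rng = random.Random(seed)
    times = []
    for _ in range(n_draws):
        offset = rng.randrange(headway)        # buses run at offset + k * headway
        wait = (offset - departure) % headway  # minutes until the next bus
        times.append(wait + ride_min)
    return times


print(draws_schedule_based(480, [490, 505], 12, 5))  # [22, 22, 22, 22, 22]
print(draws_frequency_based(480, 15, 12, 5))
```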