Closed: Hussein-Mahfouz closed this issue 8 months ago.
I am trying to create a frequencies.txt file so that routing in r5r can use the time_window parameter.
I tried to use the get_route_frequency() function in tidytransit, but it depends on having a direction_id column in the trips.txt file. This is an optional column in GTFS feeds, and it is not present in BODS data.
I tried to create the column by grouping trips by route_id and service_id, expecting two trips in each group that I could label 0 and 1. It turns out that some routes have more than two trips.
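To make the attempted grouping concrete, here is a minimal Python sketch of it. Plain dicts stand in for trips.txt rows, and assign_direction_ids is a hypothetical helper, not a tidytransit function. It labels trips 0/1 within each route_id + service_id group and flags the groups where a 0/1 label is ambiguous because they contain more than two trips.

```python
from collections import defaultdict


def assign_direction_ids(trips):
    """Hypothetical helper: group trips.txt rows by (route_id, service_id).

    Returns (labelled, problems). `labelled` maps trip_id -> 0 or 1 for groups
    with at most two trips; `problems` maps each group key with more than two
    trips to its sorted trip_ids, because a 0/1 direction label is ambiguous.
    """
    groups = defaultdict(list)
    for trip in trips:
        groups[(trip["route_id"], trip["service_id"])].append(trip["trip_id"])

    labelled, problems = {}, {}
    for key, trip_ids in groups.items():
        if len(trip_ids) > 2:
            problems[key] = sorted(trip_ids)
            continue
        # Arbitrary labelling: the first trip (sorted by id) gets 0, the second 1.
        for direction, trip_id in enumerate(sorted(trip_ids)):
            labelled[trip_id] = direction
    return labelled, problems
```

With two trips on route r1 and three on route r2 (same service_id), r1's trips get 0 and 1, while the r2 group is returned in `problems`, which is the situation described above.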
I tried to plot these trips to see how they are different. Here is a facet plot (by trip_id):
It looks like two of them are the same (they even have the same stop sequence rather than opposite ones, which seems wrong to me). The other three are all different.
Based on these results, I think I should treat each trip separately if I calculate frequencies from stop_times (and ignore the route-level logic used in get_route_frequency()). This is more in line with the GTFS frequencies.txt file, which has the following columns: trip_id, start_time, end_time and headway_secs.
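As a rough illustration of what one frequencies.txt row per trip pattern could look like, here is a minimal Python sketch. It assumes we already have the first-stop departure times of all trips sharing one stop pattern; frequency_row and the time helpers are hypothetical names. It uses the mean gap between departures as headway_secs, which hides any irregularity in the timetable.

```python
def gtfs_time_to_secs(value):
    """Parse a GTFS 'HH:MM:SS' time; hours may exceed 23 for after-midnight trips."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def secs_to_gtfs_time(total):
    """Format seconds after midnight back into GTFS 'HH:MM:SS'."""
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


def frequency_row(template_trip_id, departures):
    """Hypothetical helper: summarise first-stop departures of trips that share
    one stop pattern as a single frequencies.txt row with an average headway."""
    times = sorted(gtfs_time_to_secs(d) for d in departures)
    if len(times) < 2:
        raise ValueError("need at least two departures to estimate a headway")
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    headway = round(sum(gaps) / len(gaps))
    return {
        "trip_id": template_trip_id,
        "start_time": secs_to_gtfs_time(times[0]),
        # For exact_times=1, GTFS requires end_time to be later than the last
        # trip start and earlier than last start + headway_secs; use the midpoint.
        "end_time": secs_to_gtfs_time(times[-1] + headway // 2),
        "headway_secs": headway,
    }
```

For departures at 07:00, 07:15, 07:30 and 07:45, this gives start_time 07:00:00, end_time 07:52:30 and headway_secs 900. A real converter would probably need to split the day into periods wherever the headway changes.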
GTFS datasets and official timetables are notoriously inaccurate in Leeds, and presumably beyond. No comments on this other than: great that you're considering this, and that there are already some implementations. One question: is system reliability/uncertainty measured? Not my area, but I'd be fascinated to learn about methods and, eventually, results.
I've read a bit of the literature on system reliability, and listened to a nice episode about it with Niels van Oort. I've seen some analysis of actual vs scheduled services: you could probably use the live bus location API to compare scheduled services to what actually ran. Reliability is a whole area of research, and I would prefer not to get into it for the first research question, as I am not up to date on the literature. Let me know if you have any thoughts about it, either for this research question or for later on in the research.
The one thing I think would be useful is the percentiles argument in r5r::travel_time_matrix(). From the documentation:
In this case, there isn’t a single estimate of travel time / accessibility, but a distribution of several estimates that reflect the travel time / accessibility uncertainties in the specified time window. To get our heads around so many estimates, we can use the percentiles parameter to specify the percentiles of the distribution we are interested in. For example, if we select the 25th travel time percentile and the results show that the travel time estimate between A and B is 15 minutes, this means that 25% of all trips taken between these points within the specified time window are shorter than 15 minutes.
It's a useful parameter that deals with the uncertainty of matching a very specific departure time with fixed scheduled services. A high percentile (say 75%) could be used.
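To make the percentile reading concrete, here is a small Python sketch using the simple nearest-rank definition. This is an assumption for illustration: r5r/r5 may use a different interpolation rule, and nearest_rank_percentile is a hypothetical helper.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: the smallest observed value such that at least
    pct% of the observations are less than or equal to it."""
    if not values or not 0 < pct <= 100:
        raise ValueError("need at least one value and 0 < pct <= 100")
    ordered = sorted(values)
    rank = math.ceil(pct / 100 * len(ordered))  # 1-based rank into the sorted list
    return ordered[rank - 1]
```

With eight travel times of 12, 15, 18, 20, 22, 25, 30 and 35 minutes across a time window, the 25th percentile is 15 minutes (two of the eight trips, i.e. 25%, take 15 minutes or less) and the 75th percentile is 25 minutes.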
Using percentiles sounds like a reasonable and simple approach. :+1: to not getting too sidetracked, too.
stop_times_to_frequencies() is a difficult function to implement. The output needs the frequencies.txt columns trip_id, start_time, end_time and headway_secs, and the hard part is calculating headway_secs.

The problem is the service_id field. Different service_ids reflect the same trip on different days, so a trip will be repeated multiple times in stop_times.txt. This means our calculated headway_secs is inaccurate: the duplicated trips make services look more frequent than they are. How do we calculate headway while accounting for different services?
One option is to group trips by service_id + stop_id_order (the column we created to identify unique trips).

One important thing to note is that the time_window parameter in r5r DOES work with feeds that don't have a frequencies.txt file. Here are the results of using the expanded_travel_time_matrix() function with a 30-minute time_window:
For the same departure time, the results are the same for every draw_number. However, with time_window = 30 we have 30 different departure times for each OD pair, and each one has its own travel_time.
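Why each departure minute gets its own travel time, even without frequencies.txt, can be shown with a deliberately simplified Python sketch: one stop, one route, no transfers, and a fixed timetable (this is not how r5 routes internally, and travel_times_over_window is a hypothetical helper). Leaving one minute later changes the waiting time, so the travel time changes too.

```python
import bisect


def travel_times_over_window(bus_departures, window_start, window_minutes,
                             ride_minutes, walk_minutes=0):
    """For each departure minute in [window_start, window_start + window_minutes),
    return walk + wait for the next scheduled bus + ride (None if no bus is left).
    All times are minutes after midnight."""
    departures = sorted(bus_departures)
    results = []
    for depart in range(window_start, window_start + window_minutes):
        at_stop = depart + walk_minutes
        i = bisect.bisect_left(departures, at_stop)  # first bus at or after arrival
        if i == len(departures):
            results.append(None)
            continue
        wait = departures[i] - at_stop
        results.append(walk_minutes + wait + ride_minutes)
    return results
```

With buses at 08:05, 08:20 and 08:35 (minutes 485, 500 and 515), a 10-minute ride and a 30-minute window starting at 08:00, travel times range from 10 minutes (leaving just as a bus departs) to 24 minutes (just missing one). This spread is what the percentiles argument then summarises.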
The percentiles argument also works, as shown here:
The reason the documentation says frequencies.txt is needed is to simulate changes in trip start times, which leads to different draws for the same OD pair having different travel times. For our purposes, this is not necessary.
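The idea behind those draws can be sketched in Python. With a frequency-based trip only the headway is known, so each Monte Carlo draw picks a random phase for the buses, and the same departure time gets a different wait in each draw (in r5r the number of draws is set by draws_per_minute). This is an illustration of the concept under simplifying assumptions (one route, no transfers), not r5's implementation, and monte_carlo_draws is a hypothetical helper.

```python
import random


def monte_carlo_draws(depart_minute, headway_minutes, ride_minutes, draws, seed=42):
    """Travel times for ONE departure time under a frequency-based service.

    Each draw assumes buses leave at offset + k * headway for a random offset,
    then computes the wait until the next bus plus the ride time."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    results = []
    for _ in range(draws):
        offset = rng.uniform(0, headway_minutes)
        wait = (offset - depart_minute) % headway_minutes
        results.append(wait + ride_minutes)
    return results
```

With a 10-minute headway and a 12-minute ride, every draw falls between 12 and 22 minutes. With a stop_times-only feed there is no such randomness, which is why every draw_number gives the same result for a given departure time.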
This means that a stop_times_to_frequencies() function is not necessary for our purposes.
How to address travel time uncertainty when calculating travel time matrices? For background, see:
The time_window parameter (in combination with the percentiles parameter) is ideal, but it can only be used with frequency-based GTFS feeds. From the vignette:

Solution using time_window parameter in r5r

One solution is to create a function to convert stop_times to frequencies, and use it to edit the GTFS feeds so that they become frequency-based feeds.
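A starting point for such a function, sketched in Python under stated assumptions: each trip record already carries its service_id, a stop_id_order pattern key (the column used in this thread to identify unique trips) and its first departure time; headways_by_service_and_pattern is a hypothetical helper. Grouping by service_id + stop_id_order before computing gaps stops copies of the same run on different days from being pooled together.

```python
from collections import defaultdict


def to_secs(value):
    """Parse a GTFS 'HH:MM:SS' time (hours may exceed 23)."""
    hours, minutes, seconds = (int(part) for part in value.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def headways_by_service_and_pattern(trips):
    """trips: dicts with trip_id, service_id, stop_id_order and first_departure.

    Returns one row per (service_id, stop_id_order) group that has at least two
    trips, with the earliest trip as template and the mean gap as headway_secs."""
    groups = defaultdict(list)
    for trip in trips:
        key = (trip["service_id"], trip["stop_id_order"])
        groups[key].append((to_secs(trip["first_departure"]), trip["trip_id"]))

    rows = []
    for (service_id, pattern), members in sorted(groups.items()):
        members.sort()  # by departure time, then trip_id
        if len(members) < 2:
            continue  # a lone trip has no headway; keep it in stop_times.txt
        times = [t for t, _ in members]
        gaps = [later - earlier for earlier, later in zip(times, times[1:])]
        rows.append({
            "service_id": service_id,
            "stop_id_order": pattern,
            "template_trip_id": members[0][1],
            "headway_secs": round(sum(gaps) / len(gaps)),
        })
    return rows
```

If the same A-B-C pattern runs at 07:00, 07:20 and 07:40 under both a weekday and a saturday service_id, each group gets headway_secs = 1200. Pooling all six trips instead would give gaps of 0, 1200, 0, 1200 and 0 seconds, i.e. an average of 480 s, which is the duplication problem described earlier in the thread.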
See my comment https://github.com/ipeaGIT/gtfstools/issues/69#issuecomment-1693191738 for getting started on the function, and https://github.com/ipeaGIT/r5r/issues/282#issuecomment-1693179215 to understand how r5 handles the time_window argument when you are using a GTFS feed without a frequencies.txt file.

Hacky manual solution
We can pass different departure times to the travel_time_matrix function (e.g. for 8:00am, use 7:55, 8:00 and 8:05). This is a hacky way of recreating the time_window functionality, and it will definitely be a lot slower.
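The bookkeeping for this manual approach can be sketched in Python. The calls to travel_time_matrix themselves happen in R and are not shown. departure_times and combine_runs are hypothetical helpers, and each run is assumed to have been reduced to a dict mapping (origin, destination) to travel time in minutes, or None when unreachable.

```python
from datetime import datetime, timedelta


def departure_times(centre, half_width_minutes, step_minutes):
    """Departure datetimes from centre - half_width to centre + half_width, inclusive."""
    start = centre - timedelta(minutes=half_width_minutes)
    count = 2 * half_width_minutes // step_minutes + 1
    return [start + timedelta(minutes=i * step_minutes) for i in range(count)]


def combine_runs(runs):
    """Merge one result dict per departure time into, per OD pair, the sorted
    list of reachable travel times across all runs (None values are skipped)."""
    combined = {}
    for run in runs:
        for od, minutes in run.items():
            if minutes is not None:
                combined.setdefault(od, []).append(minutes)
    return {od: sorted(times) for od, times in combined.items()}


# Example: three departures around 8:00am, one travel time matrix per departure.
times = departure_times(datetime(2023, 9, 4, 8, 0), half_width_minutes=5, step_minutes=5)
```

Here `times` holds 07:55, 08:00 and 08:05 (the date is an arbitrary example). Each OD pair's sorted list can then be summarised with a median or a high percentile, mimicking time_window + percentiles at the cost of one routing run per departure time.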