Closed AlexandraKapp closed 3 years ago
My hope is now that the above commit has fixed, but I'll re-open until confirmed by @AlexandraKapp. Of your examples above, the sole remaining mismatch is this:
library (gtfsrouter)
packageVersion ("gtfsrouter")
#> [1] '0.0.4.149'
gtfs <- extract_gtfs("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs$calendar [1, -(1:8)]
#> start_date end_date
#> 1: 20201023 20201212
gtfs <- gtfs_timetable(gtfs, day = "tuesday")
from <- "Berlin Hauptbahnhof"
start_time <- 8 * 3600
iso <- gtfs_traveltimes (gtfs, from = from, start_time = start_time)
to <- "Berlin, Hegemeisterweg"
iso [grep (to, iso$stop_name), ]
#> duration ntransfers stop_id stop_name stop_lon stop_lat
#> 31783 00:35:48 2 070101050371 Berlin, Hegemeisterweg 13.52022 52.47401
#> 31784 00:33:48 1 070101050372 Berlin, Hegemeisterweg 13.52022 52.47401
gtfs_route(gtfs, "Berlin Hauptbahnhof", to, start_time = 8*3600)
#> route_name trip_name stop_name
#> 1 S5 S Strausberg Nord S+U Berlin Hauptbahnhof
#> 2 S5 S Strausberg Nord S+U Friedrichstr. Bhf (Berlin)
#> 3 S5 S Strausberg Nord S Hackescher Markt (Berlin)
#> 4 S5 S Strausberg Nord S+U Alexanderplatz Bhf (Berlin)
#> 5 S5 S Strausberg Nord S+U Jannowitzbrücke (Berlin)
#> 6 S5 S Strausberg Nord S Ostbahnhof (Berlin)
#> 7 S3 S Friedrichshagen S+U Warschauer Str. (Berlin)
#> 8 S3 S Friedrichshagen S Ostkreuz Bhf (Berlin)
#> 9 S3 S Friedrichshagen S Karlshorst (Berlin)
#> 10 M17 Berlin, Hasselwerderstr. S Karlshorst (Berlin)
#> 11 M17 Berlin, Hasselwerderstr. Berlin, Hegemeisterweg
#> arrival_time departure_time
#> 1 08:00:30 08:01:12
#> 2 08:03:06 08:03:54
#> 3 08:05:24 08:05:54
#> 4 08:07:06 08:07:54
#> 5 08:09:24 08:09:54
#> 6 08:11:36 08:12:24
#> 7 08:16:54 08:17:24
#> 8 08:19:06 08:19:42
#> 9 08:24:36 08:25:12
#> 10 08:31:00 08:31:00
#> 11 08:35:00 08:35:00
Created on 2020-12-14 by the reprex package (v0.3.0)
And that looks more like the standard gtfs_route()
function not following the minimal-transfer route, while gtfs_isochrone()
does. The two algorithms now work completely differently, so that's perhaps not surprising, and I'd be happy to interpret that as a bug in gtfs_route()
, rather than gtfs_isochrone()
. @AlexandraKapp Would you please run this through your benchmark tests and see what you can find? Thanks!
Clearly have to revise the main gtfs_route()
function anyway, because ...
library (gtfsrouter)
packageVersion ("gtfsrouter")
#> [1] '0.0.4.149'
gtfs <- extract_gtfs("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs$calendar [1, -(1:8)]
#> start_date end_date
#> 1: 20201023 20201212
gtfs <- gtfs_timetable(gtfs, day = "tuesday")
from <- "Berlin Hauptbahnhof"
to <- "Berlin, Hegemeisterweg"
start_time <- 8 * 3600
traveltimes <- function (gtfs, from, start_time)
gtfs_traveltimes (gtfs, from = from, start_time = start_time)
route <- function (gtfs, from, to, start_time)
gtfs_route(gtfs, from = from, to = to, start_time = start_time)
rbenchmark::benchmark (
traveltimes (gtfs, from, start_time),
route (gtfs, from, to, start_time),
replications = 10) [, 1:5]
#> test replications elapsed relative user.self
#> 2 route(gtfs, from, to, start_time) 10 9.716 1.802 12.218
#> 1 traveltimes(gtfs, from, start_time) 10 5.392 1.000 5.767
Created on 2020-12-14 by the reprex package (v0.3.0)
New traveltimes algorithm traces routes to all points within entire system nearly twice as fast as gtfs_routes()
for single connection to one point. Once we're happy with gtfs_traveltimes()
, i'll start re-integrating this algorithm back into other package functions.
Nice, this is looking great!
The option without minimise_transfers
actually seems to get closer to the benchmark of OTP
difference to OTP transfers with minimise_transfers
difference to OTP transfers WITHOUT minimise_transfers
This looks in general really good now - Looking into detail of the deviations there are a few reasons for them.
for some cases I actually could not reproduce the benchmark value and the gtfsrouter seemed to be right 🤷♀️
for the cases I checked with too many transfers, this is due to missing "tranfers" which were also further than 200 meters away (and therefore not created by the default setting of gtfs_transfer_table
)
A little bug still seems to exists though:
Some of the cases with too few transfers are actually an impossible amount of transfers, because at those stops does not exist a line that runs from Hbf directly. This is even more true if the paremeter minimise_transfers
is set to True. Here are some examples:
library(gtfsrouter)
packageVersion("gtfsrouter")
#> [1] '0.0.4.149'
gtfs <- extract_gtfs(file.path("vbb_202010.zip"))
#> > Unzipping GTFS archive
#> v Unzipped GTFS archive
#> > Extracting GTFS feedv Extracted GTFS feed
#> > Converting stop times to secondsv Converted stop times to seconds
#> > Converting transfer times to secondsv Converted transfer times to seconds
ttable <- gtfs_timetable(gtfs, day = "tuesday")
iso <- gtfs_traveltimes(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600)
# too few transfers
# is 0 - should be at least 2 tranfers
iso[iso$stop_name == "Berlin, Elsenstr.", ]
#> duration ntransfers stop_id stop_name stop_lon stop_lat
#> 27916 00:40:30 0 070101002547 Berlin, Elsenstr. 13.4489 52.48503
#> 28364 00:42:30 1 070101003074 Berlin, Elsenstr. 13.4489 52.48503
# is 0 - should be at least 1 tranfer
iso[iso$stop_name == "Berlin, Ohlauer Str.", ]
#> duration ntransfers stop_id stop_name stop_lon stop_lat
#> 28483 00:30:30 1 070101003215 Berlin, Ohlauer Str. 13.43048 52.49557
#> 28487 00:30:30 0 070101003219 Berlin, Ohlauer Str. 13.43048 52.49557
####### impossible transfers with minimise transfers
iso <- gtfs_traveltimes(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600, minimise_transfers = T)
## is 0 transfers - needs at least one
iso[iso$stop_name == "Berlin, Stedingerweg", ]
#> duration ntransfers stop_id stop_name stop_lon stop_lat
#> 29836 00:33:00 0 070101005414 Berlin, Stedingerweg 13.45256 52.53924
#> 29869 00:51:00 0 070101005447 Berlin, Stedingerweg 13.45256 52.53924
## is 0 transfers - needs at least two
iso[iso$stop_name == "Berlin, Gersdorfstr./Kaiserstr.", ]
#> duration ntransfers stop_id stop_name stop_lon
#> 26400 00:53:00 0 070101000831 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> 26405 00:42:48 2 070101000836 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> 29146 00:52:00 0 070101004034 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> 29147 00:49:48 2 070101004035 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> stop_lat
#> 26400 52.44504
#> 26405 52.44504
#> 29146 52.44504
#> 29147 52.44504
## is 0 transfers - needs at least two
iso[iso$stop_name == "Berlin, Haynauer Str.", ]
#> duration ntransfers stop_id stop_name stop_lon stop_lat
#> 26669 01:00:00 0 070101001135 Berlin, Haynauer Str. 13.36558 52.43517
#> 28324 01:01:48 1 070101003022 Berlin, Haynauer Str. 13.36558 52.43517
Created on 2020-12-14 by the reprex package (v0.3.0)
Travel time also looks good now -it shifted from being slower on average to being faster:
travel time differences of travel time rasters created with gtfsrouter and OTP
Though with the OTP benchmark here, I already once realized, that it's sometimes unrealistically slow. So I also compared it to a Google Maps benchmark, yielding much better results:
travel time differences of travel time rasters created with gtfsrouter and Google Maps
Here again a few reasons for the deviatons:
max walking distance
parameter - so if there are routes that are faster when walking in the beginning / middle / end this is faster than the gtfsrouter
library(gtfsrouter)
gtfs <- extract_gtfs(file.path("vbb_202010.zip"))
ttable <- gtfs_timetable(gtfs, day = "tuesday")
iso <- gtfs_traveltimes(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600)
iso[iso$stop_name == "S+U Gesundbrunnen Bhf (Berlin)", ]
iso[iso$stop_name == "S Prenzlauer Allee (Berlin)", ]
iso[iso$stop_name == "Zepernick, Inntaler Str.", ]
iso[iso$stop_name == "U Alt-Mariendorf (Berlin)", ]
iso[iso$stop_name == "Berlin, Hundsteinweg", ]
<sup>Created on 2020-12-14 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
I'd like to join in with my two cents, hope I don't clutter the discussion. I ran some comparison to tidytransit's travel_times
and gtfs_traveltimes
produces generally longer travel times (code gist):
I tried to look into it using my test feed that I like to throw around here. I fixed it after our last discussion but there's a new error. Maybe you're aware of this, otherwise I'll open a new issue.
gtfs_file = "routing.zip"
download.file("https://github.com/polettif/gtfs-test-feeds/raw/master/zip/routing.zip", gtfs_file)
library (gtfsrouter)
packageVersion("gtfsrouter")
#> [1] '0.0.4.150'
gtfs <- gtfs_timetable(extract_gtfs(gtfs_file, T), date = 20181001)
gtfs_traveltimes(gtfs, "One", 7*3600)
#> Error in rcpp_traveltimes(gtfs_cp$timetable, gtfs_cp$transfers, nrow(gtfs_cp$stop_ids), : unordered_map::at: key not found
Thanks @polettif for both aspects there - I'll look further into comparison in travel times with tidytransit
. As for the 2nd issue withthe error, could you please open another issue on that one? I can definitely reproduce, so something ain't right there.
@polettif Your histogram of travel time differences seems to reflect differences in transfers:
So gtfsrouter
connects trips with fewer transfers than tidytransit
, even though these sometimes take longer. The raptor
algorithm always selects fastest trips, even where these require greater numbers of transfers, whereas the current gtfsrouter
implementation is ... complicated. One key difference I see is that the raptor
algorithm considers that trips can be optimised by aggregating times and transfers at each stop, and replacing optimal values at that stop whenever trips arrive which are faster or have fewer transfers. Once that has been decided, those values are propagated forward to all stations further along the same trip. So each station ends up with an optimal route as decided by optimality at each prior station.
It is nevertheless possible to have optimal routes which are formed by taking sub-optimal journeys to intermediate stations. Think two equivalent A(t=0) -> B(t=10) and A(t=5) -> B(t=16). The second trip to B will not be recorded by raptor
, because it's not the fastest way to get to B. But then if there is a separate service B(t=20) -> C(t=30), then the optimal trip from A -> C will be A(t=5) -> B(t=16) -> B(t=20) -> C(t=30), requiring the sub-optimal trip from A to B which raptor
can not record. That is why the current gtfsrouter
can record trips with fewer transfers, because each value is optimised over all preceding routes, both optimal and sub-optimal. The raptor
paper glibly claims "Pareto optimal" in the abstract, and refers to it the Intro, yet in no ways shows Pareto optimality, so editing ought at least have forced them to drop that term. I think what they likely could show is that the times to any station C reflect the ability to get there given that the optimal trip has been taken to all preceding stations (B). So the raptor
optimal time to C has to be considered given that the journey to B was also raptor
optimal. Yet this ain't necessarily the best way to get to C... And that's why the algorithm here has been completely redesigned, to attempt to derive optimal trips to all stations, including trips requiring intermediate journeys which are sub-optimal.
The fact that the gtfsrouter
algorithm can't produce travel times as fast as raptor
values is nevertheless problematic, and shows that this issue is not over yet... Thanks for the stimulating input.
Ah I see, interesting. I haven't really worked with optimization for least amount of transfers, so I'll look into that. I haven't done this for raptor because with minimising transfers vs. minimising travel times you're getting into more of a "personal choice" optimization so I decided to stick to times.
One key difference I see is that the raptor algorithm considers that trips can be optimised by aggregating times and transfers at each stop, and replacing optimal values at that stop whenever trips arrive which are faster or have fewer transfers. Once that has been decided, those values are propagated forward to all stations further along the same trip. So each station ends up with an optimal route as decided by optimality at each prior station.
Yes but that's only the base algorithm and not exactly how raptor is implemented in tidytransit. raptor
uses something like "Range Queries" described in chapter 4.2 of the paper. I probably should mention this explicitly in the documentation.
Basically there's a "raptor calculation" for each departure from an initial stop, so t=0 and t=5 for A. raptor
returns all journeys departing within time_range
from a given stop to all other stops with each journey being the shortest possible (i.e. having the earliest arrival time). By default, raptor
returns all such journeys for further analysis but it's also possible to filter for shortest travel time, earliest or latest arrival with the keep
parameter. This step is probably where optimization for least transfers could be implemented.
Your example works as expected in raptor
:
stop_times = as.data.frame(tibble::tribble(
~trip_id, ~stop_id, ~arrival_time, ~departure_time, ~stop_sequence,
"tripAB1", "A", "00:00:00", "00:00:00", 1,
"tripAB1", "B", "00:00:10", "00:00:10", 2,
"tripAB2", "A", "00:00:05", "00:00:05", 1,
"tripAB2", "B", "00:00:16", "00:00:16", 2,
"tripBC1", "B", "00:00:20", "00:00:20", 1,
"tripBC1", "C", "00:00:30", "00:00:30", 2
))
tidytransit::raptor(stop_times,
transfers = data.frame(),
stop_ids = "A")
#> from_stop_id to_stop_id travel_time journey_departure_time journey_arrival_time transfers
#> 1: A A 0 0 0 0
#> 2: A B 10 0 10 0
#> 3: A B 11 5 16 0
#> 4: A C 25 5 30 1
Created on 2020-12-16 by the reprex package (v0.3.0)
The journey A(t=0)->C(t=30) is not returned because its arrival time is the same as A(t=5)->C(t=30) while departing later.
@AlexandraKapp check it out now, first repeating the full code from above to ensure complete reproducibility:
library (lubridate)
library (gtfsrouter)
packageVersion ("gtfsrouter")
#> [1] '0.0.4.157'
gtfs <- extract_gtfs("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
transfers <- gtfsrouter::gtfs_transfer_table (gtfs, network_times = FALSE)
transfers <- gtfs$transfers %>%
select(.data$from_stop_id, .data$to_stop_id, .data$transfer_type, .data$min_transfer_time) %>%
rbind(transfers) %>%
group_by(.data$from_stop_id, .data$to_stop_id) %>%
summarise(across(everything(), first)) %>% # take min_transfer_time from gtfs$transfers if present
data.table::data.table()
gtfs$transfers <- transfers
gtfs <- gtfs_timetable(gtfs, day = "tuesday")
# Then get the tidytransit raptor traveltimes as additional benchmark:
library (dplyr)
library (tidytransit)
gtfs_1 = read_gtfs("vbb.zip")
from <- "Berlin Hauptbahnhof"
from_name = gtfs_1$stops %>%
filter(grepl(from, stop_name)) %>%
pull(stop_name) %>% unique()
start_time <- 8 * 3600
timetable_1 = filter_stop_times(gtfs_1, "2020-12-08", start_time, 24*3600)
tts_1 <- travel_times(timetable_1, from_name)
from <- "Berlin Hauptbahnhof"
start_time <- 8 * 3600
iso <- gtfs_traveltimes (gtfs, from, start_time)
route_stats <- function (gtfs, from, to, start_time) {
r <- gtfs_route (gtfs, from, to, start_time)
tstart <- hms (r$departure_time [1])
tend <- hms (tail (r$arrival_time, 1))
dur_sec <- as.duration (tend - tstart) / dseconds (1)
dur <- seconds_to_period (dur_sec)
ntr <- length (unique (r$trip_name)) - 1
dur_fmt <- sprintf ('%02d:%02d:%02d', dur@hour, minute (dur), second (dur))
message (ntr, " transfers; trip duration = ", dur_fmt)
return (c (dur_sec, ntr))
}
to <- c ("Berlin, Elsenstr.",
"Berlin, Ohlauer Str.",
"Berlin, Stedingerweg",
"Berlin, Gersdorfstr./Kaiserstr.",
"Berlin, Haynauer Str.",
"Gesundbrunnen Bhf",
"S Prenzlauer Allee",
"Zepernick, Inntaler Str.",
"U Alt-Mariendorf",
"Berlin, Hundsteinweg")
compare_iso2route <- function (gtfs, iso, tts_1, from, to, start_time) {
message ("----TO: ", to, "----")
this_iso <- iso [grep (to, iso$stop_name), ]
print (this_iso)
this_route <- route_stats (gtfs, from, to, start_time)
# extract trip fastest trip with equal minimal transfers:
this_iso <- this_iso [which.min (this_iso$ntransfers), ]
this_iso <- this_iso [which.min (this_iso$duration), ]
iso_time <- as.integer (seconds (this_iso$duration))
if (iso_time <= this_route [1] & this_iso$ntransfers <= this_route [2])
message (" ---> Okay, traveltimes at least as good as gtfs_route")
else
message (" ---> Problem: gtfs_route is better than traveltimes")
rptr <- tts_1 [grep (to, tts_1$to_stop_name), ]
rptr <- rptr [which.min (rptr$transfers), ]
rptr <- rptr [which.min (rptr$travel_time), ]
message ("Raptor: ", hms::hms (rptr$travel_time),
" with ", rptr$transfers, " transfers")
if ((iso_time <= rptr$travel_time & this_iso$ntransfers <= rptr$transfers) |
this_iso$ntransfers < rptr$transfers)
message (" ---> Okay, traveltimes at least as good as raptor")
else
message (" ---> Problem: raptor is better than traveltimes")
message ()
}
Then the results:
for (i in to)
compare_iso2route (gtfs, iso, tts_1, from, i, start_time)
#> ----TO: Berlin, Elsenstr.----
#> duration ntransfers stop_id stop_name
#> 1457 00:34:18 3 900000076191 Berlin, Elsenstr.
#> 2748 00:18:38 1 900000190100 Berlin, Elsenstr./S Treptower Park
#> 2753 00:24:18 2 900000190505 Berlin, Elsenstr./Kiefholzstr.
#> 27916 00:32:18 2 070101002547 Berlin, Elsenstr.
#> 28364 00:34:18 3 070101003074 Berlin, Elsenstr.
#> 30137 00:24:18 2 070101005735 Berlin, Elsenstr./Kiefholzstr.
#> 30144 00:18:38 1 070101005742 Berlin, Elsenstr./S Treptower Park
#> 30145 00:22:18 1 070101005743 Berlin, Elsenstr./Kiefholzstr.
#> 30421 00:18:38 1 070101006031 Berlin, Elsenstr./S Treptower Park
#> 40231 00:18:38 1 060000190100 Berlin, Elsenstr./S Treptower Park
#> stop_lon stop_lat
#> 1457 13.44890 52.48503
#> 2748 13.45881 52.49289
#> 2753 13.45355 52.48855
#> 27916 13.44890 52.48503
#> 28364 13.44890 52.48503
#> 30137 13.45355 52.48855
#> 30144 13.45881 52.49289
#> 30145 13.45355 52.48855
#> 30421 13.45881 52.49289
#> 40231 13.45881 52.49289
#> Note: method with signature 'Period#ANY' chosen for function '-',
#> target signature 'Period#Period'.
#> "ANY#Period" would also be valid
#> 1 transfers; trip duration = 00:18:38
#> ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:24:00 with 1 transfers
#> ---> Okay, traveltimes at least as good as raptor
#>
#> ----TO: Berlin, Ohlauer Str.----
#> duration ntransfers stop_id stop_name stop_lon stop_lat
#> 442 00:29:18 3 900000015104 Berlin, Ohlauer Str. 13.43048 52.49557
#> 28483 00:27:18 2 070101003215 Berlin, Ohlauer Str. 13.43048 52.49557
#> 28487 00:29:18 3 070101003219 Berlin, Ohlauer Str. 13.43048 52.49557
#> 1 transfers; trip duration = 00:30:30
#> ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:29:48 with 2 transfers
#> ---> Okay, traveltimes at least as good as raptor
#>
#> ----TO: Berlin, Stedingerweg----
#> duration ntransfers stop_id stop_name stop_lon stop_lat
#> 1968 00:24:18 2 900000110032 Berlin, Stedingerweg 13.45256 52.53924
#> 29836 00:22:18 1 070101005414 Berlin, Stedingerweg 13.45256 52.53924
#> 29869 00:24:18 2 070101005447 Berlin, Stedingerweg 13.45256 52.53924
#> 1 transfers; trip duration = 00:22:18
#> ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:27:00 with 1 transfers
#> ---> Okay, traveltimes at least as good as raptor
#>
#> ----TO: Berlin, Gersdorfstr./Kaiserstr.----
#> duration ntransfers stop_id stop_name stop_lon
#> 1333 00:33:18 3 900000070201 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> 26400 00:32:18 2 070101000831 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> 26405 00:33:18 3 070101000836 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> 29146 00:31:18 2 070101004034 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> 29147 00:32:00 2 070101004035 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> stop_lat
#> 1333 52.44504
#> 26400 52.44504
#> 26405 52.44504
#> 29146 52.44504
#> 29147 52.44504
#> 2 transfers; trip duration = 00:33:00
#> ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:28:00 with 2 transfers
#> ---> Okay, traveltimes at least as good as raptor
#>
#> ----TO: Berlin, Haynauer Str.----
#> duration ntransfers stop_id stop_name stop_lon stop_lat
#> 1359 00:33:48 4 900000071102 Berlin, Haynauer Str. 13.36558 52.43517
#> 26669 00:33:48 4 070101001135 Berlin, Haynauer Str. 13.36558 52.43517
#> 28324 00:31:48 3 070101003022 Berlin, Haynauer Str. 13.36558 52.43517
#> 2 transfers; trip duration = 00:36:00
#> ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:36:00 with 2 transfers
#> ---> Okay, traveltimes at least as good as raptor
#>
#> ----TO: Gesundbrunnen Bhf----
#> duration ntransfers stop_id stop_name stop_lon
#> 348 00:14:42 2 900000007102 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 13721 00:14:42 2 060007102721 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 13722 00:14:42 2 060007102722 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 13723 00:14:42 2 060007102723 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 13724 00:12:42 1 060007102724 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 19367 00:17:42 2 000008011102 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 21314 00:17:42 2 450009007102 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 23140 00:17:42 2 710009990056 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 24494 00:17:42 2 710009007102 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 27900 00:16:42 2 070101002524 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 27924 00:16:42 2 070101002563 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 27929 00:16:42 2 070101002568 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 32171 00:15:42 2 070201083101 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 32172 00:15:42 2 070201083102 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> stop_lat
#> 348 52.54864
#> 13721 52.54864
#> 13722 52.54864
#> 13723 52.54864
#> 13724 52.54864
#> 19367 52.54864
#> 21314 52.54864
#> 23140 52.54864
#> 24494 52.54864
#> 27900 52.54864
#> 27924 52.54864
#> 27929 52.54864
#> 32171 52.54864
#> 32172 52.54864
#> 1 transfers; trip duration = 00:14:12
#> ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:03:00 with 0 transfers
#> ---> Problem: raptor is better than traveltimes
#>
#> ----TO: S Prenzlauer Allee----
#> duration ntransfers stop_id stop_name stop_lon
#> 1941 00:23:00 2 900000110002 S Prenzlauer Allee (Berlin) 13.42742
#> 13818 00:25:00 2 060110002781 S Prenzlauer Allee (Berlin) 13.42742
#> 13819 00:25:00 2 060110002782 S Prenzlauer Allee (Berlin) 13.42742
#> 29841 00:24:00 2 070101005419 S Prenzlauer Allee (Berlin) 13.42742
#> 29864 00:24:00 2 070101005442 S Prenzlauer Allee (Berlin) 13.42742
#> 30014 00:24:00 2 070101005606 S Prenzlauer Allee (Berlin) 13.42742
#> 32824 00:21:00 1 070301009563 S Prenzlauer Allee (Berlin) 13.42742
#> 32843 00:23:00 2 070301009582 S Prenzlauer Allee (Berlin) 13.42742
#> stop_lat
#> 1941 52.5448
#> 13818 52.5448
#> 13819 52.5448
#> 29841 52.5448
#> 29864 52.5448
#> 30014 52.5448
#> 32824 52.5448
#> 32843 52.5448
#> 1 transfers; trip duration = 00:21:48
#> ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:12:42 with 1 transfers
#> ---> Okay, traveltimes at least as good as raptor
#>
#> ----TO: Zepernick, Inntaler Str.----
#> duration ntransfers stop_id stop_name stop_lon
#> 7050 00:47:00 3 900000350173 Zepernick, Inntaler Str. 13.53415
#> 16233 00:48:00 2 750000328701 Zepernick, Inntaler Str. 13.53415
#> 16234 00:45:00 2 750000328702 Zepernick, Inntaler Str. 13.53415
#> stop_lat
#> 7050 52.64832
#> 16233 52.64832
#> 16234 52.64832
#> 2 transfers; trip duration = 00:45:00
#> ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:40:00 with 2 transfers
#> ---> Okay, traveltimes at least as good as raptor
#>
#> ----TO: U Alt-Mariendorf----
#> duration ntransfers stop_id
#> 1338 00:25:48 3 900000070301
#> 24501 00:25:48 3 900000070701
#> 24502 00:25:48 3 900000070702
#> 26668 00:29:48 4 070101001134
#> 26697 00:29:48 4 070101001164
#> 28601 00:25:48 3 070101003370
#> 28625 00:25:48 3 070101003398
#> 28762 00:25:48 3 070101003558
#> 28772 00:25:48 3 070101003568
#> 28810 00:25:48 3 070101003615
#> 29390 00:25:48 3 070101004347
#> 32071 00:26:48 3 070201064901
#> 32072 00:23:48 2 070201064902
#> 36519 00:29:48 4 900000070703
#> 40139 00:25:48 3 070101002055
#> stop_name stop_lon stop_lat
#> 1338 U Alt-Mariendorf (Berlin) 13.38746 52.43964
#> 24501 U Alt-Mariendorf (Berlin) [Bus Alt-Mariendorf] 13.38750 52.44023
#> 24502 U Alt-Mariendorf (Berlin) [Bus Frieden-/Reißeckstr.] 13.38761 52.43897
#> 26668 U Alt-Mariendorf [Busendstelle] (Berlin) 13.39113 52.43964
#> 26697 U Alt-Mariendorf [Busendstelle] (Berlin) 13.39113 52.43964
#> 28601 U Alt-Mariendorf (Berlin) [Bus Alt-Mariendorf] 13.38750 52.44023
#> 28625 U Alt-Mariendorf (Berlin) [Bus Frieden-/Reißeckstr.] 13.38761 52.43897
#> 28762 U Alt-Mariendorf (Berlin) [Bus Frieden-/Reißeckstr.] 13.38761 52.43897
#> 28772 U Alt-Mariendorf (Berlin) [Bus Frieden-/Reißeckstr.] 13.38761 52.43897
#> 28810 U Alt-Mariendorf (Berlin) [Bus Frieden-/Reißeckstr.] 13.38761 52.43897
#> 29390 U Alt-Mariendorf (Berlin) [Bus Alt-Mariendorf] 13.38750 52.44023
#> 32071 U Alt-Mariendorf (Berlin) 13.38746 52.43964
#> 32072 U Alt-Mariendorf (Berlin) 13.38746 52.43964
#> 36519 U Alt-Mariendorf [Busendstelle] (Berlin) 13.39113 52.43964
#> 40139 U Alt-Mariendorf (Berlin) [Bus Frieden-/Reißeckstr.] 13.38761 52.43897
#> 1 transfers; trip duration = 00:28:00
#> ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:25:00 with 1 transfers
#> ---> Okay, traveltimes at least as good as raptor
#>
#> ----TO: Berlin, Hundsteinweg----
#> duration ntransfers stop_id stop_name stop_lon stop_lat
#> 1342 00:31:00 3 900000070305 Berlin, Hundsteinweg 13.3901 52.43436
#> 27763 00:29:00 2 070101002381 Berlin, Hundsteinweg 13.3901 52.43436
#> 29275 00:31:00 3 070101004193 Berlin, Hundsteinweg 13.3901 52.43436
#> 2 transfers; trip duration = 00:34:00
#> ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:36:00 with 2 transfers
#> ---> Okay, traveltimes at least as good as raptor
Created on 2021-01-26 by the reprex package (v0.3.0)
And they are all either faster, or have fewer transfers, or both. The only exception is one raptor travel time of 03:00 from Hbf to Gesundbrunnen, which is obviously ridiculous.
That all looks good to me now, but I wouldn't be surprised if you find a few quirks :smirk: The problem was this:
If one trip goes A -> B -> C then transfers to E for E -> F, then the last leg (E -> F) could be updated by a potentially slower trip
A -> B transfer to D for D -> E -> F, because the timetable line for E -> F connected with an identical trip number for previous stage D -> E. Thus E -> F did not by default record the fastest trip. The above commit ensures those replacements are only made if mimimise_transfers
is TRUE
, and if the replacement either has fewer transfers, or is faster, or both.
The
gtfs_traveltimes
implementation is great, thanks a lot! :tada: The speed of the computation is impressive 😱Wouldnt be me, if I wouldnt still find some issues ;)
Compared to my OTP benchmark I still get a few too many transfers:
checking single examples for the deviations also reveal a deviation from the
gtfs_route
function:Created on 2020-12-07 by the reprex package (v0.3.0)
Maybe you could check if those deviations are expected or if there might still be a bug?
The travel time is also slower on average than the benchmark. In the examples above there are also two examples where the travel time is slower than the result from the
gtfs_route
function - that might be the same source?