UrbanAnalyst / gtfsrouter

Routing and analysis engine for GTFS (General Transit Feed Specification) data
https://urbananalyst.github.io/gtfsrouter/
81 stars 17 forks source link

differences in gtfs_traveltime and gtfs_route results #61

Closed AlexandraKapp closed 3 years ago

AlexandraKapp commented 3 years ago

The gtfs_traveltimes implementation is great, thanks a lot! :tada: The speed of the computation is impressive 😱

Wouldnt be me, if I wouldnt still find some issues ;)

Compared to my OTP benchmark I still get a few too many transfers:

# just to show the code on how I get to the result
# benchmark is a dataframe with the amount of transfers from Hbf to each stop

iso <- gtfs_traveltimes(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600, cutoff = 0, minimise_transfers = T)

transfers_gtfsrouter <- iso %>% 
  group_by(stop_name) %>% 
  summarize(transfers_gtfsr = min(ntransfers))

comparison <- full_join(benchmark, transfers_gtfsrouter, na_matches = "never", by = "stop_name") %>% 
  mutate(diff = transfers - transfers_gtfsr) 

hist(comparison$transfers - comparison$transfers_gtfsr)

grafik

checking single examples for the deviations also reveal a deviation from the gtfs_route function:

library(gtfsrouter)
packageVersion("gtfsrouter")
#> [1] '0.0.4.140'

gtfs <- extract_gtfs(file.path("C:/Users/AlexandraKapp/OneDrive - Mobility Institute Berlin/02_playground/traveltime_index/data/vbb_202010.zip"))
#> > Unzipping GTFS archivev Unzipped GTFS archive  
#> > Extracting GTFS feedv Extracted GTFS feed 
#> > Converting stop times to secondsv Converted stop times to seconds 
#> > Converting transfer times to secondsv Converted transfer times to seconds
gtfs$calendar[1,]
#>    service_id monday tuesday wednesday thursday friday saturday sunday
#> 1:          1      0       0         0        0      0        0      0
#>    start_date end_date
#> 1:   20201023 20201212
ttable <- gtfs_timetable(gtfs, day = "tuesday")

iso <- gtfs_traveltimes(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600, cutoff = 0, minimise_transfers = T)

# 3 transfers 45 min 1 transfers 36 min
to <- "Berlin, Kaiserkorso"
print(iso [grep (to, iso$stop_name), ])
#>       duration ntransfers           id           stop_name stop_lon stop_lat
#> 28655 00:45:30          3 070101003438 Berlin, Kaiserkorso 13.38429 52.48409
print(gtfs_route(ttable, "Berlin Hauptbahnhof", to, start_time = 8*3600))
#>    route_name                   trip_name
#> 1         M41 Sonnenallee/Baumschulenstr.
#> 2         M41 Sonnenallee/Baumschulenstr.
#> 3         M41 Sonnenallee/Baumschulenstr.
#> 4         M41 Sonnenallee/Baumschulenstr.
#> 5         M41 Sonnenallee/Baumschulenstr.
#> 6         M41 Sonnenallee/Baumschulenstr.
#> 7         M41 Sonnenallee/Baumschulenstr.
#> 8         M41 Sonnenallee/Baumschulenstr.
#> 9         248        Berlin, Reichartstr.
#> 10        248        Berlin, Reichartstr.
#> 11        248        Berlin, Reichartstr.
#> 12        248        Berlin, Reichartstr.
#> 13        248        Berlin, Reichartstr.
#> 14        248        Berlin, Reichartstr.
#> 15        248        Berlin, Reichartstr.
#> 16        248        Berlin, Reichartstr.
#> 17        248          S+U Alexanderplatz
#> 18        248          S+U Alexanderplatz
#>                                         stop_name arrival_time departure_time
#> 1                         S+U Berlin Hauptbahnhof     08:00:30       08:00:30
#> 2         S Potsdamer Platz Bhf/Voßstr. (Berlin)     08:06:00       08:06:00
#> 3  S+U Potsdamer Platz (Bln) [Bus Stresemannstr.]     08:07:00       08:07:00
#> 4                        Berlin, Abgeordnetenhaus     08:08:00       08:08:00
#> 5                     S Anhalter Bahnhof (Berlin)     08:10:00       08:10:00
#> 6                       Berlin, Willy-Brandt-Haus     08:11:00       08:11:00
#> 7                       U Hallesches Tor (Berlin)     08:15:00       08:15:00
#> 8                            Berlin, Blücherstr.     08:16:00       08:16:00
#> 9                            Berlin, Blücherstr.     08:18:00       08:18:00
#> 10                       U Gneisenaustr. (Berlin)     08:20:00       08:20:00
#> 11                        Berlin, Marheinekeplatz     08:22:00       08:22:00
#> 12                       Berlin, Jüterboger Str.     08:24:00       08:24:00
#> 13               Berlin, Columbiadamm/Friesenstr.     08:25:00       08:25:00
#> 14     Berlin, Columbiadamm/Platz der Luftbrücke     08:26:00       08:26:00
#> 15               U Platz der Luftbrücke (Berlin)     08:28:00       08:28:00
#> 16                             Berlin, Bayernring     08:29:00       08:29:00
#> 17                             Berlin, Bayernring     08:35:00       08:35:00
#> 18                            Berlin, Kaiserkorso     08:36:00       08:36:00

# 2 transfers 37 min vs 1 transfer 23 min
to <- "S Wilhelmsruh"
print(iso [grep (to, iso$stop_name), ])
#>       duration ntransfers           id              stop_name stop_lon stop_lat
#> 16481 00:37:12          2 060084101101 S Wilhelmsruh (Berlin) 13.36328 52.58159
#> 16482 00:54:42          3 060084101102 S Wilhelmsruh (Berlin) 13.36328 52.58159
#> 27642 02:00:00          3 070101002231 S Wilhelmsruh (Berlin) 13.36328 52.58159
#> 29373 02:02:00          3 070101004328 S Wilhelmsruh (Berlin) 13.36328 52.58159
print(gtfs_route(ttable, "Berlin Hauptbahnhof", to, start_time = 8*3600))
#>    route_name                  trip_name                      stop_name
#> 1          S9 Flughafen BER Terminal 1-2        S+U Berlin Hauptbahnhof
#> 2          S9 Flughafen BER Terminal 1-2 S+U Friedrichstr. Bhf (Berlin)
#> 3          S1          S Oranienburg Bhf S+U Friedrichstr. Bhf (Berlin)
#> 4          S1          S Oranienburg Bhf  S Oranienburger Str. (Berlin)
#> 5          S1          S Oranienburg Bhf         S Nordbahnhof (Berlin)
#> 6          S1          S Oranienburg Bhf        S Humboldthain (Berlin)
#> 7          S1          S Oranienburg Bhf S+U Gesundbrunnen Bhf (Berlin)
#> 8          S1          S Oranienburg Bhf     S Bornholmer Str. (Berlin)
#> 9          S1          S Oranienburg Bhf         S Wollankstr. (Berlin)
#> 10         S1          S Oranienburg Bhf          S Schönholz (Berlin)
#> 11         S1          S Oranienburg Bhf         S Wilhelmsruh (Berlin)
#>    arrival_time departure_time
#> 1      08:05:00       08:05:42
#> 2      08:07:36       08:08:24
#> 3      08:10:06       08:10:54
#> 4      08:12:12       08:12:42
#> 5      08:14:12       08:14:42
#> 6      08:16:54       08:17:24
#> 7      08:18:42       08:19:24
#> 8      08:20:54       08:21:24
#> 9      08:23:00       08:23:30
#> 10     08:24:48       08:25:36
#> 11     08:27:42       08:28:12

# 4 transfers 34 min vs 2 transfer 34 min
to <- "Berlin, Hegemeisterweg"
print(iso [grep (to, iso$stop_name), ])
#>       duration ntransfers           id              stop_name stop_lon stop_lat
#> 31783 00:35:48          5 070101050371 Berlin, Hegemeisterweg 13.52022 52.47401
#> 31784 00:33:48          4 070101050372 Berlin, Hegemeisterweg 13.52022 52.47401
print(gtfs_route(ttable, "Berlin Hauptbahnhof", to, start_time = 8*3600))
#>    route_name                trip_name                       stop_name
#> 1          S5        S Strausberg Nord         S+U Berlin Hauptbahnhof
#> 2          S5        S Strausberg Nord  S+U Friedrichstr. Bhf (Berlin)
#> 3          S5        S Strausberg Nord     S Hackescher Markt (Berlin)
#> 4          S5        S Strausberg Nord S+U Alexanderplatz Bhf (Berlin)
#> 5          S5        S Strausberg Nord   S+U Jannowitzbrücke (Berlin)
#> 6          S5        S Strausberg Nord           S Ostbahnhof (Berlin)
#> 7          S3        S Friedrichshagen    S+U Warschauer Str. (Berlin)
#> 8          S3        S Friedrichshagen         S Ostkreuz Bhf (Berlin)
#> 9          S3        S Friedrichshagen           S Karlshorst (Berlin)
#> 10        M17 Berlin, Hasselwerderstr.           S Karlshorst (Berlin)
#> 11        M17 Berlin, Hasselwerderstr.          Berlin, Hegemeisterweg
#>    arrival_time departure_time
#> 1      08:00:30       08:01:12
#> 2      08:03:06       08:03:54
#> 3      08:05:24       08:05:54
#> 4      08:07:06       08:07:54
#> 5      08:09:24       08:09:54
#> 6      08:11:36       08:12:24
#> 7      08:16:54       08:17:24
#> 8      08:19:06       08:19:42
#> 9      08:24:36       08:25:12
#> 10     08:31:00       08:31:00
#> 11     08:35:00       08:35:00

Created on 2020-12-07 by the reprex package (v0.3.0)

Maybe you could check if those deviations are expected or if there might still be a bug?

The travel time is also slower on average than the benchmark. In the examples above there are also two examples where the travel time is slower than the result from the gtfs_route function - that might be the same source?

grafik

grafik

mpadge commented 3 years ago

My hope is now that the above commit has fixed, but I'll re-open until confirmed by @AlexandraKapp. Of your examples above, the sole remaining mismatch is this:

library (gtfsrouter)
packageVersion ("gtfsrouter")
#> [1] '0.0.4.149'
gtfs <- extract_gtfs("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds 
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs$calendar [1, -(1:8)]
#>    start_date end_date
#> 1:   20201023 20201212
gtfs <- gtfs_timetable(gtfs, day = "tuesday")
from <- "Berlin Hauptbahnhof"
start_time <- 8 * 3600

iso <- gtfs_traveltimes (gtfs, from = from, start_time = start_time)

to <- "Berlin, Hegemeisterweg"
iso [grep (to, iso$stop_name), ]
#>       duration ntransfers      stop_id              stop_name stop_lon stop_lat
#> 31783 00:35:48          2 070101050371 Berlin, Hegemeisterweg 13.52022 52.47401
#> 31784 00:33:48          1 070101050372 Berlin, Hegemeisterweg 13.52022 52.47401

gtfs_route(gtfs, "Berlin Hauptbahnhof", to, start_time = 8*3600)
#>    route_name                trip_name                       stop_name
#> 1          S5        S Strausberg Nord         S+U Berlin Hauptbahnhof
#> 2          S5        S Strausberg Nord  S+U Friedrichstr. Bhf (Berlin)
#> 3          S5        S Strausberg Nord     S Hackescher Markt (Berlin)
#> 4          S5        S Strausberg Nord S+U Alexanderplatz Bhf (Berlin)
#> 5          S5        S Strausberg Nord    S+U Jannowitzbrücke (Berlin)
#> 6          S5        S Strausberg Nord           S Ostbahnhof (Berlin)
#> 7          S3        S Friedrichshagen    S+U Warschauer Str. (Berlin)
#> 8          S3        S Friedrichshagen         S Ostkreuz Bhf (Berlin)
#> 9          S3        S Friedrichshagen           S Karlshorst (Berlin)
#> 10        M17 Berlin, Hasselwerderstr.           S Karlshorst (Berlin)
#> 11        M17 Berlin, Hasselwerderstr.          Berlin, Hegemeisterweg
#>    arrival_time departure_time
#> 1      08:00:30       08:01:12
#> 2      08:03:06       08:03:54
#> 3      08:05:24       08:05:54
#> 4      08:07:06       08:07:54
#> 5      08:09:24       08:09:54
#> 6      08:11:36       08:12:24
#> 7      08:16:54       08:17:24
#> 8      08:19:06       08:19:42
#> 9      08:24:36       08:25:12
#> 10     08:31:00       08:31:00
#> 11     08:35:00       08:35:00

Created on 2020-12-14 by the reprex package (v0.3.0)

And that looks more like the standard gtfs_route() function not following the minimal-transfer route, while gtfs_isochrone() does. The two algorithms now work completely differently, so that's perhaps not surprising, and I'd be happy to interpret that as a bug in gtfs_route(), rather than gtfs_isochrone(). @AlexandraKapp Would you please run this through your benchmark tests and see what you can find? Thanks!

mpadge commented 3 years ago

Clearly have to revise the main gtfs_route() function anyway, because ...

library (gtfsrouter)
packageVersion ("gtfsrouter")
#> [1] '0.0.4.149'
gtfs <- extract_gtfs("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds 
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs$calendar [1, -(1:8)]
#>    start_date end_date
#> 1:   20201023 20201212
gtfs <- gtfs_timetable(gtfs, day = "tuesday")
from <- "Berlin Hauptbahnhof"
to <- "Berlin, Hegemeisterweg"
start_time <- 8 * 3600

traveltimes <- function (gtfs, from, start_time)
    gtfs_traveltimes (gtfs, from = from, start_time = start_time)
route <- function (gtfs, from, to, start_time)
    gtfs_route(gtfs, from = from, to = to, start_time = start_time)

rbenchmark::benchmark (
                       traveltimes (gtfs, from, start_time),
                       route (gtfs, from, to, start_time),
                       replications = 10) [, 1:5]
#>                                  test replications elapsed relative user.self
#> 2   route(gtfs, from, to, start_time)           10   9.716    1.802    12.218
#> 1 traveltimes(gtfs, from, start_time)           10   5.392    1.000     5.767

Created on 2020-12-14 by the reprex package (v0.3.0)

New traveltimes algorithm traces routes to all points within entire system nearly twice as fast as gtfs_routes() for single connection to one point. Once we're happy with gtfs_traveltimes(), i'll start re-integrating this algorithm back into other package functions.

AlexandraKapp commented 3 years ago

Nice, this is looking great!

The option without minimise_transfers actually seems to get closer to the benchmark of OTP

difference to OTP transfers with minimise_transfers

grafik grafik

difference to OTP transfers WITHOUT minimise_transfers

grafik grafik

This looks in general really good now - Looking into detail of the deviations there are a few reasons for them.

library(gtfsrouter)
packageVersion("gtfsrouter")
#> [1] '0.0.4.149'

gtfs <- extract_gtfs(file.path("vbb_202010.zip"))
#> > Unzipping GTFS archive
#> v Unzipped GTFS archive
#> > Extracting GTFS feedv Extracted GTFS feed 
#> > Converting stop times to secondsv Converted stop times to seconds 
#> > Converting transfer times to secondsv Converted transfer times to seconds
ttable <- gtfs_timetable(gtfs, day = "tuesday")

iso <- gtfs_traveltimes(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600)

# too few transfers

# is 0 - should be at least 2 tranfers
iso[iso$stop_name == "Berlin, Elsenstr.", ]
#>       duration ntransfers      stop_id         stop_name stop_lon stop_lat
#> 27916 00:40:30          0 070101002547 Berlin, Elsenstr.  13.4489 52.48503
#> 28364 00:42:30          1 070101003074 Berlin, Elsenstr.  13.4489 52.48503

# is 0 - should be at least 1 tranfer
iso[iso$stop_name == "Berlin, Ohlauer Str.", ]
#>       duration ntransfers      stop_id            stop_name stop_lon stop_lat
#> 28483 00:30:30          1 070101003215 Berlin, Ohlauer Str. 13.43048 52.49557
#> 28487 00:30:30          0 070101003219 Berlin, Ohlauer Str. 13.43048 52.49557

####### impossible transfers with minimise transfers
iso <- gtfs_traveltimes(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600, minimise_transfers = T)

## is 0 transfers - needs at least one
iso[iso$stop_name == "Berlin, Stedingerweg", ]
#>       duration ntransfers      stop_id            stop_name stop_lon stop_lat
#> 29836 00:33:00          0 070101005414 Berlin, Stedingerweg 13.45256 52.53924
#> 29869 00:51:00          0 070101005447 Berlin, Stedingerweg 13.45256 52.53924

## is 0 transfers - needs at least two
iso[iso$stop_name == "Berlin, Gersdorfstr./Kaiserstr.", ]
#>       duration ntransfers      stop_id                       stop_name stop_lon
#> 26400 00:53:00          0 070101000831 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> 26405 00:42:48          2 070101000836 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> 29146 00:52:00          0 070101004034 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> 29147 00:49:48          2 070101004035 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#>       stop_lat
#> 26400 52.44504
#> 26405 52.44504
#> 29146 52.44504
#> 29147 52.44504

## is 0 transfers - needs at least two
iso[iso$stop_name == "Berlin, Haynauer Str.", ]
#>       duration ntransfers      stop_id             stop_name stop_lon stop_lat
#> 26669 01:00:00          0 070101001135 Berlin, Haynauer Str. 13.36558 52.43517
#> 28324 01:01:48          1 070101003022 Berlin, Haynauer Str. 13.36558 52.43517

Created on 2020-12-14 by the reprex package (v0.3.0)

AlexandraKapp commented 3 years ago

Travel time also looks good now -it shifted from being slower on average to being faster:

travel time differences of travel time rasters created with gtfsrouter and OTP grafik

Though with the OTP benchmark here, I already once realized, that it's sometimes unrealistically slow. So I also compared it to a Google Maps benchmark, yielding much better results:

travel time differences of travel time rasters created with gtfsrouter and Google Maps grafik

Here again a few reasons for the deviatons:

gtfs <- extract_gtfs(file.path("vbb_202010.zip"))

> > Unzipping GTFS archivev Unzipped GTFS archive

> > Extracting GTFS feedv Extracted GTFS feed

> > Converting stop times to secondsv Converted stop times to seconds

> > Converting transfer times to secondsv Converted transfer times to seconds

ttable <- gtfs_timetable(gtfs, day = "tuesday")

iso <- gtfs_traveltimes(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600)

too long travel time

should be 5 min by Regio with no transfer

iso[iso$stop_name == "S+U Gesundbrunnen Bhf (Berlin)", ]

> duration ntransfers stop_id stop_name stop_lon

> 13721 00:17:00 2 060007102721 S+U Gesundbrunnen Bhf (Berlin) 13.38837

> 13722 00:17:00 2 060007102722 S+U Gesundbrunnen Bhf (Berlin) 13.38837

> 13723 00:17:00 2 060007102723 S+U Gesundbrunnen Bhf (Berlin) 13.38837

> 13724 00:15:00 1 060007102724 S+U Gesundbrunnen Bhf (Berlin) 13.38837

> 19367 00:20:00 2 000008011102 S+U Gesundbrunnen Bhf (Berlin) 13.38837

> 21314 00:20:00 2 450009007102 S+U Gesundbrunnen Bhf (Berlin) 13.38837

> 23140 00:20:00 2 710009990056 S+U Gesundbrunnen Bhf (Berlin) 13.38837

> 24494 00:20:00 2 710009007102 S+U Gesundbrunnen Bhf (Berlin) 13.38837

> 27900 00:19:00 2 070101002524 S+U Gesundbrunnen Bhf (Berlin) 13.38837

> 27924 00:19:00 2 070101002563 S+U Gesundbrunnen Bhf (Berlin) 13.38837

> 27929 00:19:00 2 070101002568 S+U Gesundbrunnen Bhf (Berlin) 13.38837

> 32171 00:18:00 2 070201083101 S+U Gesundbrunnen Bhf (Berlin) 13.38837

> 32172 00:18:00 1 070201083102 S+U Gesundbrunnen Bhf (Berlin) 13.38837

> stop_lat

> 13721 52.54864

> 13722 52.54864

> 13723 52.54864

> 13724 52.54864

> 19367 52.54864

> 21314 52.54864

> 23140 52.54864

> 24494 52.54864

> 27900 52.54864

> 27924 52.54864

> 27929 52.54864

> 32171 52.54864

> 32172 52.54864

should be 18 min

iso[iso$stop_name == "S Prenzlauer Allee (Berlin)", ]

> duration ntransfers stop_id stop_name stop_lon

> 13818 00:26:00 2 060110002781 S Prenzlauer Allee (Berlin) 13.42742

> 13819 00:26:00 3 060110002782 S Prenzlauer Allee (Berlin) 13.42742

> 29841 00:29:00 2 070101005419 S Prenzlauer Allee (Berlin) 13.42742

> 29864 00:29:00 2 070101005442 S Prenzlauer Allee (Berlin) 13.42742

> 30014 00:29:00 2 070101005606 S Prenzlauer Allee (Berlin) 13.42742

> 32824 00:26:00 1 070301009563 S Prenzlauer Allee (Berlin) 13.42742

> 32843 00:28:00 2 070301009582 S Prenzlauer Allee (Berlin) 13.42742

> stop_lat

> 13818 52.5448

> 13819 52.5448

> 29841 52.5448

> 29864 52.5448

> 30014 52.5448

> 32824 52.5448

> 32843 52.5448

should be 45 min

iso[iso$stop_name == "Zepernick, Inntaler Str.", ]

> duration ntransfers stop_id stop_name stop_lon

> 16233 00:53:48 2 750000328701 Zepernick, Inntaler Str. 13.53415

> 16234 00:55:48 2 750000328702 Zepernick, Inntaler Str. 13.53415

> stop_lat

> 16233 52.64832

> 16234 52.64832

should be 18 min

iso[iso$stop_name == "U Alt-Mariendorf (Berlin)", ]

> duration ntransfers stop_id stop_name stop_lon

> 32071 00:31:48 2 070201064901 U Alt-Mariendorf (Berlin) 13.38746

> 32072 00:28:48 1 070201064902 U Alt-Mariendorf (Berlin) 13.38746

> stop_lat

> 32071 52.43964

> 32072 52.43964

should be 31 minutes

iso[iso$stop_name == "Berlin, Hundsteinweg", ]

> duration ntransfers stop_id stop_name stop_lon stop_lat

> 27763 00:44:48 3 070101002381 Berlin, Hundsteinweg 13.3901 52.43436

> 29275 00:46:48 4 070101004193 Berlin, Hundsteinweg 13.3901 52.43436



<sup>Created on 2020-12-14 by the [reprex package](https://reprex.tidyverse.org) (v0.3.0)</sup>
polettif commented 3 years ago

I'd like to join in with my two cents, hope I don't clutter the discussion. I ran some comparison to tidytransit's travel_times and gtfs_traveltimes produces generally longer travel times (code gist):

I tried to look into it using my test feed that I like to throw around here. I fixed it after our last discussion but there's a new error. Maybe you're aware of this, otherwise I'll open a new issue.

gtfs_file = "routing.zip"
download.file("https://github.com/polettif/gtfs-test-feeds/raw/master/zip/routing.zip", gtfs_file)

library (gtfsrouter)
packageVersion("gtfsrouter")
#> [1] '0.0.4.150'
gtfs <- gtfs_timetable(extract_gtfs(gtfs_file, T), date = 20181001)

gtfs_traveltimes(gtfs, "One", 7*3600)
#> Error in rcpp_traveltimes(gtfs_cp$timetable, gtfs_cp$transfers, nrow(gtfs_cp$stop_ids), : unordered_map::at: key not found
mpadge commented 3 years ago

Thanks @polettif for both aspects there - I'll look further into comparison in travel times with tidytransit. As for the 2nd issue withthe error, could you please open another issue on that one? I can definitely reproduce, so something ain't right there.

mpadge commented 3 years ago

@polettif Your histogram of travel time differences seems to reflect differences in transfers:

junk

So gtfsrouter connects trips with fewer transfers than tidytransit, even though these sometimes take longer. The raptor algorithm always selects fastest trips, even where these require greater numbers of transfers, whereas the current gtfsrouter implementation is ... complicated. One key difference I see is that the raptor algorithm considers that trips can be optimised by aggregating times and transfers at each stop, and replacing optimal values at that stop whenever trips arrive which are faster or have fewer transfers. Once that has been decided, those values are propagated forward to all stations further along the same trip. So each station ends up with an optimal route as decided by optimality at each prior station.

It is nevertheless possible to have optimal routes which are formed by taking sub-optimal journeys to intermediate stations. Think two equivalent A(t=0) -> B(t=10) and A(t=5) -> B(t=16). The second trip to B will not be recorded by raptor, because it's not the fastest way to get to B. But then if there is a separate service B(t=20) -> C(t=30), then the optimal trip from A -> C will be A(t=5) -> B(t=16) -> B(t=20) -> C(t=30), requiring the sub-optimal trip from A to B which raptor can not record. That is why the current gtfsrouter can record trips with fewer transfers, because each value is optimised over all preceding routes, both optimal and sub-optimal. The raptor paper glibly claims "Pareto optimal" in the abstract, and refers to it the Intro, yet in no ways shows Pareto optimality, so editing ought at least have forced them to drop that term. I think what they likely could show is that the times to any station C reflect the ability to get there given that the optimal trip has been taken to all preceding stations (B). So the raptor optimal time to C has to be considered given that the journey to B was also raptor optimal. Yet this ain't necessarily the best way to get to C... And that's why the algorithm here has been completely redesigned, to attempt to derive optimal trips to all stations, including trips requiring intermediate journeys which are sub-optimal.

The fact that the gtfsrouter algorithm can't produce travel times as fast as raptor values is nevertheless problematic, and shows that this issue is not over yet... Thanks for the stimulating input.

polettif commented 3 years ago

Ah I see, interesting. I haven't really worked with optimization for least amount of transfers, so I'll look into that. I haven't done this for raptor because with minimising transfers vs. minimising travel times you're getting into more of a "personal choice" optimization so I decided to stick to times.

One key difference I see is that the raptor algorithm considers that trips can be optimised by aggregating times and transfers at each stop, and replacing optimal values at that stop whenever trips arrive which are faster or have fewer transfers. Once that has been decided, those values are propagated forward to all stations further along the same trip. So each station ends up with an optimal route as decided by optimality at each prior station.

Yes but that's only the base algorithm and not exactly how raptor is implemented in tidytransit. raptor uses something like "Range Queries" described in chapter 4.2 of the paper. I probably should mention this explicitly in the documentation.

Basically there's a "raptor calculation" for each departure from an initial stop, so t=0 and t=5 for A. raptor returns all journeys departing within time_range from a given stop to all other stops with each journey being the shortest possible (i.e. having the earliest arrival time). By default, raptor returns all such journeys for further analysis but it's also possible to filter for shortest travel time, earliest or latest arrival with the keep parameter. This step is probably where optimization for least transfers could be implemented.

Your example works as expected in raptor:

stop_times = as.data.frame(tibble::tribble(
    ~trip_id, ~stop_id, ~arrival_time, ~departure_time, ~stop_sequence,
    "tripAB1", "A", "00:00:00", "00:00:00", 1,
    "tripAB1", "B", "00:00:10", "00:00:10", 2,
    "tripAB2", "A", "00:00:05", "00:00:05", 1,
    "tripAB2", "B", "00:00:16", "00:00:16", 2,
    "tripBC1", "B", "00:00:20", "00:00:20", 1,
    "tripBC1", "C", "00:00:30", "00:00:30", 2
))

tidytransit::raptor(stop_times,
       transfers = data.frame(),
       stop_ids = "A")
#>    from_stop_id to_stop_id travel_time journey_departure_time journey_arrival_time transfers
#> 1:            A          A           0                      0                    0         0
#> 2:            A          B          10                      0                   10         0
#> 3:            A          B          11                      5                   16         0
#> 4:            A          C          25                      5                   30         1

Created on 2020-12-16 by the reprex package (v0.3.0)

The journey A(t=0)->C(t=30) is not returned because its arrival time is the same as A(t=5)->C(t=30) while departing later.

mpadge commented 3 years ago

@AlexandraKapp check it out now, first repeating the full code from above to ensure complete reproducibility:

library (lubridate)
library (gtfsrouter)
packageVersion ("gtfsrouter")
#> [1] '0.0.4.157'
gtfs <- extract_gtfs("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds 
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
transfers <- gtfsrouter::gtfs_transfer_table (gtfs, network_times = FALSE)
transfers <- gtfs$transfers %>%
  select(.data$from_stop_id, .data$to_stop_id, .data$transfer_type, .data$min_transfer_time) %>%
  rbind(transfers) %>%
  group_by(.data$from_stop_id, .data$to_stop_id) %>%
  summarise(across(everything(), first)) %>% # take min_transfer_time from gtfs$transfers if present
  data.table::data.table()
gtfs$transfers <- transfers
gtfs <- gtfs_timetable(gtfs, day = "tuesday")

# Then get the tidytransit raptor traveltimes as additional benchmark:
library (dplyr)
library (tidytransit)
gtfs_1 = read_gtfs("vbb.zip")
from <- "Berlin Hauptbahnhof"
from_name = gtfs_1$stops %>%
    filter(grepl(from, stop_name)) %>%
    pull(stop_name) %>% unique()

start_time <- 8 * 3600
timetable_1 = filter_stop_times(gtfs_1, "2020-12-08", start_time, 24*3600)
tts_1 <- travel_times(timetable_1, from_name)

from <- "Berlin Hauptbahnhof"
start_time <- 8 * 3600
iso <- gtfs_traveltimes (gtfs, from, start_time)

route_stats <- function (gtfs, from, to, start_time) {
    r <- gtfs_route (gtfs, from, to, start_time)
    tstart <- hms (r$departure_time [1])
    tend <- hms (tail (r$arrival_time, 1))
    dur_sec <- as.duration (tend - tstart) / dseconds (1)
    dur <- seconds_to_period (dur_sec)
    ntr <- length (unique (r$trip_name)) - 1
    dur_fmt <- sprintf ('%02d:%02d:%02d', dur@hour, minute (dur), second (dur))
    message (ntr, " transfers; trip duration = ", dur_fmt)

    return (c (dur_sec, ntr))
}

to <- c ("Berlin, Elsenstr.",
         "Berlin, Ohlauer Str.",
         "Berlin, Stedingerweg",
         "Berlin, Gersdorfstr./Kaiserstr.",
         "Berlin, Haynauer Str.",
         "Gesundbrunnen Bhf",
         "S Prenzlauer Allee",
         "Zepernick, Inntaler Str.",
         "U Alt-Mariendorf",
         "Berlin, Hundsteinweg")

compare_iso2route <- function (gtfs, iso, tts_1, from, to, start_time) {

    message ("----TO: ", to, "----")
    this_iso <- iso [grep (to, iso$stop_name), ]
    print (this_iso)
    this_route <- route_stats (gtfs, from, to, start_time)
    # extract trip fastest trip with equal minimal transfers:
    this_iso <- this_iso [which.min (this_iso$ntransfers), ]
    this_iso <- this_iso [which.min (this_iso$duration), ]
    iso_time <- as.integer (seconds (this_iso$duration))

    if (iso_time <= this_route [1] & this_iso$ntransfers <= this_route  [2])
        message (" ---> Okay, traveltimes at least as good as gtfs_route")
    else
        message (" ---> Problem: gtfs_route is better than traveltimes")

    rptr <- tts_1 [grep (to, tts_1$to_stop_name), ]
    rptr <- rptr [which.min (rptr$transfers), ]
    rptr <- rptr [which.min (rptr$travel_time), ]
    message ("Raptor: ", hms::hms (rptr$travel_time),
             " with ", rptr$transfers, " transfers")
    if ((iso_time <= rptr$travel_time & this_iso$ntransfers <= rptr$transfers) |
        this_iso$ntransfers < rptr$transfers)
        message (" ---> Okay, traveltimes at least as good as raptor")
    else
        message (" ---> Problem: raptor is better than traveltimes")
    message ()
}

Then the results:

for (i in to)
    compare_iso2route (gtfs, iso, tts_1, from, i, start_time)

#> ----TO: Berlin, Elsenstr.----
#>       duration ntransfers      stop_id                          stop_name
#> 1457  00:34:18          3 900000076191                  Berlin, Elsenstr.
#> 2748  00:18:38          1 900000190100 Berlin, Elsenstr./S Treptower Park
#> 2753  00:24:18          2 900000190505     Berlin, Elsenstr./Kiefholzstr.
#> 27916 00:32:18          2 070101002547                  Berlin, Elsenstr.
#> 28364 00:34:18          3 070101003074                  Berlin, Elsenstr.
#> 30137 00:24:18          2 070101005735     Berlin, Elsenstr./Kiefholzstr.
#> 30144 00:18:38          1 070101005742 Berlin, Elsenstr./S Treptower Park
#> 30145 00:22:18          1 070101005743     Berlin, Elsenstr./Kiefholzstr.
#> 30421 00:18:38          1 070101006031 Berlin, Elsenstr./S Treptower Park
#> 40231 00:18:38          1 060000190100 Berlin, Elsenstr./S Treptower Park
#>       stop_lon stop_lat
#> 1457  13.44890 52.48503
#> 2748  13.45881 52.49289
#> 2753  13.45355 52.48855
#> 27916 13.44890 52.48503
#> 28364 13.44890 52.48503
#> 30137 13.45355 52.48855
#> 30144 13.45881 52.49289
#> 30145 13.45355 52.48855
#> 30421 13.45881 52.49289
#> 40231 13.45881 52.49289
#> Note: method with signature 'Period#ANY' chosen for function '-',
#>  target signature 'Period#Period'.
#>  "ANY#Period" would also be valid
#> 1 transfers; trip duration = 00:18:38
#>  ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:24:00 with 1 transfers
#>  ---> Okay, traveltimes at least as good as raptor
#> 
#> ----TO: Berlin, Ohlauer Str.----
#>       duration ntransfers      stop_id            stop_name stop_lon stop_lat
#> 442   00:29:18          3 900000015104 Berlin, Ohlauer Str. 13.43048 52.49557
#> 28483 00:27:18          2 070101003215 Berlin, Ohlauer Str. 13.43048 52.49557
#> 28487 00:29:18          3 070101003219 Berlin, Ohlauer Str. 13.43048 52.49557
#> 1 transfers; trip duration = 00:30:30
#>  ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:29:48 with 2 transfers
#>  ---> Okay, traveltimes at least as good as raptor
#> 
#> ----TO: Berlin, Stedingerweg----
#>       duration ntransfers      stop_id            stop_name stop_lon stop_lat
#> 1968  00:24:18          2 900000110032 Berlin, Stedingerweg 13.45256 52.53924
#> 29836 00:22:18          1 070101005414 Berlin, Stedingerweg 13.45256 52.53924
#> 29869 00:24:18          2 070101005447 Berlin, Stedingerweg 13.45256 52.53924
#> 1 transfers; trip duration = 00:22:18
#>  ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:27:00 with 1 transfers
#>  ---> Okay, traveltimes at least as good as raptor
#> 
#> ----TO: Berlin, Gersdorfstr./Kaiserstr.----
#>       duration ntransfers      stop_id                       stop_name stop_lon
#> 1333  00:33:18          3 900000070201 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> 26400 00:32:18          2 070101000831 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> 26405 00:33:18          3 070101000836 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> 29146 00:31:18          2 070101004034 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#> 29147 00:32:00          2 070101004035 Berlin, Gersdorfstr./Kaiserstr. 13.37406
#>       stop_lat
#> 1333  52.44504
#> 26400 52.44504
#> 26405 52.44504
#> 29146 52.44504
#> 29147 52.44504
#> 2 transfers; trip duration = 00:33:00
#>  ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:28:00 with 2 transfers
#>  ---> Okay, traveltimes at least as good as raptor
#> 
#> ----TO: Berlin, Haynauer Str.----
#>       duration ntransfers      stop_id             stop_name stop_lon stop_lat
#> 1359  00:33:48          4 900000071102 Berlin, Haynauer Str. 13.36558 52.43517
#> 26669 00:33:48          4 070101001135 Berlin, Haynauer Str. 13.36558 52.43517
#> 28324 00:31:48          3 070101003022 Berlin, Haynauer Str. 13.36558 52.43517
#> 2 transfers; trip duration = 00:36:00
#>  ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:36:00 with 2 transfers
#>  ---> Okay, traveltimes at least as good as raptor
#> 
#> ----TO: Gesundbrunnen Bhf----
#>       duration ntransfers      stop_id                      stop_name stop_lon
#> 348   00:14:42          2 900000007102 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 13721 00:14:42          2 060007102721 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 13722 00:14:42          2 060007102722 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 13723 00:14:42          2 060007102723 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 13724 00:12:42          1 060007102724 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 19367 00:17:42          2 000008011102 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 21314 00:17:42          2 450009007102 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 23140 00:17:42          2 710009990056 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 24494 00:17:42          2 710009007102 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 27900 00:16:42          2 070101002524 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 27924 00:16:42          2 070101002563 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 27929 00:16:42          2 070101002568 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 32171 00:15:42          2 070201083101 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#> 32172 00:15:42          2 070201083102 S+U Gesundbrunnen Bhf (Berlin) 13.38837
#>       stop_lat
#> 348   52.54864
#> 13721 52.54864
#> 13722 52.54864
#> 13723 52.54864
#> 13724 52.54864
#> 19367 52.54864
#> 21314 52.54864
#> 23140 52.54864
#> 24494 52.54864
#> 27900 52.54864
#> 27924 52.54864
#> 27929 52.54864
#> 32171 52.54864
#> 32172 52.54864
#> 1 transfers; trip duration = 00:14:12
#>  ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:03:00 with 0 transfers
#>  ---> Problem: raptor is better than traveltimes
#> 
#> ----TO: S Prenzlauer Allee----
#>       duration ntransfers      stop_id                   stop_name stop_lon
#> 1941  00:23:00          2 900000110002 S Prenzlauer Allee (Berlin) 13.42742
#> 13818 00:25:00          2 060110002781 S Prenzlauer Allee (Berlin) 13.42742
#> 13819 00:25:00          2 060110002782 S Prenzlauer Allee (Berlin) 13.42742
#> 29841 00:24:00          2 070101005419 S Prenzlauer Allee (Berlin) 13.42742
#> 29864 00:24:00          2 070101005442 S Prenzlauer Allee (Berlin) 13.42742
#> 30014 00:24:00          2 070101005606 S Prenzlauer Allee (Berlin) 13.42742
#> 32824 00:21:00          1 070301009563 S Prenzlauer Allee (Berlin) 13.42742
#> 32843 00:23:00          2 070301009582 S Prenzlauer Allee (Berlin) 13.42742
#>       stop_lat
#> 1941   52.5448
#> 13818  52.5448
#> 13819  52.5448
#> 29841  52.5448
#> 29864  52.5448
#> 30014  52.5448
#> 32824  52.5448
#> 32843  52.5448
#> 1 transfers; trip duration = 00:21:48
#>  ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:12:42 with 1 transfers
#>  ---> Okay, traveltimes at least as good as raptor
#> 
#> ----TO: Zepernick, Inntaler Str.----
#>       duration ntransfers      stop_id                stop_name stop_lon
#> 7050  00:47:00          3 900000350173 Zepernick, Inntaler Str. 13.53415
#> 16233 00:48:00          2 750000328701 Zepernick, Inntaler Str. 13.53415
#> 16234 00:45:00          2 750000328702 Zepernick, Inntaler Str. 13.53415
#>       stop_lat
#> 7050  52.64832
#> 16233 52.64832
#> 16234 52.64832
#> 2 transfers; trip duration = 00:45:00
#>  ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:40:00 with 2 transfers
#>  ---> Okay, traveltimes at least as good as raptor
#> 
#> ----TO: U Alt-Mariendorf----
#>       duration ntransfers      stop_id
#> 1338  00:25:48          3 900000070301
#> 24501 00:25:48          3 900000070701
#> 24502 00:25:48          3 900000070702
#> 26668 00:29:48          4 070101001134
#> 26697 00:29:48          4 070101001164
#> 28601 00:25:48          3 070101003370
#> 28625 00:25:48          3 070101003398
#> 28762 00:25:48          3 070101003558
#> 28772 00:25:48          3 070101003568
#> 28810 00:25:48          3 070101003615
#> 29390 00:25:48          3 070101004347
#> 32071 00:26:48          3 070201064901
#> 32072 00:23:48          2 070201064902
#> 36519 00:29:48          4 900000070703
#> 40139 00:25:48          3 070101002055
#>                                                  stop_name stop_lon stop_lat
#> 1338                             U Alt-Mariendorf (Berlin) 13.38746 52.43964
#> 24501       U Alt-Mariendorf (Berlin) [Bus Alt-Mariendorf] 13.38750 52.44023
#> 24502 U Alt-Mariendorf (Berlin) [Bus Frieden-/Reißeckstr.] 13.38761 52.43897
#> 26668             U Alt-Mariendorf [Busendstelle] (Berlin) 13.39113 52.43964
#> 26697             U Alt-Mariendorf [Busendstelle] (Berlin) 13.39113 52.43964
#> 28601       U Alt-Mariendorf (Berlin) [Bus Alt-Mariendorf] 13.38750 52.44023
#> 28625 U Alt-Mariendorf (Berlin) [Bus Frieden-/Reißeckstr.] 13.38761 52.43897
#> 28762 U Alt-Mariendorf (Berlin) [Bus Frieden-/Reißeckstr.] 13.38761 52.43897
#> 28772 U Alt-Mariendorf (Berlin) [Bus Frieden-/Reißeckstr.] 13.38761 52.43897
#> 28810 U Alt-Mariendorf (Berlin) [Bus Frieden-/Reißeckstr.] 13.38761 52.43897
#> 29390       U Alt-Mariendorf (Berlin) [Bus Alt-Mariendorf] 13.38750 52.44023
#> 32071                            U Alt-Mariendorf (Berlin) 13.38746 52.43964
#> 32072                            U Alt-Mariendorf (Berlin) 13.38746 52.43964
#> 36519             U Alt-Mariendorf [Busendstelle] (Berlin) 13.39113 52.43964
#> 40139 U Alt-Mariendorf (Berlin) [Bus Frieden-/Reißeckstr.] 13.38761 52.43897
#> 1 transfers; trip duration = 00:28:00
#>  ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:25:00 with 1 transfers
#>  ---> Okay, traveltimes at least as good as raptor
#> 
#> ----TO: Berlin, Hundsteinweg----
#>       duration ntransfers      stop_id            stop_name stop_lon stop_lat
#> 1342  00:31:00          3 900000070305 Berlin, Hundsteinweg  13.3901 52.43436
#> 27763 00:29:00          2 070101002381 Berlin, Hundsteinweg  13.3901 52.43436
#> 29275 00:31:00          3 070101004193 Berlin, Hundsteinweg  13.3901 52.43436
#> 2 transfers; trip duration = 00:34:00
#>  ---> Okay, traveltimes at least as good as gtfs_route
#> Raptor: 00:36:00 with 2 transfers
#>  ---> Okay, traveltimes at least as good as raptor

Created on 2021-01-26 by the reprex package (v0.3.0)

And they are all either faster, or have fewer transfers, or both. The only exception is one raptor travel time of 03:00 from Hbf to Gesundbrunnen, which is obviously ridiculous.


That all looks good to me now, but I wouldn't be surprised if you find a few quirks :smirk: The problem was this:

If one trip goes A -> B -> C then transfers to E for E -> F, then the last leg (E -> F) could be updated by a potentially slower trip A -> B transfer to D for D -> E -> F, because the timetable line for E -> F connected with an identical trip number for previous stage D -> E. Thus E -> F did not by default record the fastest trip. The above commit ensures those replacements are only made if mimimise_transfers is TRUE, and if the replacement either has fewer transfers, or is faster, or both.