UrbanAnalyst / gtfsrouter

Routing and analysis engine for GTFS (General Transit Feed Specification) data
https://urbananalyst.github.io/gtfsrouter/
80 stars 17 forks source link

Transfers counted incorrectly #88

Open szaboildi opened 2 years ago

szaboildi commented 2 years ago

Hi! I found a bug with how transfers are counted. I am using the new Sept 2021 - Dec 2021 HVV GTFS found here. I am using gtfsrouter 0.0.5.042 with R version 4.0.5 When I read this file in

library(gtfsrouter)
library(data.table)

url <- "http://daten.transparenz.hamburg.de/Dataport.HmbTG.ZS.Webservice.GetRessource100/GetRessource100.svc/90a29c65-d9c5-4281-a45a-e12bf19a18fb/Upload__HVV_Rohdaten_GTFS_Fpl_20210903.zip"
file <- tempfile()
download.file(url, file, quiet = TRUE)

from_trip <- "S Hamburg Airport"
to_trip <- "Langobardenweg"
start_time_trip <- c(8, 0)
day_trip <- 3 # Tuesday

ham_gtfs <- extract_gtfs(file)
# v Unzipped GTFS archive  
# v Extracted GTFS feed 
# v Converted stop times to seconds 
# v Converted transfer times to seconds

gtfs_route(ham_gtfs, from_trip, to_trip, start_time_trip, day_trip, max_transfers = 0)
#    route_name                      trip_name                      stop_name arrival_time departure_time
# 1         292 Lufthansa-Basis (Haupteingang)              S Hamburg Airport     08:03:00       08:03:00
# 2         292 Lufthansa-Basis (Haupteingang)                Zeppelinstraße     08:05:00       08:05:00
# 3         292 Lufthansa-Basis (Haupteingang)                Röntgenstraße     08:06:00       08:06:00
# 4          23               U Niendorf Markt Lufthansa-Basis (Haupteingang)     08:12:00       08:12:00
# 5          23               U Niendorf Markt                   Paeplowstieg     08:14:00       08:14:00
# 6          23               U Niendorf Markt             Am Licentiatenberg     08:15:00       08:15:00
# 7          23               U Niendorf Markt                     Spreenende     08:17:00       08:17:00
# 8          23               U Niendorf Markt                    Warnckesweg     08:19:00       08:19:00
# 9          23               U Niendorf Markt             Stavenhagenstraße     08:20:00       08:20:00
# 10         23               U Niendorf Markt                      Bekstück     08:21:00       08:21:00
# 11         23               U Niendorf Markt             Niendorfer Straße     08:22:00       08:22:00
# 12         23               U Niendorf Markt               Vogt-Cordes-Damm     08:24:00       08:24:00
# 13        191               Krohnstiegtunnel               U Niendorf Markt     08:35:00       08:35:00
# 14        191               Krohnstiegtunnel               U Niendorf Markt     08:36:00       08:36:00
# 15        191               Krohnstiegtunnel             Fuhlsbütteler Weg     08:38:00       08:38:00
# 16        191               Krohnstiegtunnel                   Helvetierweg     08:39:00       08:39:00
# 17        191               Krohnstiegtunnel                     Krähenweg     08:40:00       08:40:00
# 18        191               Krohnstiegtunnel                 Langobardenweg     08:42:00       08:42:00

This issue persists even when gtfs_route() is run on a timetable and not the gtfs itself.

ham_ttable <- gtfs_timetable(ham_gtfs, day)
gtfs_route(ham_ttable , from_trip, to_trip, start_time_trip, day_trip, max_transfers = 0)

#    route_name                      trip_name                      stop_name arrival_time departure_time
# 1         292 Lufthansa-Basis (Haupteingang)              S Hamburg Airport     08:03:00       08:03:00
# 2         292 Lufthansa-Basis (Haupteingang)                Zeppelinstraße     08:05:00       08:05:00
# 3         292 Lufthansa-Basis (Haupteingang)                Röntgenstraße     08:06:00       08:06:00
# 4          23               U Niendorf Markt Lufthansa-Basis (Haupteingang)     08:12:00       08:12:00
# 5          23               U Niendorf Markt                   Paeplowstieg     08:14:00       08:14:00
# 6          23               U Niendorf Markt             Am Licentiatenberg     08:15:00       08:15:00
# 7          23               U Niendorf Markt                     Spreenende     08:17:00       08:17:00
# 8          23               U Niendorf Markt                    Warnckesweg     08:19:00       08:19:00
# 9          23               U Niendorf Markt             Stavenhagenstraße     08:20:00       08:20:00
# 10         23               U Niendorf Markt                      Bekstück     08:21:00       08:21:00
# 11         23               U Niendorf Markt             Niendorfer Straße     08:22:00       08:22:00
# 12         23               U Niendorf Markt               Vogt-Cordes-Damm     08:24:00       08:24:00
# 13        191               Krohnstiegtunnel               U Niendorf Markt     08:35:00       08:35:00
# 14        191               Krohnstiegtunnel               U Niendorf Markt     08:36:00       08:36:00
# 15        191               Krohnstiegtunnel             Fuhlsbütteler Weg     08:38:00       08:38:00
# 16        191               Krohnstiegtunnel                   Helvetierweg     08:39:00       08:39:00
# 17        191               Krohnstiegtunnel                     Krähenweg     08:40:00       08:40:00
# 18        191               Krohnstiegtunnel                 Langobardenweg     08:42:00       08:42:00

This issue is there in the traveltime table as well

ttimes <- gtfs_traveltimes(ham_gtfs, from_trip,
                           start_time_limits = c(8 * 60 * 60, 9 * 60 * 60), 
                           day = day, max_traveltime = 60 * 60, max)
ttimes[ttimes$stop_name == "Langobardenweg" & ttimes$ntransfers == 0,]
#      start_time duration ntransfers                stop_id      stop_name stop_lon stop_lat
# 1317   08:03:00 00:39:00          0 de:02000:91028::910093 Langobardenweg 9.966597 53.63475

Is this a bug or could this be an issue with the particular GTFS somehow?

mpadge commented 2 years ago

Hi @szaboildi, sorry i somehow completely missed this issue. I'm not sure what you mean with the bug here; could you please explain a bit more? You show journeys with transfers, and they both match. Can you please explain why this is different from what you expect? Thanks

szaboildi commented 2 years ago

Hi @mpadge , thanks for getting back to me! The issue is basically that this trip shows up as requiring 0 transfers (the max_transfers parameter is set to 0 in the gtfs_route() function at the top and thentransfers column in ttimes at the bottom is also 0) even though when details of the trip are accessed, it seems like it involves 3 routes (292, 23, 191-- in the route_name column) -- which actually matches the fastest route on google maps.

(The output of the gtfs_traveltimes() function in this case is likely the same trip that the gtfs_route() function finds, because the starting time and duration matches.)

mpadge commented 2 years ago

But the documentation for the gtfs_route function explains the max_transfers parameter like this:

If not NA, specify a desired maximum number of transfers for the route (including but not exceeding this number). This parameter may be used to generate alternative routes with fewer transfers, although actual numbers of transfers may still exceed this number if a value is specified which exceeds the minimal feasible number of transfers.

Setting max_transfers = N only has any effect if it is possible to travel along a route with that number of transfers, in which case it will ensure that trips with any greater number of transfers are not returned, even where these may be faster. In your case, 0 transfers are simply impossible, and you get by default the minimal number which is 2.

mpadge commented 2 years ago

I probably should update that documentation to change

if a value is specified which exceeds the minimal feasible number of transfers.

to the correct version of

if the value specified is less than the minimal feasible number of transfers.

szaboildi commented 2 years ago

Thank you for the explanation, that explains the gtfs_route() part of the problem perfectly! Why does the gtfs_traveltimes() function return this trip with 0 transfers though? Shouldn't the query in the last lineline (below) result in an empty df?

ttimes <- gtfs_traveltimes(ham_gtfs, from_trip, start_time_limits = c(8 * 60 * 60, 9 * 60 * 60), day = day, max_traveltime = 60 * 60, max) ttimes[ttimes$stop_name == "Langobardenweg" & ttimes$ntransfers == 0,]

mpadge commented 2 years ago

Yes, that gtfs_travetimes result does possibly reflect a bug. I'll keep this issue open and investigate that further soon. Thanks

mpadge commented 1 year ago

@szaboildi Sorry that it's taken so long to get back to this. The above commit fixes the bug you uncovered, and you should now see this:

library (gtfsrouter)
packageVersion ("gtfsrouter")
#> [1] '0.0.5.91'

url <- "http://daten.transparenz.hamburg.de/Dataport.HmbTG.ZS.Webservice.GetRessource100/GetRessource100.svc/90a29c65-d9c5-4281-a45a-e12bf19a18fb/Upload__HVV_Rohdaten_GTFS_Fpl_20210903.zip"
file <- "./feeds/HVV.zip"
if (!file.exists (file)) {
    download.file(url, file, quiet = FALSE)
|

from_trip <- "S Hamburg Airport"
to_trip <- "Langobardenweg"
start_time_trip <- 8 * 3600
day_trip <- 3 # Tuesday

ham_gtfs <- extract_gtfs(file)
#> ▶ Unzipping GTFS archive✔ Unzipped GTFS archive  
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds 
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
ham_gtfs <- gtfs_transfer_table (ham_gtfs)
route <- gtfs_route(ham_gtfs, from_trip, to_trip, start_time_trip, day_trip, max_transfers = 0)
# same as above: lists 2 transfers, and takes 39 minutes

ttimes <- gtfs_traveltimes(ham_gtfs, from_trip,
                           start_time_limits = c(8 * 60 * 60, 9 * 60 * 60), 
                           day = day_trip, max_traveltime = 60 * 60)
ttimes[ttimes$stop_name == to_trip, ]
#>      start_time duration ntransfers                stop_id      stop_name
#> 1163   08:03:00 00:39:00          2 de:02000:91028::910093 Langobardenweg
#> 1168   08:03:00 00:36:00          4 de:02000:91028::910098 Langobardenweg
#>      stop_lon stop_lat
#> 1163 9.966597 53.63475
#> 1168 9.966927 53.63527

Created on 2022-08-16 by the reprex package (v2.0.1)

That now lists the correct number of transfers, along with an additional possibility to get there 3 minutes quicker with 4 transfers.

Description of bug

The traveltimes algorithm was not counting "implicit" transfers to different services (that is, "trip_id" values) from the same platform ("stop_id" values). The above commit identifies these changes of service, and appropriately increments the aggregated number of transfers.


I now have to re-open this issue, because the gtfs_route() function then reveals another bug here. Setting max_transfers >= 4 should give the 4-transfer route, but instead returns this incorrect result:

library (gtfsrouter)
url <- "http://daten.transparenz.hamburg.de/Dataport.HmbTG.ZS.Webservice.GetRessource100/GetRessource100.svc/90a29c65-d9c5-4281-a45a-e12bf19a18fb/Upload__HVV_Rohdaten_GTFS_Fpl_20210903.zip"
file <- "./feeds/HVV.zip"
if (!file.exists (f)) {
    download.file(url, file, quiet = FALSE)
}

from_trip <- "S Hamburg Airport"
to_trip <- "Langobardenweg"
start_time_trip <- 8 * 3600
day_trip <- 3 # Tuesday

ham_gtfs <- extract_gtfs(file)
#> ▶ Unzipping GTFS archive✔ Unzipped GTFS archive  
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds 
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
ham_gtfs <- gtfs_transfer_table (ham_gtfs)
gtfs_route(ham_gtfs, from_trip, to_trip, start_time_trip, day_trip, max_transfers = 4)
#>    route_name         trip_name                       stop_name arrival_time
#> 1          S1             Wedel     Hamburg Airport (Flughafen)     08:03:00
#> 2          S1             Wedel                        Ohlsdorf     08:06:00
#> 3          U1 Norderstedt Mitte                        Ohlsdorf     08:17:00
#> 4          U1 Norderstedt Mitte                   Klein Borstel     08:18:00
#> 5          U1 Norderstedt Mitte                     Fuhlsbüttel     08:19:00
#> 6          U1 Norderstedt Mitte                Fuhlsbüttel Nord     08:21:00
#> 7          U1 Norderstedt Mitte                Langenhorn Markt     08:23:00
#> 8         193        U Garstedt U Langenhorn Markt (Krohnstieg)     08:26:00
#> 9         193        U Garstedt                    Rittmerskamp     08:27:00
#> 10        193        U Garstedt                      Samlandweg     08:28:00
#> 11        193        U Garstedt                   Wrangelkoppel     08:29:00
#> 12        193        U Garstedt                      Ermlandweg     08:30:00
#> 13        191        Grothwisch                Krohnstiegtunnel     08:34:00
#> 14        191        Grothwisch                    Sperlingsweg     08:36:00
#> 15        191        Grothwisch                        Moorrand     08:37:00
#> 16        191        Grothwisch                   Grotkoppelweg     08:38:00
#> 17        191        Grothwisch                  Langobardenweg     08:39:00
#> 18       <NA>        (transfer)               S Hamburg Airport     00:44:00
#>    departure_time
#> 1        08:03:00
#> 2        08:09:00
#> 3        08:17:00
#> 4        08:18:00
#> 5        08:19:00
#> 6        08:21:00
#> 7        08:23:00
#> 8        08:26:00
#> 9        08:27:00
#> 10       08:28:00
#> 11       08:29:00
#> 12       08:30:00
#> 13       08:34:00
#> 14       08:36:00
#> 15       08:37:00
#> 16       08:38:00
#> 17       08:39:00
#> 18           <NA>

That seems to be just a final error with the returned order of the stops along the route.

mpadge commented 1 year ago

That last commit removes any transfers from being traced from the set of start stations, and so fixes the bug here. Running gtfs_traveltimes() still gives the same answer as above, with 2 transfers for 39 minutes, or 4 for 36, but the latter is no longer the incorrectly-timed version that started with a transfer. The gtfs_route() function still gives that, but that will be updated via #71, to allow full tracing of each connection listed in gtfs_traveltimes().

szaboildi commented 1 year ago

Works as described, thank you!

jh0ker commented 1 year ago

@mpadge As I mentioned here, I think I've found a wrong behavior in the gtfs_traveltimes function and it looks like it's part of this bug here. It's easily demonstrated with your example you posted before:

@szaboildi Sorry that it's taken so long to get back to this. The above commit fixes the bug you uncovered, and you should now see this:

library (gtfsrouter)
packageVersion ("gtfsrouter")
#> [1] '0.0.5.91'

url <- "http://daten.transparenz.hamburg.de/Dataport.HmbTG.ZS.Webservice.GetRessource100/GetRessource100.svc/90a29c65-d9c5-4281-a45a-e12bf19a18fb/Upload__HVV_Rohdaten_GTFS_Fpl_20210903.zip"
file <- "./feeds/HVV.zip"
if (!file.exists (file)) {
    download.file(url, file, quiet = FALSE)
|

from_trip <- "S Hamburg Airport"
to_trip <- "Langobardenweg"
start_time_trip <- 8 * 3600
day_trip <- 3 # Tuesday

ham_gtfs <- extract_gtfs(file)
#> ▶ Unzipping GTFS archive✔ Unzipped GTFS archive  
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds 
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
ham_gtfs <- gtfs_transfer_table (ham_gtfs)
route <- gtfs_route(ham_gtfs, from_trip, to_trip, start_time_trip, day_trip, max_transfers = 0)
# same as above: lists 2 transfers, and takes 39 minutes

ttimes <- gtfs_traveltimes(ham_gtfs, from_trip,
                           start_time_limits = c(8 * 60 * 60, 9 * 60 * 60), 
                           day = day_trip, max_traveltime = 60 * 60)
ttimes[ttimes$stop_name == to_trip, ]
#>      start_time duration ntransfers                stop_id      stop_name
#> 1163   08:03:00 00:39:00          2 de:02000:91028::910093 Langobardenweg
#> 1168   08:03:00 00:36:00          4 de:02000:91028::910098 Langobardenweg
#>      stop_lon stop_lat
#> 1163 9.966597 53.63475
#> 1168 9.966927 53.63527

Created on 2022-08-16 by the reprex package (v2.0.1)

That now lists the correct number of transfers, along with an additional possibility to get there 3 minutes quicker with 4 transfers.

This result still looks correct when running it with the latest master version (0.0.5.123), however if you specify gtfs_traveltimes(..., minimise_transfers = TRUE), you can see the transfers are still being calculated wrong:

ttimes_mintransfer <- gtfs_traveltimes(ham_gtfs, from_trip,
                           start_time_limits = c(8 * 60 * 60, 9 * 60 * 60), 
                           day = day_trip, max_traveltime = 60 * 60, 
                           minimise_transfers = TRUE)
ttimes_mintransfer[ttimes_mintransfer$stop_name == to_trip, ]
#>      start_time duration ntransfers                stop_id      stop_name stop_lon stop_lat
#> 1353   08:33:00 00:39:00          0 de:02000:91028::910093 Langobardenweg 9.966597 53.63475
#> 1358   08:33:00 00:45:00          1 de:02000:91028::910098 Langobardenweg 9.966927 53.63527

Let me know if you need any more info or anything. Cheers!

mpadge commented 1 year ago

Shall do @jh0ker, thanks! I recently also noticed odd behaviour when calling with minimise_transfers = TRUE.I'll re-open now and hopefully fix asap.