UrbanAnalyst / gtfsrouter

Routing and analysis engine for GTFS (General Transit Feed Specification) data
https://urbananalyst.github.io/gtfsrouter/

gtfs_traveltimes output #99

Closed viajerus closed 1 year ago

viajerus commented 1 year ago

Hi, I have a question regarding the gtfs_traveltimes() output. I would like to identify the stations that are reachable within 40 minutes (2400 seconds). My code runs without errors.

gtfs <- 
  gtfs_transfer_table(
    gtfs,
    d_limit = 200,
    min_transfer_time = 240,
    network = NULL,
    network_times = FALSE,
    quiet = FALSE
  )

from <- "429511"
start_times <- 12 * 3600 + c (0, 60) * 60 # 12:00-13:00

res <- gtfs_traveltimes (gtfs, from, start_times, minimise_transfers = TRUE, from_is_id = TRUE, max_traveltime = 2400, day = "monday")

# convert durations to seconds (hms() comes from the lubridate package)
res$duration <- as.numeric(lubridate::hms(res$duration))

The station ID is 429511 (Schriesheimer Hof). When I open the output, I can see that the station I used as the starting point (from) is also included: it apparently takes 60 seconds to travel from Schriesheimer Hof to Schriesheimer Hof. I can also see that this first "journey" begins two minutes earlier than the other trips.
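For reference, the start times and the travel-time limit in the snippet above are given in seconds. A minimal base-R sketch for converting between seconds and clock times follows; secs_to_clock and clock_to_secs are hypothetical helpers written for this illustration, not gtfsrouter functions, and the "HH:MM:SS" format is an assumption based on the start_time column.

```r
# Start-time window from the snippet, in seconds after midnight
start_times <- 12 * 3600 + c(0, 60) * 60

# Hypothetical helper: seconds after midnight -> "HH:MM:SS"
secs_to_clock <- function(s) {
  sprintf("%02d:%02d:%02d",
          as.integer(s %/% 3600),
          as.integer((s %% 3600) %/% 60),
          as.integer(s %% 60))
}

# Hypothetical helper: "HH:MM:SS" -> seconds (assumes exactly three fields)
clock_to_secs <- function(x) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) sum(as.numeric(p) * c(3600, 60, 1)), numeric(1))
}

window <- secs_to_clock(start_times)   # "12:00:00" "13:00:00"
limit_minutes <- 2400 / 60             # 40
gap <- clock_to_secs("12:31:00") - clock_to_secs("12:29:00")   # 120 seconds
```

The window therefore runs from 12:00 to 13:00, and a limit of 2400 seconds corresponds to 40 minutes.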

[Screenshot, 2023-03-26 22:43:29]

Is it related to the arrival time of the bus? Is there a way to leave the origin station out?

Best, Daniel

mpadge commented 1 year ago

Hard to answer definitively without any way of reproducing your results, but please note that station names are generally not unique in GTFS feeds. So every time you specify "Schriesheimer Hof" as a starting station, the algorithm will give shortest travel times to all other stops, including all others with the same name. GTFS requires every platform or bus bay or whatever to be a unique stop, even where they all share the same name. So the destination you see in the result will merely share the same name, but will have a different ID. And depending on how you set up the feed and called the algorithm, travel times of 60 seconds are likely the specified "transfer times" in the feed itself for walking between those stops.
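One way to check whether a name is shared by several stops is to count names in the stops table. The sketch below uses a toy table in place of gtfs$stops so that it runs without a feed; the stop_id "X0001" is a made-up placeholder, not an ID from the actual feed.

```r
# Toy stand-in for gtfs$stops; "X0001" is a hypothetical ID for illustration
stops <- data.frame(
  stop_id   = c("429511", "X0001", "131811"),
  stop_name = c("Schriesheimer Hof", "Schriesheimer Hof", "Langer Kirschbaum"),
  stringsAsFactors = FALSE
)

# Names that occur more than once
name_counts  <- table(stops$stop_name)
shared_names <- names(name_counts)[name_counts > 1]

# All stops carrying one of those names, each with its own stop_id
shared <- stops[stops$stop_name %in% shared_names, ]
print(shared)
```

On a real feed the same steps should apply to gtfs$stops, and comparing the stop_id values shows whether two rows with the same name are really distinct stops.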

Those can't sensibly be pre-filtered, as you can never know in advance whether useful services might depart from such stops. Nor can they be filtered as end points in cases like yours, because feeds may have multiple stops with identical names yet in different locations, so name-based filtering is also not possible. And this is only a guess because I can't reproduce your results, but the reason that connection departs two minutes earlier is likely that it is a simple transfer-by-foot, so it can happen at any time. Although I don't know why that would then give 12:29, when you've requested starts from 12:00?

The algorithm should filter out any destinations which are simple walking transfers from the start station, so I'm not sure why that appears in your result, but without any way of reproducing results I can't really say more. Feel free to simply close this issue if my response makes sense to you, or else please provide a full reprex including code to download the GTFS feed if you would like me to investigate any further. Thanks

viajerus commented 1 year ago

Hi, thank you for your answer! This is my full reprex (including code to download gtfs feed, it's a small file).

library(gtfsrouter)
library(sf)
#> Linking to GEOS 3.11.0, GDAL 3.5.3, PROJ 9.1.0; sf_use_s2() is TRUE
library(data.table)
library(ggplot2)
library(leaflet)
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:data.table':
#> 
#>     between, first, last
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(lubridate)
#> 
#> Attaching package: 'lubridate'
#> The following objects are masked from 'package:data.table':
#> 
#>     hour, isoweek, mday, minute, month, quarter, second, wday, week,
#>     yday, year
#> The following objects are masked from 'package:base':
#> 
#>     date, intersect, setdiff, union
library(reprex)

url <- "https://gtfs-sandbox-dds.rnv-online.de/latest/gtfs.zip"

# create a temporary directory
td = tempdir()

# create the placeholder file
tf = tempfile(tmpdir=td, fileext=".zip")

# download into the placeholder file
download.file(url, tf)

#extract gtfs file
gtfs <- extract_gtfs (tf)
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> Warning: This feed contains no transfers.txt 
#>   A transfers.txt table may be constructed with the 'gtfs_transfer_table' function
#> ▶ Extracting GTFS feed
#> ✔ Extracted GTFS feed
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds

#get transfer table
gtfs <- 
  gtfs_transfer_table(
    gtfs,
    d_limit = 200,
    min_transfer_time = 240,
    network = NULL,
    network_times = FALSE,
    quiet = FALSE
  )

#set origin using stop id
from <- "429511"
start_times <- 12 * 3600 + c (0, 60) * 60 # 12:00-13:00

#calculate traveltimes
res <- gtfs_traveltimes (gtfs, from, start_times, minimise_transfers = TRUE, from_is_id = TRUE, max_traveltime = 600, day = "monday")

res$duration <- as.numeric(hms(res$duration))

#format
res <- res %>% arrange(duration)

res
#>   start_time duration ntransfers stop_id          stop_name stop_lon stop_lat
#> 1   12:29:00       60          0  429511  Schriesheimer Hof 8.754298 49.47032
#> 2   12:31:00      120          0  131811  Langer Kirschbaum 8.751244 49.45908
#> 3   12:31:00      420          0  131411     Heidebuckelweg 8.754154 49.44983
#> 4   12:31:00      450          0  132611 Schweizertalstraße 8.753274 49.44823
#> 5   12:31:00      480          0  131911              Löwen 8.753714 49.44410
#> 6   12:31:00      600          0  131011        Grüner Baum 8.751989 49.43755

Created on 2023-03-27 with reprex v2.0.2.9000

As you can see, the station "Schriesheimer Hof" listed in the output has the same stop_id as the departure station. There are just two stations called "Schriesheimer Hof" in the entire feed, as you can see below.

[Three screenshots, 2023-03-27 12:21:49, 12:22:00 and 12:24:16]

So yes, it's interesting that the same start station appears in the output.
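As a stopgap for the question of leaving the origin out, the origin row can be dropped by stop_id rather than by name. This is a minimal sketch, not a gtfsrouter feature; it rebuilds the first three rows of the output above as a plain data frame so that it runs without the feed.

```r
from <- "429511"

# First three rows of the reprex output above, reduced to four columns
res <- data.frame(
  start_time = c("12:29:00", "12:31:00", "12:31:00"),
  duration   = c(60, 120, 420),
  stop_id    = c("429511", "131811", "131411"),
  stop_name  = c("Schriesheimer Hof", "Langer Kirschbaum", "Heidebuckelweg"),
  stringsAsFactors = FALSE
)

# Filter on the unique stop_id, so other stops sharing the name are kept
res_no_origin <- res[res$stop_id != from, ]
print(res_no_origin)
```

Filtering on stop_id removes only the exact origin stop, whereas a name-based filter would also drop any distinct stop that happens to share the name.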

mpadge commented 1 year ago

Thanks! That definitely shouldn't happen. I'll look into it and get back to you.

mpadge commented 1 year ago

Thanks @viajerus, that was definitely a bug. The above commit fixes it, so you should now see this:

library (gtfsrouter)
packageVersion ("gtfsrouter")
#> [1] '0.0.5.147'
url <- "https://gtfs-sandbox-dds.rnv-online.de/latest/gtfs.zip"
td = tempdir()
tf = tempfile(tmpdir=td, fileext=".zip")
download.file(url, tf)
gtfs <- extract_gtfs (tf)
#> ▶ Unzipping GTFS archive✔ Unzipped GTFS archive
#> Warning: This feed contains no transfers.txt 
#>   A transfers.txt table may be constructed with the 'gtfs_transfer_table' function
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds

gtfs <- gtfs_transfer_table(gtfs, d_limit = 200, min_transfer_time = 240,
    network = NULL, network_times = FALSE, quiet = FALSE)

from <- "429511"
start_times <- 12 * 3600 + c (0, 60) * 60 # 12:00-13:00

res <- gtfs_traveltimes (gtfs, from, start_times, minimise_transfers = TRUE, from_is_id = TRUE, max_traveltime = 600, day = "monday")
res$duration <- as.numeric(lubridate::hms(res$duration))
dplyr::arrange(res, duration)
#>   start_time duration ntransfers stop_id          stop_name stop_lon stop_lat
#> 1   12:31:00      120          0  131811  Langer Kirschbaum 8.751244 49.45908
#> 2   12:31:00      420          0  131411     Heidebuckelweg 8.754154 49.44983
#> 3   12:31:00      450          0  132611 Schweizertalstraße 8.753274 49.44823
#> 4   12:31:00      480          0  131911              Löwen 8.753714 49.44410
#> 5   12:31:00      600          0  131011        Grüner Baum 8.751989 49.43755

Created on 2023-03-27 with reprex v2.0.2
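To check whether an installed copy is at least the version printed above, the version strings can be compared with base R. This is a sketch under one assumption: that 0.0.5.147 is the first version carrying the fix (the thread only shows that this version has it). has_fix is a hypothetical helper, and "0.0.5.100" is an arbitrary earlier version string used for illustration.

```r
# Version shown in the reprex above; assumed here to be the first with the fix
fixed_in <- package_version("0.0.5.147")

# Hypothetical helper: TRUE if version string v is at least fixed_in
has_fix <- function(v) package_version(v) >= fixed_in

new_enough <- has_fix("0.0.5.147")
too_old    <- has_fix("0.0.5.100")

# With the package installed, one could call:
# has_fix(as.character(packageVersion("gtfsrouter")))
```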

viajerus commented 1 year ago

Thank you very much! :)

mpadge commented 1 year ago

Thanks for helping to improve the code!