UrbanAnalyst / gtfsrouter

Routing and analysis engine for GTFS (General Transit Feed Specification) data
https://urbananalyst.github.io/gtfsrouter/

gtfs_traveltimes bug when setting max_traveltime #116

Closed: FlxPo closed this issue 8 months ago

FlxPo commented 8 months ago

I encountered a bug when using gtfs_traveltimes on the GTFS feed of the Ile-de-France region. Everything works with a really large max_traveltime, when all stops can be reached. However, whenever I use a value that limits the number of reachable stops, I get travel times far above max_traveltime, negative start_time values (always -1:59:59), and NA for ntransfers.

Do you know what could be wrong?

It might be something about this GTFS feed, because this is the first time I have encountered the problem. I tried recomputing transfers with gtfs_transfer_table and fixing the "infinite speed" hops between stations, but I still get the same results.

library(gtfsrouter)
library(data.table)

gtfs_file_path <- "D:/data/mobility/data/gtfs/413988ed-d340-467b-8be2-7b999fcd207a_Horaires au format GTFS.zip"

gtfs <- extract_gtfs(gtfs_file_path)
#> ▶ Unzipping GTFS archive✔ Unzipped GTFS archive  
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds 
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds

gtfs <- gtfs_timetable(gtfs, day = "wednesday")

sample_stop_id <- sample(gtfs$stop_times$stop_id, 1)
max_traveltime <- 30*60 # 30 minutes, in seconds

tt <- gtfs_traveltimes(
  gtfs = gtfs,
  from = sample_stop_id,
  from_is_id = TRUE,
  start_time_limits = c(3600*7.5, 3600*8.5), # 07:30 to 08:30, in seconds
  max_traveltime = max_traveltime,
  minimise_transfers = TRUE
)

tt <- as.data.table(tt)
# convert HH:MM:SS durations to seconds so they can be compared with max_traveltime
tt[, duration := as.numeric(lubridate::hms(duration))]
tt <- tt[order(duration)]

tt[duration > max_traveltime]
#>        start_time duration ntransfers     stop_id
#>            <char>    <num>      <int>      <char>
#>     1:   -1:59:59    28861         NA   IDFM:8227
#>     2:   -1:59:59    28861         NA IDFM:426669
#>     3:   -1:59:59    28861         NA  IDFM:22596
#>     4:   -1:59:59    28921         NA  IDFM:25183
#>     5:   -1:59:59    28921         NA  IDFM:22702
#>    ---                                           
#> 24472:   -1:59:59    96301         NA IDFM:486432
#> 24473:   -1:59:59    96361         NA IDFM:486242
#> 24474:   -1:59:59    97441         NA  IDFM:19823
#> 24475:   -1:59:59    97741         NA   IDFM:8488
#> 24476:   -1:59:59   104881         NA  IDFM:18749
#>                                stop_name stop_lon stop_lat
#>                                   <char>    <num>    <num>
#>     1:                    Claude Bernard 2.578133 48.85457
#>     2:         Flandres - Dunkerque 1940 2.581532 48.85057
#>     3:                      Simone Bigot 2.536854 48.86054
#>     4:                  Emile Cossonneau 2.551479 48.87476
#>     5:                 Rond-Point Thiers 2.518008 48.89979
#>    ---                                                    
#> 24472:        Gare De Boissy-Saint-Léger 2.505408 48.75297
#> 24473:                   Poissy Gare Sud 2.041819 48.93261
#> 24474:  Gare d'Evry-Courcouronnes Centre 2.428402 48.62541
#> 24475: Gare du Bras de Fer Evry Génopole 2.450716 48.62345
#> 24476:                 Place de la Boule 2.200371 48.88750
Created on 2024-01-31 with reprex v2.1.0

mpadge commented 8 months ago

Coincidentally, I'm working on the same feed today, so will easily be able to check it out for you.

mpadge commented 8 months ago

Thanks @FlxPo, that was just a failure to clean up data prior to returning. All fixed now, and you should see this:

library (gtfsrouter)
packageVersion("gtfsrouter")
#> [1] '0.1.2.6'
library(data.table)

gtfs_file_path <- "<path>"
gtfs <- extract_gtfs(gtfs_file_path)
gtfs <- gtfs_timetable(gtfs, day = "wednesday")

set.seed(1L)
sample_stop_id <- sample(gtfs$stop_times$stop_id, 1)
max_traveltime <- 30*60

tt <- gtfs_traveltimes(
  gtfs = gtfs,
  from = sample_stop_id,
  from_is_id = TRUE,
  start_time_limits = c(3600*7.5, 3600*8.5),
  max_traveltime = max_traveltime,
  minimise_transfers = TRUE
)

tt <- as.data.table(tt)
table (tt$start_time)
#> 
#> 07:31:00 07:32:00 07:36:00 07:43:00 07:47:00 07:48:00 07:50:00 07:52:00 
#>       36       11       17       12       10        3        8        8 
#> 07:57:00 08:18:00 
#>        1        1
tt
#>      start_time duration ntransfers     stop_id                      stop_name
#>   1:   07:32:00 00:06:00          0  IDFM:23722 La Défense - Métro-RER-Tramway
#>   2:   07:31:00 00:11:00          1  IDFM:23435                  Les Fauvelles
#>   3:   07:50:00 00:13:00          1  IDFM:18733                        Palissy
#>   4:   08:18:00 00:26:00          0  IDFM:24585        Conservatoire-Pressensé
#>   5:   07:31:00 00:14:00          2  IDFM:25641      La Défense (Grande Arche)
#>  ---                                                                          
#> 103:   07:32:00 00:19:00          2  IDFM:24333                        Marceau
#> 104:   07:36:00 00:25:00          2  IDFM:26083        Bagatelle - Pré Catelan
#> 105:   07:31:00 00:16:00          2 IDFM:427918            Faubourg de l'Arche
#> 106:   07:31:00 00:13:00          1  IDFM:23436                    Charlebourg
#> 107:   07:36:00 00:22:00          2  IDFM:23733                  Paul Lafargue
#>      stop_lon stop_lat
#>   1: 2.239144 48.89203
#>   2: 2.239631 48.90263
#>   3: 2.228742 48.88441
#>   4: 2.237202 48.87568
#>   5: 2.237384 48.89306
#>  ---                  
#> 103: 2.246721 48.90005
#> 104: 2.248712 48.86695
#> 105: 2.240079 48.89717
#> 106: 2.238467 48.90760
#> 107: 2.246275 48.88561

Created on 2024-02-01 with reprex v2.1.0

The negative start times were all from converting -1 to HMS format, and the corresponding travel times were garbage. All of those are now filtered out from the result, leaving only the travel times for actually reachable stops.
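The filtering described above can be approximated on the user side for results produced by a version without the fix. This is only a sketch, not the package's actual implementation: `drop_unreachable` is a hypothetical helper, the toy data frame copies a few rows from the outputs in this thread, and it assumes that unreachable stops are exactly those with an NA `ntransfers` or a negative `start_time`, and that `duration` has already been converted to seconds.

```r
# Toy data mimicking the buggy result: one valid row plus two garbage rows.
# Values are copied from the outputs shown in this thread.
tt <- data.frame(
  start_time = c("07:32:00", "-1:59:59", "-1:59:59"),
  duration   = c(360, 28861, 28921),   # seconds
  ntransfers = c(0L, NA, NA),
  stop_id    = c("IDFM:23722", "IDFM:8227", "IDFM:25183"),
  stringsAsFactors = FALSE
)
max_traveltime <- 30 * 60              # 30 minutes, in seconds

# Hypothetical helper (not part of gtfsrouter): keep only rows that look
# like genuinely reachable stops within the travel time limit.
drop_unreachable <- function(tt, max_traveltime) {
  keep <- !is.na(tt$ntransfers) &
    !startsWith(tt$start_time, "-") &
    tt$duration <= max_traveltime
  tt[keep, , drop = FALSE]
}

clean <- drop_unreachable(tt, max_traveltime)
print(clean)
```

Only the `IDFM:23722` row survives here. On real output the same function would be applied after converting `duration` to seconds, as in the original report.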

Thanks for your awesome debugging help!

FlxPo commented 8 months ago

Well, thank you very much for answering and fixing this!

Big fan of your work on gtfsrouter and dodgr, both tools that "just work" out of the box and are incredibly fast.

FlxPo commented 2 months ago

Hi @mpadge, do you have an idea of when this fix will be available on CRAN? Building from GitHub is not always possible for some users...

mpadge commented 2 months ago

Sorry @FlxPo, I'll get a new version up ASAP, but likely not until early Aug.
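In the meantime, a quick way to tell whether an installed copy already contains the fix is to compare version numbers. The sketch below assumes the fix is present from 0.1.2.6 onwards, which is the version printed in the fixed output earlier in this thread; `has_fix` is a hypothetical helper, and "0.1.2" is just an arbitrary older version string used for illustration.

```r
# Assumption: the fix is included from version 0.1.2.6 onwards,
# the version printed in the fixed reprex in this thread.
fixed_in <- package_version("0.1.2.6")

# Hypothetical helper: TRUE if a version string is at or past the fix.
has_fix <- function(installed) {
  package_version(installed) >= fixed_in
}

has_fix("0.1.2")    # FALSE: older than the fixed version
has_fix("0.1.2.6")  # TRUE

# With gtfsrouter installed, the real check would be:
# has_fix(as.character(packageVersion("gtfsrouter")))
```

If the check returns FALSE, filtering out rows with NA `ntransfers` remains a possible stopgap.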