UrbanAnalyst / gtfsrouter

Routing and analysis engine for GTFS (General Transit Feed Specification) data
https://urbananalyst.github.io/gtfsrouter/
81 stars 17 forks source link

Ensure isochrones have latest possible start time for any given end time #49

Closed mpadge closed 3 years ago

mpadge commented 3 years ago

That is currently not done, so isochrone values do not necessarily reflect the fastest possible trip to each end point

mpadge commented 3 years ago

@AlexandraKapp The above commit changes isochrone calculation so that the value at each end point is calculated relative to the start time of the actual journey that led to that end point, rather than previous version in which all times were relative to a common time defined as first trip leaving from after specified time.

Previous version

image

New Version

image

The auto-zoom means they're not quite overlaid, but you can see that the new version extends further, because the end points are only stopped once they exceed the specified isochrone value after the start of those actual journeys.

AlexandraKapp commented 3 years ago

looks good! How would you know additionally the exact duration for each stop? Would it make sense to change the column earliest_arrival to duration in mid_pointsand end_points? Or add another column departure_time?

mpadge commented 3 years ago

The preceding 2 commits now give this:

library (gtfsrouter)
gtfs <- extract_gtfs("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds 
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs <- gtfs_timetable (gtfs, day = "Monday")
from <- "Berlin, Amalienstr"
start_time <- 11 * 3600
end_time <- start_time + 15 * 60
r <- gtfs_isochrone (gtfs, from = from, start_time = start_time, end_time = end_time)
#> Loading required namespace: geodist
#> Loading required namespace: lwgeom
#> Registered S3 method overwritten by 'spatstat':
#>   method     from
#>   print.boxx cli 
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.2

r$end_points
#> Simple feature collection with 26 features and 6 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 13.2782 ymin: 52.40379 xmax: 13.3721 ymax: 52.45409
#> geographic CRS: WGS 84
#> First 10 features:
#>                                        stop_name      stop_id departure
#> 1                          Berlin, Karwendelstr. 070101002873  11:06:00
#> 2                            Berlin, Breite Str. 070101004111  11:04:00
#> 3                            Berlin, Greinerstr. 070101003027  11:04:00
#> 4                           Berlin, Herwarthstr. 070101002665  11:06:00
#> 5                    Teltow, Ev. Diakonissenhaus 070101007462  11:06:00
#> 6                               Berlin, Feldstr. 070101001321  11:06:00
#> 7                 Berlin, Siemensstr./Halskestr. 070101000144  11:04:00
#> 8                   Berlin, Friedrichrodaer Str. 070101000710  11:04:00
#> 9             Berlin, Unter den Eichen/Drakestr. 070101002993  11:06:00
#> 10 S Lichterfelde Süd (Berlin) [Bus Réaumurstr.] 070101000076  11:06:00
#>     arrival duration transfers                  geometry
#> 1  11:15:00 00:09:00         2 POINT (13.30679 52.43304)
#> 2  11:18:00 00:14:00         0 POINT (13.32904 52.45409)
#> 3  11:18:00 00:14:00         1  POINT (13.3721 52.43668)
#> 4  11:20:00 00:14:00         3 POINT (13.31808 52.43209)
#> 5  11:20:00 00:14:00         0  POINT (13.2782 52.40379)
#> 6  11:20:30 00:14:30         1 POINT (13.30354 52.41569)
#> 7  11:14:00 00:10:00         1   POINT (13.3428 52.4471)
#> 8  11:18:00 00:14:00         2 POINT (13.35879 52.42044)
#> 9  11:20:00 00:14:00         1 POINT (13.29339 52.44588)
#> 10 11:19:00 00:13:00         1  POINT (13.3114 52.41099)

Created on 2020-10-26 by the reprex package (v0.3.0)

mpadge commented 3 years ago

Those commits convert the whole calculation of isochrones from tracing absolute times to tracing a vector of elapsed_time values for each station, calculated relative to the first departure from the nominated from point(s) to that station. The first arriving trip to each station is recorded, but these are also replaced by any subsequent ones with overall lower elapsed_time values, ensuring that latest possible start time will always be chosen for any given end time. The final absolute times in the return object correspond to the first trip with minimal elapsed time, ensuring that, for a bunch of equivalent trips the first one will always be returned.

library (gtfsrouter)
gtfs <- extract_gtfs("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds 
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs <- gtfs_timetable (gtfs, day = "Monday")
from <- "Berlin, Amalienstr"

start_time <- 11 * 3600 + 0 * 60
end_time <- start_time + 10 * 60
r <- gtfs_isochrone (gtfs, from = from, start_time = start_time, end_time = end_time)
#> Loading required namespace: lwgeom
#> Registered S3 method overwritten by 'spatstat':
#>   method     from
#>   print.boxx cli 
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.2
r$end_points
#> Simple feature collection with 9 features and 6 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 13.30214 ymin: 52.41436 xmax: 13.35007 ymax: 52.44425
#> geographic CRS: WGS 84
#>                          stop_name      stop_id departure  arrival duration
#> 1            Berlin, Karwendelstr. 070101000397  11:06:00 11:15:00 00:09:00
#> 2       Berlin, Ostpreußendamm Süd 070101001322  11:06:00 11:15:00 00:09:00
#> 3         S Osdorfer Str. (Berlin) 070101002577  11:06:00 11:15:00 00:09:00
#> 4         S Osdorfer Str. (Berlin) 060064256621  11:06:00 11:15:42 00:09:42
#> 5               Berlin, Grabenstr. 070101000388  11:06:00 11:15:00 00:09:00
#> 6            Berlin, Stanzer Zeile 070101003072  11:06:00 11:15:00 00:09:00
#> 7             Berlin, Eiswaldtstr. 070101000881  11:04:00 11:13:00 00:09:00
#> 8 Berlin, Leonorenstr./Siemensstr. 070101004143  11:04:00 11:12:00 00:08:00
#> 9      Berlin, Paul-Schneider-Str. 070101000705  11:04:00 11:13:00 00:09:00
#>   transfers                  geometry
#> 1         1 POINT (13.30679 52.43304)
#> 2         0 POINT (13.30213 52.41436)
#> 3         1 POINT (13.31386 52.41837)
#> 4         1 POINT (13.31386 52.41837)
#> 5         1 POINT (13.32649 52.42126)
#> 6         1 POINT (13.33818 52.41664)
#> 7         1 POINT (13.34704 52.42858)
#> 8         0 POINT (13.33899 52.44425)
#> 9         1 POINT (13.35007 52.43425)

start_time <- 11 * 3600 + 5 * 60
end_time <- start_time + 10 * 60
r <- gtfs_isochrone (gtfs, from = from, start_time = start_time, end_time = end_time)
r$end_points
#> Simple feature collection with 8 features and 6 fields
#> geometry type:  POINT
#> dimension:      XY
#> bbox:           xmin: 13.30214 ymin: 52.41436 xmax: 13.36118 ymax: 52.4473
#> geographic CRS: WGS 84
#>                          stop_name      stop_id departure  arrival duration
#> 1 Berlin, Leonorenstr./Siemensstr. 070101004143  11:09:00 11:18:00 00:09:00
#> 2            S Attilastr. (Berlin) 070101003427  11:09:00 11:18:00 00:09:00
#> 3            Berlin, Karwendelstr. 070101000397  11:06:00 11:15:00 00:09:00
#> 4       Berlin, Ostpreußendamm Süd 070101001322  11:06:00 11:15:00 00:09:00
#> 5         S Osdorfer Str. (Berlin) 070101002577  11:06:00 11:15:00 00:09:00
#> 6         S Osdorfer Str. (Berlin) 060064256621  11:06:00 11:15:42 00:09:42
#> 7               Berlin, Grabenstr. 070101000388  11:06:00 11:15:00 00:09:00
#> 8            Berlin, Stanzer Zeile 070101003072  11:06:00 11:15:00 00:09:00
#>   transfers                  geometry
#> 1         1 POINT (13.33899 52.44425)
#> 2         0  POINT (13.36118 52.4473)
#> 3         1 POINT (13.30679 52.43304)
#> 4         0 POINT (13.30213 52.41436)
#> 5         1 POINT (13.31386 52.41837)
#> 6         1 POINT (13.31386 52.41837)
#> 7         1 POINT (13.32649 52.42126)
#> 8         1 POINT (13.33818 52.41664)

Created on 2020-10-26 by the reprex package (v0.3.0)

The second of those shows that starting the isochrone 5 minutes later drops the couple that started at 11:04, and replaces with them subsequent services which start at 11:09, yet leaves all the other ones starting at 11:06 intact. The scan horizon is hard-coded here: https://github.com/ATFutures/gtfs-router/blob/7b6de065796930f3c6c36746cabf8500c9025969/src/csa-isochrone.cpp#L127-L129

The value is the actual start time from a given start (from) station plus twice the isochrone duration. That determines when the scan stops, but as described values are only replaced if the overall elapsed_time is shorter, otherwise the first service is recorded. That means that results are effectively independent of the precise value hard-coded in the above line. Shorter horizons can be used, and may offer some speed gains because less of the timetable has to be scanned, but at a risk of potentially missing some connections. I think the chosen value is sensible, and am happy to leave as is.

AlexandraKapp commented 3 years ago

the adaption of the algorithms seems very reasonable. I was wondering if the parameters of the gtfs_isochrone function should be adapted, to better reflect the new calculation? As "end_time" does not really capture the functionality properly anymore.

#' @param start_time Desired departure time at from station
#' @param max_duration maximum travel time duration considered for isochrone
#' @param latest_start_time Latest possible departure at from station. Default: twice the isochrone duration after start_time
gtfs_isochrone <- function (gtfs, start_time, max_duration, latest_start_time = NA) {
    if(is.na(latest_start_time))
        latest_start_time = 2*max_duration + start_time
   ...
}

for backwards compatibility keeping end_time with a deprecation warning might make sense:

gtfs_isochrone <- function (gtfs, start_time, max_duration = NA, latest_start_time = NA, end_time = NA) {
    if(is.na(max_duration)) {
       if(!is.na(end_time)) {
       message("end_time will be deprecated. In future use max_duration instead.")
       max_duration <- end_time - start_time
       }
    }
    if(is.na(latest_start_time))
        latest_start_time = 2*max_duration + start_time
   ...
}
mpadge commented 3 years ago

Yeah, good point. The documentation definitely needs to be updated, and i think these more sophisticated isochrone calculates probably mean that the whole thing should be its own separate vignette. So i'll re-open the issue as a prompt to do that, and only close when the new vignette is there.

mpadge commented 3 years ago

This has all been effectively addressed in the new gtfs_traveltimes() function and algorithms, which will also lead to deprecation of the former gtfs_isochrone() function. New vignette issue is #55, so closing now.