UrbanAnalyst / gtfsrouter

Routing and analysis engine for GTFS (General Transit Feed Specification) data
https://urbananalyst.github.io/gtfsrouter/
80 stars 17 forks source link

larger isochrone time window returns fewer mid and end_points #56

Closed AlexandraKapp closed 3 years ago

AlexandraKapp commented 3 years ago

library(gtfsrouter)
packageVersion("gtfsrouter")
#> [1] '0.0.4.100'

gtfs <- extract_gtfs(file.path("gtfs.zip"))
#> > Unzipping GTFS archivev Unzipped GTFS archive  
#> > Extracting GTFS feedv Extracted GTFS feed 
#> > Converting stop times to secondsv Converted stop times to seconds 
#> > Converting transfer times to secondsv Converted transfer times to seconds
ttable <- gtfs_timetable(gtfs, day = "tuesday")

iso1 <- gtfs_isochrone(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600, end_time = 10*3600)
#> Loading required namespace: geodist
#> Loading required namespace: lwgeom
#> Registered S3 method overwritten by 'spatstat':
#>   method     from
#>   print.boxx cli 
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
iso2 <- gtfs_isochrone(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600, end_time = 11*3600)

nrow(iso1$mid_points)
#> [1] 2435
nrow(iso2$mid_points)
#> [1] 1861
nrow(iso1$end_points)
#> [1] 124
nrow(iso2$end_points)
#> [1] 87

Created on 2020-11-24 by the reprex package (v0.3.0)

AlexandraKapp commented 3 years ago

stops are also generally a lot less (~10%) than with the current CRAN version:

library(gtfsrouter)
#> Warning: package 'gtfsrouter' was built under R version 4.0.3
packageVersion("gtfsrouter")
#> [1] '0.0.4'

gtfs <- extract_gtfs(file.path("~/03_GitHub/NetworkAnalysis/inst/data_/VBB/gtfs.zip"))
#> > Unzipping GTFS archivev Unzipped GTFS archive  
#> > Extracting GTFS feedv Extracted GTFS feed 
#> > Converting stop times to secondsv Converted stop times to seconds 
#> > Converting transfer times to secondsv Converted transfer times to seconds
ttable <- gtfs_timetable(gtfs, day = "tuesday")

iso1 <- gtfs_isochrone(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600, end_time = 10*3600)
#> Loading required namespace: geodist
#> Loading required namespace: lwgeom
#> Registered S3 method overwritten by 'spatstat':
#>   method     from
#>   print.boxx cli 
#> Linking to GEOS 3.8.0, GDAL 3.0.4, PROJ 6.3.1
iso2 <- gtfs_isochrone(ttable, "Berlin Hauptbahnhof", start_time = 8 * 3600, end_time = 11*3600)
#> Maximum distance is > 100km. The 'cheap' measure is inaccurate over such
#> large distances, you'd likely be better using a different 'measure'.

nrow(iso1$mid_points)
#> [1] 24918
nrow(iso2$mid_points)
#> [1] 36750
nrow(iso1$end_points)
#> [1] 1332
nrow(iso2$end_points)
#> [1] 1713

Created on 2020-11-24 by the reprex package (v0.3.0)

mpadge commented 3 years ago

Those are simply due to boundary effects as the isochrone times extend out towards the limit of the entire system.

library (gtfsrouter)
gtfs <- extract_gtfs ("vbb.zip")
#> ▶ Unzipping GTFS archive
#> ✔ Unzipped GTFS archive
#> ▶ Extracting GTFS feed✔ Extracted GTFS feed 
#> ▶ Converting stop times to seconds✔ Converted stop times to seconds 
#> ▶ Converting transfer times to seconds✔ Converted transfer times to seconds
gtfs <- gtfs_timetable (gtfs, day = "tuesday")

from <- "Berlin Hauptbahnhof"
start_time <- 8 * 3600

# then examine numbers of mid-points as a fn of isochrone duration,
# using two sequences of durations just to compare:
e <- c (2000, 3000, 5000, 7500, 10000, 10800, 12000)
n <- length (e)
e <- c (e, floor (3600 * 10 ^ (1:10 / 20)))
index <- rep (2, length (e))
index [seq (n)] <- 1 # index is then into the two groups of end times
n <- vapply (e, function (i) {
                 g <- gtfs_isochrone (gtfs, from, start_time, end_time = start_time + i)
                 nrow (g$mid_points)    }, integer (1))

dat <- data.frame (time = e,
                   n = n,
                   group = factor (index))
library (ggplot2)
ggplot (dat, aes (x = time, y = n, colour = group)) +
    geom_line () +
    geom_point ()

Created on 2020-11-24 by the reprex package (v0.3.0)

A duration of 3 hours is 10,800 seconds, and the red line suggests that is the point beyond which boundary effects become really pronounced. The blue line shows that it's even possible to get "unexpected" variations well below these. These all arise because any time trips extend towards a given spot on the boundary they "drag" with them a potential multiplicity of end points which ultimately converge on a single point just inside the boundary. And fewer end points mean fewer mid points. Any isochrone duration which causes previously multiple end points to collapse onto a single boundary end point will result in reduced numbers of mid-points, with exact patterns dependent on the intricacies of any given system.

These effects are of course not "real" in any sense (except perhaps on islands), but it's probably useful to examine them to get an idea of how far out you can safely extend isochrone durations before such boundary effects come into play. Does that make sense?

AlexandraKapp commented 3 years ago

I'm not sure understand correctly: shouldn't the isochrone cover all stops that are within reach within the given time?

When multiple points converge to a single point, I would have assumed that from then on only the "remaining" route is being continued, but the other endpoints are being kept as routes up to the stop before the converging endpoints?

Just to make sure I get it right, given this example of a blue and green route:

grafik

With only a shorter time window, I get this result:

grafik

But when I extend the time window, endpoint 1 and 2 converge and I get this:

grafik

Is that what you meant? If so, then I guess I would rather have expected this outcome:

grafik

mpadge commented 3 years ago

Yes, i think those diagrams encapsulate what i was trying to express. And the point i was trying to make was definitely the transition from your 2nd to 3rd diagrams - the other mid-points simply disappear to be replaced by the single best connection to the now-common "Endpoint1". Your fourth and final diagram is a possibility, but it depends on how the final node ends up getting connected. If "Endpoint 1" has been flagged, and a connection moves up the right branch and identifies "Endpoint 2" as a candidate end point, then things can simply stay that way, and, as you say in the diagram. "Route 1" will not be continued, and you'll end up with 2 endpoints. So that is a possibility.

If, however, Endpoint 1 has been flagged, but a subsequent connection is able to extend beyond it, then you can no longer consider that to be an end point, and so you have to remove it and update to the subsequent station, Endpoint 2. It would be possible to modify the algorithm so that any time a station flagged as potential end point can extend but in doing so would connect to another station already labelled as an end point, then that original flag should be kept. That would create what you depict in the 4th diagram, however, it would also go against the notion of an isochrone, I would suggest. There might simply be a single slow service on the left branch that can only reach Endpoint1 within the isochrone duration, while there are countless many other connections which extend well beyond it. Keeping the situation depicted in your 4th diagram would require labelling as an end point any station that any single time served as a potential end point. That in itself would open the possibility of almost all stations being end points, because the algorithm also traces any and all kinds of circuitous routes. You need an ability to declare those circuitous routes invalid, which is implemented by saying the only valid connection to any station at any time is the fastest. And that in turn means that extending Endpoint1 to Endpoint2 has to involve removing Endpoint1 as an end point. (Along with effectively "forgetting" about whichever of those two branches was the slowest, as an isochrone only utlimately traces the single fastest paths to each endpoint.)

... happy to iterate thoughts further ... :smile:

AlexandraKapp commented 3 years ago

alright, glad we are on the same page :)

In my view, for a correct result the routes that are not continued should definitely be somehow included. Otherwise I would assume, that those stops cannot be reached within the given time window. For a street network it is probably enough to know the furthest points you can get to, to create an isochrone. But for public transport networks I cannot assume that. Because - if you can reach a place by car within x minutes, you can safely assue that a place which is 1km closer can be reached faster. For public transport this is not true - as there might not be a line running.

How about labeling Endpoint 1 "Midpoint"? Is that easier? Then there would be midpoints without endpoints though.

mpadge commented 3 years ago

Link to discussion in #57 for resolution of this point. I think the discussion here also indicates the current approach of providing a single list of "midpoint" stations is non-ideal, and we should reformat the output to be more useful and more insightful into the actual services leading to the endpoints of an isochrone.

mpadge commented 3 years ago

Closed to move discussion over to #58