UrbanAnalyst / gtfsrouter

Routing and analysis engine for GTFS (General Transit Feed Specification) data
https://urbananalyst.github.io/gtfsrouter/
81 stars 17 forks source link

Headway of trips (including transfers) #43

Closed AlexandraKapp closed 3 years ago

AlexandraKapp commented 3 years ago

How often does a connection go from Place A to Place B?

@mpadge as discussed, I created this as an issue

mpadge commented 3 years ago

That commit implements gtfs_route_headway() which currently does this: library (gtfsrouter) gtfs <- extract_gtfs ("VBB_gtfs.zip") gtfs <- gtfs_timetable (gtfs, day = "monday") from <- "Südkreuz" to <- "Wittenau" x <- gtfs_route_headway (gtfs, from = from, to = to, quiet = TRUE) hist (x)

Created on 2020-09-24 by the reprex package (v0.3.0)

It works like this:

  1. Set start_time = 0 (midnight)
  2. Find first connection and store departure_time from nominated station
  3. Set start_time to that departure_time plus one second
  4. Repeat steps 2--3 until start_time reaches midnight.

Currently returns a simple vector of (integer) headways in seconds. It takes a while because it scans the whole timetable for each start time. I don't think much could be gained here by re-writing the underlying C++ code for this specific use case, and even if so, more gains are likely to be possible through simply implementing parallel computation. And so i'll re-open this issue straight way and close it again when parallel computation has been implemented.

mpadge commented 3 years ago

Parallel implementation is likely not really worthwhile. This is the difference it makes (with 8 cores):

library (gtfsrouter)
gtfs <- extract_gtfs ("VBB_gtfs.zip")
gtfs <- gtfs_timetable (gtfs, day = "monday")
from <- "Südkreuz"
to <- "Wittenau"
system.time (
    x1 <- gtfs_route_headway (gtfs, from = from, to = to, parallel = TRUE)
)
#> Loading required namespace: parallel
#> Loading required namespace: doParallel
#>    user  system elapsed 
#>  71.549   3.332  18.459
system.time (
    x2 <- gtfs_route_headway (gtfs, from = from, to = to, parallel = FALSE)
)
#>    user  system elapsed 
#>  42.343   0.237  28.659
identical (x1, x2)
#> [1] TRUE

Created on 2020-09-24 by the reprex package (v0.3.0)

Anybody wanting to resort to parallelization is likely to want to do so because of a desire to calculate a lot of pairwise route headway values, for which cases parallelization would be much better implemented at that higher level. The code used to implement parallelization in that example was this (recorded here for posterity, because it won't be committed):

route_headway_parallel <- function (gtfs, start_stns, end_stns) {

    requireNamespace ("parallel")
    requireNamespace ("doParallel")
    requireNamespace ("foreach")

    do1 <- function (gtfs, start_stns, end_stns, start_time, end_time) {
        heads <- NULL
        while (start_time < end_time) {
            gtfs$timetable <- gtfs$timetable [departure_time >= start_time, ]
            times <- headway_times (gtfs, start_stns, end_stns, start_time)
            heads <- rbind (heads, unname (times))
            start_time <- times [1] + 1
        }
        return (heads)
    }

    nc <- parallel::detectCores ()
    doParallel::registerDoParallel (nc)

    f <- rep (seq (nc), each = ceiling (24 / nc))
    if (length (f) < 25)
        f <- c (f, rep (utils::tail (f, 1), 25 - length (f)))

    start_times <- split (0:24, f = f)
    # set each start_time to the end_time of the previous group
    start_times <- lapply (start_times, function (i) {
                               return (c (max (c (0, i [1] - 1)),
                                          max (i)))
                       })
    # Then convert to seconds and shift each start time back by 1 second
    start_times <- lapply (start_times, function (i) {
                               i <- i * 3600
                               i [1] <- max (0, i [1] - 1)
                               return (i)
                       })

    dat <- lapply (seq_along (start_times), function (i)
                       list (gtfs = gtfs,
                             start_stns = start_stns,
                             end_stns = end_stns,
                             start_time = start_times [[i]] [1],
                             end_time = start_times [[i]] [2]))

    heads <- foreach::foreach (dat = dat) %dopar% {
        do1 (dat$gtfs, dat$start_stns, dat$end_stns, dat$start_time, dat$end_time)
    }

    heads <- do.call (rbind, heads)
    return (heads [which (!duplicated (heads)), ])
}

So i think the code in current state is a sufficient first cut, and shall close again. Let me know what you think.

AlexandraKapp commented 3 years ago

thanks for the super quick response! I'll check it out 👌