Closed AlexandraKapp closed 4 years ago
That commit implements gtfs_route_headway()
which currently does this:
library (gtfsrouter)
gtfs <- extract_gtfs ("VBB_gtfs.zip")
gtfs <- gtfs_timetable (gtfs, day = "monday")
from <- "Südkreuz"
to <- "Wittenau"
x <- gtfs_route_headway (gtfs, from = from, to = to, quiet = TRUE)
hist (x)
Created on 2020-09-24 by the reprex package (v0.3.0)
It works like this:
start_time = 0
(midnight)departure_time
from nominated stationstart_time
to that departure_time
plus one secondstart_time
reaches midnight.Currently returns a simple vector of (integer
) headways in seconds. It takes a while because it scans the whole timetable for each start time. I don't think much could be gained here by re-writing the underlying C++ code for this specific use case, and even if so, more gains are likely to be possible through simply implementing parallel computation. And so i'll re-open this issue straight way and close it again when parallel computation has been implemented.
Parallel implementation is likely not really worthwhile. This is the difference it makes (with 8 cores):
library (gtfsrouter)
gtfs <- extract_gtfs ("VBB_gtfs.zip")
gtfs <- gtfs_timetable (gtfs, day = "monday")
from <- "Südkreuz"
to <- "Wittenau"
system.time (
x1 <- gtfs_route_headway (gtfs, from = from, to = to, parallel = TRUE)
)
#> Loading required namespace: parallel
#> Loading required namespace: doParallel
#> user system elapsed
#> 71.549 3.332 18.459
system.time (
x2 <- gtfs_route_headway (gtfs, from = from, to = to, parallel = FALSE)
)
#> user system elapsed
#> 42.343 0.237 28.659
identical (x1, x2)
#> [1] TRUE
Created on 2020-09-24 by the reprex package (v0.3.0)
Anybody wanting to resort to parallelization is likely to want to do so because of a desire to calculate a lot of pairwise route headway values, for which cases parallelization would be much better implemented at that higher level. The code used to implement parallelization in that example was this (recorded here for posterity, because it won't be committed):
route_headway_parallel <- function (gtfs, start_stns, end_stns) {
requireNamespace ("parallel")
requireNamespace ("doParallel")
requireNamespace ("foreach")
do1 <- function (gtfs, start_stns, end_stns, start_time, end_time) {
heads <- NULL
while (start_time < end_time) {
gtfs$timetable <- gtfs$timetable [departure_time >= start_time, ]
times <- headway_times (gtfs, start_stns, end_stns, start_time)
heads <- rbind (heads, unname (times))
start_time <- times [1] + 1
}
return (heads)
}
nc <- parallel::detectCores ()
doParallel::registerDoParallel (nc)
f <- rep (seq (nc), each = ceiling (24 / nc))
if (length (f) < 25)
f <- c (f, rep (utils::tail (f, 1), 25 - length (f)))
start_times <- split (0:24, f = f)
# set each start_time to the end_time of the previous group
start_times <- lapply (start_times, function (i) {
return (c (max (c (0, i [1] - 1)),
max (i)))
})
# Then convert to seconds and shift each start time back by 1 second
start_times <- lapply (start_times, function (i) {
i <- i * 3600
i [1] <- max (0, i [1] - 1)
return (i)
})
dat <- lapply (seq_along (start_times), function (i)
list (gtfs = gtfs,
start_stns = start_stns,
end_stns = end_stns,
start_time = start_times [[i]] [1],
end_time = start_times [[i]] [2]))
heads <- foreach::foreach (dat = dat) %dopar% {
do1 (dat$gtfs, dat$start_stns, dat$end_stns, dat$start_time, dat$end_time)
}
heads <- do.call (rbind, heads)
return (heads [which (!duplicated (heads)), ])
}
So i think the code in current state is a sufficient first cut, and shall close again. Let me know what you think.
thanks for the super quick response! I'll check it out 👌
How often does a connection go from Place A to Place B?
@mpadge as discussed, I created this as an issue