Closed mpadge closed 6 years ago
dodgr_agregate
could be a good name for this.
@richardellison this could be how we do overline quicker.
Here's a tiny use case that could be useful for testing:
library(dodgr)
library(stplanr)
#> Loading required package: sp
sl <- routes_fast_sf[2:4, ]
rnet1 <- overline(sl = sl, attrib = "length")
rnet2 <- overline(sl = sl, attrib = "length", buff_dist = 1)
par(mfrow = c(1, 3))
plot(rnet1, lwd = rnet1$length/mean(rnet1$length))
plot(rnet2, lwd = rnet2$length/mean(rnet2$length))
sl$highway <- "highway"
Note that aggregating flows along paths will require re-instating some of the code now deleted as part of #4
Great stuff. Note that the answer should be pretty objective as it relies on attribute data that already exists per line. I was expecting sf::st_overlaps()
or similar to work. Note that @richardellison is also looking this I believe - suggest we go with whatever is faster to emerge and run!
one thing occurred to me: i bet that trips made closer to the centres of cities are shorter than those made further out (because trip length should correlate with density of everything). This can't be implemented in any current way to aggregate trips, but i could probably easily do it in dodgr
so that the distance decay functions depend themselves on distance from centre. That would be a whole min-research project, and likely a good paper, in its own right. I've got all the data needed for a 7-city comparison via the bikedata
package. Thoughts @Robinlovelace?
Yes I think the idea of distance decay curves varying by location has legs (and wheel) and would be a worthy topic of a paper. One thing you should note about the logistic polynomial curve we used on the pct is that it's the % of people who travel (or p of a single person who makes that trip) that can be expected to go by bike. This is different from the distance decay of people making that trip at all.
Very good basis for a paper in any case.
On a different but related note. I keep going on about ecological models providing a solid basis for modelling human behaviour patters - check this out:
Paradis, E., Baillie, S.R., Sutherland, W.J., 2002. Modeling large-scale dispersal distances. Ecological Modelling 151, 279–292.
One of my fundamental refs: Nekola & McGill Ecography (2014) 37:309-320 "Scale dependency in the functional form of the distance decay relationship"
If you read between the lines, most of it applies pretty directly to human behaviour as well
The problem is with cycling mode choice it turns out distance is not a very powerful predictor in most settings - people will happily cycle 5km when conditions are right but not 3km when they are shoddy. But that can all by modelled and is what @mem48 and I are looking at in CyIPT>
This is all now implemented. @Robinlovelace @richardellison check out the new functionality from the example of dodgr_flows()
:
graph <- weight_streetnet (hampi)
from <- sample (graph$from_id, size = 10)
to <- sample (graph$to_id, size = 5)
to <- to [!to %in% from]
flows <- matrix (10 * runif (length (from) * length (to)), nrow = length (from))
graph <- dodgr_flows (graph, from = from, to = to, flows = flows)
The dodgr_flows()
function adds an additional column of "flow"
to graph
, containing aggregate flows specified by the pairwise flows
matrix, and routed along all shortest paths between each pair of points.
There's also a dodgr_paths()
function to return explicit shortest paths, if that's useful for you stplanr
guys.
dodgr_flows()
to run on a contracted graph, and then map those flows on contracted edges back onto the edges in the original, full graph.stplanr::sum_network_links()
-- dodgr_flows()
will be much faster, the question is how much?Issue to stay open until those two tasks are done
Sounds good, will test asap.
Tested this on the Bristol data with @mem48 and think we've the beginnings of a PR to demonstrate it. Struggling to find out how to get a geometry
column on the result so it can be plotted on the map - but can see that should be easy to do... What's the logical next step from here?
dodgr_dists(graph = net, )
from <- sample (net$from_id, size = 10)
to <- sample (net$to_id, size = 5)
to <- to [!to %in% from]
flows <- matrix (10 * runif (length (from) * length (to)), nrow = length (from))
graph <- dodgr_flows (net, from = from, to = to, flows = flows)
dodgr_paths(graph = net, from = 1, to = 20)
We get all zeros in the flow - intended?
summary(graph$flow)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0 0 0 0 0 0
nah, it shouldn't give zeros, and it works when i test it. The comparisons with stplanr
are really interesting. The first naive comparison:
Just a random street network of modest size, and a set of 10 random routing points for aggregating flows. The latter are cast to long form for stplanr
, but stay in square matrix form for dodgr
.
require (magrittr)
bb <- osmdata::getbb ("york uk")
hw <- osmdata::opq (bb) %>%
osmdata::add_osm_feature (key = "highway") %>%
osmdata::osmdata_sf (quiet = FALSE) %>%
extract2 ("osm_lines")
# select random routing points
npts <- 10
xy <- apply (bb, 1, function (i) i [1] + runif (npts) * diff (i))
xy <- data.frame (id = seq (npts),
x = xy [, 1],
y = xy [, 2])
flow <- matrix (runif (npts * npts) * 10, nrow = npts)
flow_long <- reshape2::melt (flow)
names (flow_long) <- c ("from", "to", "flow")
stplanr
wayThen map those flows the stplanr
way:
xy_sp <- sp::SpatialPointsDataFrame (coords = xy,
proj4string = sp::CRS("+init=epsg:4326"),
data = xy)
xy_sp <- sp::spTransform (xy_sp, attr (hw$geometry, "crs")$proj4string)
# convert to sp:
require (sf)
hw_sp <- as (hw, "Spatial")
ph_net <- stplanr::SpatialLinesNetwork (sl = hw_sp)
# Find the closest node to each station
nodeid <- stplanr::find_network_nodes (ph_net, xy_sp$x, xy_sp$y)
# Convert start and end station IDs in trips table to node IDs in `ph_net`
flow_long$start <- nodeid [match (flow_long$from, xy$id)]
flow_long$end <- nodeid [match (flow_long$to, xy$id)]
flownet <- stplanr::sum_network_links (ph_net, data.frame (flow_long))
This gives something like this:
> summary (flownet$flow)
Min. 1st Qu. Median Mean 3rd Qu. Max.
16.64 38.04 105.34 78.59 108.02 146.06
dodgr
waynet <- weight_streetnet (hw, wt_profile = "bicycle")
ids <- dodgr:::match_pts_to_graph (dodgr_vertices (net), xy [, 2:3])
graph <- dodgr_flows (net, from = ids, to = ids, flows = flow)
which gives
> summary (graph$flow)
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.0000 0.0000 0.0000 0.9489 0.0000 63.2754
The network consists of just under 20,000 lines, yet flows are routed here between only 10 points, so most lines should contain no flow. stplanr::sum_network_links()
gives all flows > 0, which does not seem right? dodgr
gives a more realistic answer in which nearly all flows are 0, but some are reasonably high.
rbenchmark::benchmark (
flownet <- stplanr::sum_network_links (ph_net, data.frame (flow_long)),
graph <- dodgr_flows (net, from = ids, to = ids, flows = flow),
replications = 2)
This reveals that dodgr
is over 6 times slower than stplanr
.
stplanr
sums flows along individual network lines, of which there are 18,635 or so. dodgr
sums flows along every single edge in the entire graph, of which there are 192,856. So dodgr
is doing a lot more work (more than 10 times as much), and produces much more detailed results.
Now compare relative times including the necessary graph conversion steps
f_stplanr <- function (hw, xy, flow_long)
{
ph_net <- stplanr::SpatialLinesNetwork (sl = hw)
nodeid <- stplanr::find_network_nodes (ph_net, xy$x, xy$y)
flow_long$start <- nodeid [match (flow_long$from, xy$id)]
flow_long$end <- nodeid [match (flow_long$to, xy$id)]
flownet <- stplanr::sum_network_links (ph_net, data.frame (flow_long))
}
f_dodgr <- function (hw, xy, flow)
{
net <- weight_streetnet (hw, wt_profile = "bicycle")
ids <- dodgr:::match_pts_to_graph (dodgr_vertices (net), xy [, 2:3])
graph <- dodgr_flows (net, from = ids, to = ids, flows = flow)
}
rbenchmark::benchmark (
f_stplanr (hw_sp, xy_sp, flow_long),
f_dodgr (hw, xy, flow),
replications = 2)
This time dodgr
is marginally faster (40% or so), yet it is processing 10 times as much data throughout, and giving results that are 10 times more detailed.
Maybe dodgr
is ultimately not directly compatible with stplanr
because the basic form of spatial data are simple edges between two vertices. These are silicate::PRIMITIVE
objects, and will be stored as such once silicate
hits CRAN. They are arguably not reconcilable with the fundamental representation of sp
or sf
, but neither I would argue is the task of flow aggregation.
Most ways of representing network flows consider some kind of distance attenuation. (OD matrices neatly avoid this, and ultimately enable stplanr
to work with sp/sf
data.) There is simply no way to get an sp/sf
representation of a flow that attenuates with distance along each and every line. dodgr
can do that, and silicate
is (will be) the underlying spatial storage form. In the meantime, converting stplanr
to sf
compatibility has evidently been a massive labour; extending it to this kind of primitive edge model that totally breaks "features" apart into their atomic units is obviously not going to be easy within the stplanr
world, and hence my claim that maybe dodgr
is ultimately just not compatible with stplanr
.
However, note that the sum_network_links
results above really do appear to be garbage. There simply can not be flows along every line in the network when only 10 out of nearly 100,000 network nodes are used for routing. Most flows must be zero, and the sum_networks_links
result must surely reflect an inappropriate level of spatial aggregation inherent in the underlying sp/sf
representation?
cc @mdsumner - Network flows with distance attenuation are a powerful example that sp/sf
simply can not represent or process, while silicate
can.
Interesting, makes for a very useful comparison. Just one observation about the issue of sum_network_links results, the results should only return links with flows, so summary() will not (or should not at least) show any observations with a value of zero.
Yeah, I realised that shortly after I wrote it. That at least makes sense
All functions now implemented and working, so closing. @Robinlovelace @richardellison just keep this one in your proverbials for future reference, and we can discuss somewhere else down the track whether or not this functionality fits within stplanr
. I think at least the distances should be a load faster than your current use of the google API - plus obviously allows massive many-to-many calculations without a google account, which is a big plus.
There's also a dodgr_paths
function for many-to-many routing, but that returns complete lists of vertex sequences, rather than sequences of sp/sf
objects as you currently do, so also might not be directly compatible. However, the dodgr_contract_graph()
function reduces a graph to its minimally distinct components (so includes junctions only), and these can be very readily re-converted to sf
linestrings or whatever. This'll happen soon via #7. (The above timing comparisons ought also really be made with a dodgr
contracted graph, which would speed things up a lot.)
The rest of it kind of depends on the future of spatial data representations in R, which will very quickly move beyond sp/sf
, but ain't quite there just yet ...
Very interesting stuff - not had a chance to test and understand (and it's too late too now) but will aim to get on it tomorrow after football.
So that output of
dodgr
can be used directly inosmprob
. cc @karpfen