UrbanAnalyst / dodgr

Distances on Directed Graphs in R
https://urbananalyst.github.io/dodgr/
127 stars 16 forks source link

extract paths #12

Closed mpadge closed 6 years ago

mpadge commented 7 years ago

So that output of dodgr can be used directly in osmprob. cc @karpfen

Robinlovelace commented 6 years ago

dodgr_agregate could be a good name for this.

Robinlovelace commented 6 years ago

@richardellison this could be how we do overline quicker.

Robinlovelace commented 6 years ago

Here's a tiny use case that could be useful for testing:

library(dodgr)
library(stplanr)
#> Loading required package: sp
sl <- routes_fast_sf[2:4, ]
rnet1 <- overline(sl = sl, attrib = "length")
rnet2 <- overline(sl = sl, attrib = "length", buff_dist = 1)
par(mfrow = c(1, 3))
plot(rnet1, lwd = rnet1$length/mean(rnet1$length))
plot(rnet2, lwd = rnet2$length/mean(rnet2$length))
sl$highway <- "highway"

mpadge commented 6 years ago

Note that aggregating flows along paths will require re-instating some of the code now deleted as part of #4

Robinlovelace commented 6 years ago

Great stuff. Note that the answer should be pretty objective as it relies on attribute data that already exists per line. I was expecting sf::st_overlaps() or similar to work. Note that @richardellison is also looking this I believe - suggest we go with whatever is faster to emerge and run!

mpadge commented 6 years ago

one thing occurred to me: i bet that trips made closer to the centres of cities are shorter than those made further out (because trip length should correlate with density of everything). This can't be implemented in any current way to aggregate trips, but i could probably easily do it in dodgr so that the distance decay functions depend themselves on distance from centre. That would be a whole min-research project, and likely a good paper, in its own right. I've got all the data needed for a 7-city comparison via the bikedata package. Thoughts @Robinlovelace?

Robinlovelace commented 6 years ago

Yes I think the idea of distance decay curves varying by location has legs (and wheel) and would be a worthy topic of a paper. One thing you should note about the logistic polynomial curve we used on the pct is that it's the % of people who travel (or p of a single person who makes that trip) that can be expected to go by bike. This is different from the distance decay of people making that trip at all.

Very good basis for a paper in any case.

On a different but related note. I keep going on about ecological models providing a solid basis for modelling human behaviour patters - check this out:

Paradis, E., Baillie, S.R., Sutherland, W.J., 2002. Modeling large-scale dispersal distances. Ecological Modelling 151, 279–292.

mpadge commented 6 years ago

One of my fundamental refs: Nekola & McGill Ecography (2014) 37:309-320 "Scale dependency in the functional form of the distance decay relationship"

If you read between the lines, most of it applies pretty directly to human behaviour as well

Robinlovelace commented 6 years ago

The problem is with cycling mode choice it turns out distance is not a very powerful predictor in most settings - people will happily cycle 5km when conditions are right but not 3km when they are shoddy. But that can all by modelled and is what @mem48 and I are looking at in CyIPT>

mpadge commented 6 years ago

This is all now implemented. @Robinlovelace @richardellison check out the new functionality from the example of dodgr_flows():

graph <- weight_streetnet (hampi)
from <- sample (graph$from_id, size = 10)
to <- sample (graph$to_id, size = 5)
to <- to [!to %in% from]
flows <- matrix (10 * runif (length (from) * length (to)), nrow = length (from))                                           
graph <- dodgr_flows (graph, from = from, to = to, flows = flows)

The dodgr_flows() function adds an additional column of "flow" to graph, containing aggregate flows specified by the pairwise flows matrix, and routed along all shortest paths between each pair of points.

There's also a dodgr_paths() function to return explicit shortest paths, if that's useful for you stplanr guys.

TODO

  1. This currently requires routing through a full graph to get explicit flows on every segment. I just need to add an edge map to allow dodgr_flows() to run on a contracted graph, and then map those flows on contracted edges back onto the edges in the original, full graph.
  2. Construct a benchmark against stplanr::sum_network_links() -- dodgr_flows() will be much faster, the question is how much?

Issue to stay open until those two tasks are done

Robinlovelace commented 6 years ago

Sounds good, will test asap.

Robinlovelace commented 6 years ago

Tested this on the Bristol data with @mem48 and think we've the beginnings of a PR to demonstrate it. Struggling to find out how to get a geometry column on the result so it can be plotted on the map - but can see that should be easy to do... What's the logical next step from here?

dodgr_dists(graph = net, )
from <- sample (net$from_id, size = 10)
to <- sample (net$to_id, size = 5)
to <- to [!to %in% from]
flows <- matrix (10 * runif (length (from) * length (to)), nrow = length (from))                                           
graph <- dodgr_flows (net, from = from, to = to, flows = flows)
dodgr_paths(graph = net, from = 1, to = 20)
Robinlovelace commented 6 years ago

We get all zeros in the flow - intended?

summary(graph$flow)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
      0       0       0       0       0       0 
mpadge commented 6 years ago

nah, it shouldn't give zeros, and it works when i test it. The comparisons with stplanr are really interesting. The first naive comparison:

extract a network

Just a random street network of modest size, and a set of 10 random routing points for aggregating flows. The latter are cast to long form for stplanr, but stay in square matrix form for dodgr.

require (magrittr)
bb <- osmdata::getbb ("york uk")
hw <- osmdata::opq (bb) %>%
    osmdata::add_osm_feature (key = "highway") %>%
    osmdata::osmdata_sf (quiet = FALSE) %>%
    extract2 ("osm_lines")

# select random routing points
npts <- 10
xy <- apply (bb, 1, function (i) i [1] + runif (npts) * diff (i))
xy <- data.frame (id = seq (npts),
                  x = xy [, 1],
                  y = xy [, 2])
flow <- matrix (runif (npts * npts) * 10, nrow = npts)
flow_long <- reshape2::melt (flow)
names (flow_long) <- c ("from", "to", "flow")

The stplanr way

Then map those flows the stplanr way:

xy_sp <- sp::SpatialPointsDataFrame (coords = xy,
                                  proj4string = sp::CRS("+init=epsg:4326"), 
                                  data = xy)
xy_sp <- sp::spTransform (xy_sp, attr (hw$geometry, "crs")$proj4string)

# convert to sp:
require (sf)
hw_sp <- as (hw, "Spatial")
ph_net <- stplanr::SpatialLinesNetwork (sl = hw_sp)
# Find the closest node to each station
nodeid <- stplanr::find_network_nodes (ph_net, xy_sp$x, xy_sp$y)
# Convert start and end station IDs in trips table to node IDs in `ph_net`
flow_long$start <- nodeid [match (flow_long$from, xy$id)]
flow_long$end <- nodeid [match (flow_long$to, xy$id)]
flownet <- stplanr::sum_network_links (ph_net, data.frame (flow_long))

This gives something like this:

> summary (flownet$flow)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  16.64   38.04  105.34   78.59  108.02  146.06 

The dodgr way

net <- weight_streetnet (hw, wt_profile = "bicycle")
ids <- dodgr:::match_pts_to_graph (dodgr_vertices (net), xy [, 2:3])
graph <- dodgr_flows (net, from = ids, to = ids, flows = flow)

which gives

> summary (graph$flow)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 0.0000  0.0000  0.0000  0.9489  0.0000 63.2754

First point of observation

The network consists of just under 20,000 lines, yet flows are routed here between only 10 points, so most lines should contain no flow. stplanr::sum_network_links() gives all flows > 0, which does not seem right? dodgr gives a more realistic answer in which nearly all flows are 0, but some are reasonably high.

First benchmark

rbenchmark::benchmark (
        flownet <- stplanr::sum_network_links (ph_net, data.frame (flow_long)),
        graph <- dodgr_flows (net, from = ids, to = ids, flows = flow),
        replications = 2)

This reveals that dodgr is over 6 times slower than stplanr.

Second point of observation

stplanr sums flows along individual network lines, of which there are 18,635 or so. dodgr sums flows along every single edge in the entire graph, of which there are 192,856. So dodgr is doing a lot more work (more than 10 times as much), and produces much more detailed results.

Second benchmark

Now compare relative times including the necessary graph conversion steps

f_stplanr <- function (hw, xy, flow_long)
{
    ph_net <- stplanr::SpatialLinesNetwork (sl = hw)
    nodeid <- stplanr::find_network_nodes (ph_net, xy$x, xy$y)
    flow_long$start <- nodeid [match (flow_long$from, xy$id)]
    flow_long$end <- nodeid [match (flow_long$to, xy$id)]
    flownet <- stplanr::sum_network_links (ph_net, data.frame (flow_long))
}
f_dodgr <- function (hw, xy, flow)
{
    net <- weight_streetnet (hw, wt_profile = "bicycle")
    ids <- dodgr:::match_pts_to_graph (dodgr_vertices (net), xy [, 2:3])
    graph <- dodgr_flows (net, from = ids, to = ids, flows = flow)
}
rbenchmark::benchmark (
                       f_stplanr (hw_sp, xy_sp, flow_long),
                       f_dodgr (hw, xy, flow),
                       replications = 2)

This time dodgr is marginally faster (40% or so), yet it is processing 10 times as much data throughout, and giving results that are 10 times more detailed.

Third point(s) of observation

Maybe dodgr is ultimately not directly compatible with stplanr because the basic form of spatial data are simple edges between two vertices. These are silicate::PRIMITIVE objects, and will be stored as such once silicate hits CRAN. They are arguably not reconcilable with the fundamental representation of sp or sf, but neither I would argue is the task of flow aggregation.

Most ways of representing network flows consider some kind of distance attenuation. (OD matrices neatly avoid this, and ultimately enable stplanr to work with sp/sf data.) There is simply no way to get an sp/sf representation of a flow that attenuates with distance along each and every line. dodgr can do that, and silicate is (will be) the underlying spatial storage form. In the meantime, converting stplanr to sf compatibility has evidently been a massive labour; extending it to this kind of primitive edge model that totally breaks "features" apart into their atomic units is obviously not going to be easy within the stplanr world, and hence my claim that maybe dodgr is ultimately just not compatible with stplanr.

However, note that the sum_network_links results above really do appear to be garbage. There simply can not be flows along every line in the network when only 10 out of nearly 100,000 network nodes are used for routing. Most flows must be zero, and the sum_networks_links result must surely reflect an inappropriate level of spatial aggregation inherent in the underlying sp/sf representation?


cc @mdsumner - Network flows with distance attenuation are a powerful example that sp/sf simply can not represent or process, while silicate can.

richardellison commented 6 years ago

Interesting, makes for a very useful comparison. Just one observation about the issue of sum_network_links results, the results should only return links with flows, so summary() will not (or should not at least) show any observations with a value of zero.

mpadge commented 6 years ago

Yeah, I realised that shortly after I wrote it. That at least makes sense

mpadge commented 6 years ago

All functions now implemented and working, so closing. @Robinlovelace @richardellison just keep this one in your proverbials for future reference, and we can discuss somewhere else down the track whether or not this functionality fits within stplanr. I think at least the distances should be a load faster than your current use of the google API - plus obviously allows massive many-to-many calculations without a google account, which is a big plus.

There's also a dodgr_paths function for many-to-many routing, but that returns complete lists of vertex sequences, rather than sequences of sp/sf objects as you currently do, so also might not be directly compatible. However, the dodgr_contract_graph() function reduces a graph to its minimally distinct components (so includes junctions only), and these can be very readily re-converted to sf linestrings or whatever. This'll happen soon via #7. (The above timing comparisons ought also really be made with a dodgr contracted graph, which would speed things up a lot.)

The rest of it kind of depends on the future of spatial data representations in R, which will very quickly move beyond sp/sf, but ain't quite there just yet ...

Robinlovelace commented 6 years ago

Very interesting stuff - not had a chance to test and understand (and it's too late too now) but will aim to get on it tomorrow after football.