proportional distances along defined kinds of ways

mpadge commented 3 years ago

Develop a new function modified from dodgr_distances to accept one additional parameter specifying a binary "flag" variable. Use this to aggregate two distances, rather than just one in current dodgr_distances, with the second variable only aggregated along edges with the flag variable set to TRUE.

Use cases:

Generate a sample of journeys between random points in a given area/city, and calculate things like:

Average proportional distance that pedestrians are forced to walk along major motorways
Average distance of journeys along streets named after men compared with streets named after women.

The latter would provide more insight into equalstreetnames (with more cities via github repo linked at top of that page). While those websites are really cool, it's also very obvious that streets named after women tend to be both peripheral and minor. To work out the exposure of actual people traversing a city to female versus male streets requires a realistic routing of random (or all) journeys through a city. That in turn could be achieved with this functionality.

ping @Robinlovelace @mem48 Any thoughts on the usefulness of this functionality for your work? Should be pretty easy to implement, if you think it would help.

Robinlovelace commented 3 years ago

I think this would be useful for sure, great idea.

mpadge commented 3 years ago

Another great use case: Take OD data and say you want to examine bicycle flows between all pairs. For each pair, this function would enable you to calculate the average length between all points in O and all in D that actually travels along dedicated bicycle paths (or whatever). So your OD flows would be able to be directly related to infrastructure connecting each pair.

mpadge commented 3 years ago

Steps to implement this:

Make a new DGraphDualEdge class which inherits from current DGraphEdge, but has an additional <bool> flag vector https://github.com/ATFutures/dodgr/blob/f59b049b23f0919f88ad0a466057dee991db4cbc/src/dgraph.h#L23-L28
Adapt DGraph::addNewEdge to addNewDualEdge which also includes the flag vector
Duplicate PathFinder::scan_edges as scan_dual_edges, and include vectors of <bool> flag and <double> d_flagged.
Duplicate and modify all relevant bits in run_sp, including OneDist -> OneDistDual, and rcpp_get_sp_dists as rcpp_get_sp_dists_dual.
Export rcpp_get_sp_dists_dual to R and write a wrapper for that function interface.

mem48 commented 3 years ago

Definitely sounds useful, only thought is what if you wanted more than a binary breakdown, for example, % of the journey on a 20 mph road, 30 mph road, 40 mph road, etc

mpadge commented 3 years ago

Yeah, @mem48, I had a think about that too - being able to submit a single factor column, and getting a breakdown for all specified factors. But the mechanics have to be propagated to the innermost depths of the underlying C++ Dijkstra algorithms, so even just doing it for binary is pretty non-trivial. Recorded here for posterity nonetheless, so thanks!

mpadge commented 3 years ago

~~This is now almost fully implemented in a branch (#166). That will be merged when the TODO list has been completed~~ - Done! Leaving vignette ~~should be about~~ as the only major remaining task.

Edit:

TODO

[ ] vignette
[x] Option to return aggregate measure of relative proportions only, in order to allow really large jobs without having to hold all matrices in memory - each map-reduce job can then reduce down to a simple vector of category proportions prior to return.
[x] Add extra parameter of distance threshold to override to and directly calculate proportions from each from point along all paths out to specified threshold
[x] Refine the proportions_only argument to return individual proportions for each from value in that case.

mpadge commented 2 years ago

@mem48 @Robinlovelace This function now works. It's called dodgr_dists_categorical, with a few types of interface implemented. The input graphs need to have an edge_type column with some number of discrete values (integers, characters, whatever). Distances are then aggregated along each of these categories, and final results given for each component. There's also a dlimit parameter which aggregates distances along all categories from a defined set of "from" points. The following code demonstrates that:

library (dodgr)
packageVersion ("dodgr")
#> [1] '0.2.10.11'
f <- "/<path>/<to>/leeds-sc.Rds"
graph <- readRDS (f) %>%
    weight_streetnet (wt_profile = "bicycle") %>%
    dodgr_contract_graph ()
#> Loading required namespace: geodist
#> Loading required namespace: dplyr

# Some processing of "highways" into a restricted set of `edge_type` categories:
graph$edge_type <- graph$highway
graph$edge_type [grep ("^(primary|trunk)", graph$edge_type)] <- "primary"
graph$edge_type [grep ("^secondary", graph$edge_type)] <- "secondary"
graph$edge_type [grep ("^tertiary", graph$edge_type)] <- "tertiary"
graph$edge_type [graph$edge_type == "path"] <- "pedestrian"
graph$edge_type [graph$edge_type == "living_street"] <- "residential"
not_these_types <- c ("service", "footway", "unclassified", "track",
                      "steps", "bridleway")
graph$edge_type [graph$edge_type %in% not_these_types] <- NA_character_
sort (table (graph$edge_type), decreasing = TRUE)
#> 
#> residential    tertiary  pedestrian     primary    cycleway   secondary 
#>      113813       23543       18702       16689        9979        5231

v <- dodgr_vertices (graph)
from <- v$id

system.time (
    d <- dodgr_dists_categorical (graph, from = from, dlimit = 2000)
)
#>     user   system  elapsed 
#> 2106.168    3.884  341.442
for (n in names (d) [-1])
    d [[n]] <- d [[n]] / d$distance
d$x <- v$x [match (rownames (d), v$id)]
d$y <- v$y [match (rownames (d), v$id)]

^{Created on 2021-10-04 by the reprex package (v2.0.1.9000)}

All of Leeds takes just over 5 minutes, and the results look like this:

Proportion of all bicycle trips along dedicated cycleways

Proportion of all bicycle trips forced to traverse primary and secondary roads

Those maps provide an immediate overview of locations of relatively good and bad cycling infrastructure. Each point represents the proportion of bicycle-routed trips along all possible paths outward from that point that traverse the given kinds of ways out to an arbitrary threshold of 2km. Increasing that to a larger distance (like, say 5km) generates smoother results, but simply takes longer to calculate.

Robinlovelace commented 2 years ago

Hi Mark, first impressions from that and the maps covering a huge area: seriously impressive stuff!

mpadge commented 2 years ago

That's partly why this has taken so long - i knew if was possible to do this extremely efficiently, but required completely designing the path finding algorithms at the lowest level. I only recently found time to do that, and am very happy with the result. Every path from every OSM street junction in Leeds in 5 minutes.

Robinlovelace commented 2 years ago

Follow-up question on reproducibility, what was the command used to generate leeds-sc.Rds? Will give it a spin in a free moment...

Robinlovelace commented 2 years ago

Full reprex for my hometown of Hereford:

remotes::install_github("atfutures/dodgr")
library (dodgr)
packageVersion ("dodgr")
sc = dodgr::dodgr_streetnet("hereford UK")
graph = sc %>% 
  weight_streetnet (wt_profile = "bicycle") %>%
  dodgr_contract_graph ()
graph$edge_type <- graph$highway
graph$edge_type [grep ("^(primary|trunk)", graph$edge_type)] <- "primary"
graph$edge_type [grep ("^secondary", graph$edge_type)] <- "secondary"
graph$edge_type [grep ("^tertiary", graph$edge_type)] <- "tertiary"
graph$edge_type [graph$edge_type == "path"] <- "pedestrian"
graph$edge_type [graph$edge_type == "living_street"] <- "residential"
not_these_types <- c ("service", "footway", "unclassified", "track",
                      "steps", "bridleway")
graph$edge_type [graph$edge_type %in% not_these_types] <- NA_character_
sort (table (graph$edge_type), decreasing = TRUE)
#> 
#> residential    tertiary  pedestrian     primary    cycleway   secondary 
#>      113813       23543       18702       16689        9979        5231

v <- dodgr_vertices (graph)
from <- v$id

system.time (
  d <- dodgr_dists_categorical (graph, from = from, dlimit = 2000)
)
#>     user   system  elapsed 
#> 2106.168    3.884  341.442
for (n in names (d) [-1])
  d [[n]] <- d [[n]] / d$distance
d$x <- v$x [match (rownames (d), v$id)]
d$y <- v$y [match (rownames (d), v$id)]
head(d)
dsf = dodgr_to_sf(d)
dsf2 = sf::st_as_sf(d, coords = c("x", "y"), crs = 4326)

mpadge commented 2 years ago

Sweet! If you know precisely what you want, it'll generally be quicker just to keep that edge type (say, "cycleway"), and set all others to NA, which are then ignored and not aggregated. All will be documented asap in vignette, which will close this issue.

mem48 commented 2 years ago

As Robin said, this looks great. Can't wait to try it out.

mpadge commented 2 years ago

Thanks @mem48, it was essentially all motivated by your incidental comment:

only thought is what if you wanted more than a binary breakdown, for example, % of the journey on a 20 mph road, 30 mph road, 40 mph road, etc

That got me thinking ... for a long time ... that such an approach must indeed be possible. Turns out that it is, so thanks!

mem48 commented 2 years ago

I'll have to be careful what I say in the future, I might accidentally send you off on another long project.

mpadge commented 2 years ago

New vignette for you reading pleasure gentlemen! :book: :eyes: Thanks both for the input :+1:

UrbanAnalyst / dodgr