UrbanAnalyst / dodgr

Distances on Directed Graphs in R
https://urbananalyst.github.io/dodgr/
127 stars 16 forks source link

network centrality function #90

Open mpadge opened 5 years ago

mpadge commented 5 years ago

After #69 ad #89, dodgr will be able to calculate centrality more efficiently than igraph, and it will be straightforward to adapt existing code to do so.

mpadge commented 5 years ago

Centrality can already be calculated with dodgr_flows and a unit flow matrix, but this is not the same as igraph::centrality, because the latter simply uses the graph with unit edge weights and unit flow matrix, whereas the dodgr-equivalent version uses the full weighted graph. The igraph version can be calculated by setting all edge distances to 1, but dodgr won't calculate that as efficiently as igraph because dodgr remains hard coded for <double> edge weights (see #89). Closing this for now and leaving the 2 options as:

  1. Faster, cheaper calculation via igraph to give centrality independent of everything but graph connections; or
  2. Calculation "as is" via dodgr that takes longer than igraph, but will give a much more accurate measure that reflects the fullness of the underlying graph structure including edge properties.
mpadge commented 4 years ago

Re-opening because this now seems really worth doing. I just have to code a custom Dijkstra to (i) use (rounded) <int> edge weights; and (ii) aggregate to unit-integer values. Can then calculate centrality via arbitrary metrics: distance, time, and even angle. Ping @Robinlovelace @agila5

mpadge commented 4 years ago

Note this algorithm (original paper here - "A faster algorithm for betweenness centrality" (2001)). This is what igraph uses, and fairly straightfoward to implement.

mpadge commented 4 years ago

All of the C++ work done in this commit. Now just have to wrap in an R function, and test.

Robinlovelace commented 4 years ago

Awesome!

mpadge commented 4 years ago

@Robinlovelace That gets us to this:

library(dodgr)
graph <- weight_streetnet (hampi, wt_profile = "foot")
igr <- dodgr_to_igraph (graph)
graph_c <- dodgr_contract_graph (graph)
igr_c <- dodgr_to_igraph (graph_c)

# edge betweenness on full graph
f_dodgr <- function (graph) dodgr_centrality (graph, contract = FALSE, edges = TRUE)
f_igraph <- function (graph) igraph::edge_betweenness (graph)
rbenchmark::benchmark (
             res_d <- f_dodgr (graph),
             res_i <- f_igraph (igr),
             replications = 2
) [, 1:4]
#>                      test replications elapsed relative
#> 1 res_d <- f_dodgr(graph)            2   1.234    1.213
#> 2  res_i <- f_igraph(igr)            2   1.017    1.000
identical (res_i, res_d$centrality)
#> [1] TRUE

# edge betweenness on contracted graph
rbenchmark::benchmark (
             res_d <- f_dodgr (graph_c),
             res_i <- f_igraph (igr_c),
             replications = 2
) [, 1:4]
#>                        test replications elapsed relative
#> 1 res_d <- f_dodgr(graph_c)            2   0.022    1.222
#> 2  res_i <- f_igraph(igr_c)            2   0.018    1.000
identical (res_i, res_d$centrality)
#> [1] TRUE

Created on 2019-10-16 by the reprex package (v0.3.0)

dodgr is only about 20% slower, essentially because it's C++ while igraph is pure C. But the point of this is now the trivial extra step of wrapping it in RcppParalllel, so that dodgr will effectively be several times faster through parallelisation... Re-opening until that's been done.

Robinlovelace commented 4 years ago

Legend. Looks very cool. I look forward to results and giving this a spin. Heads-up @agila5, if this can work with different weighting profiles that respect oneway and other restrictions this will be a big step forward.

mpadge commented 4 years ago

It already does that, and the idea is to enable flexible specification of different ways of measuring centrality that can't be done via igraph, notably including for example centrality considering effects of incline, or using time-based routing including effects of turning across oncoming traffic.

Robinlovelace commented 4 years ago

the idea is to enable flexible specification of different ways of measuring centrality

The ideal contents of WHO-funded research into scenarios of intervention in the transport system.

mpadge commented 4 years ago
setwd ("/data/mega/code/repos/atfutures/dodgr")
devtools::load_all (".", export_all = FALSE)
#> Loading dodgr
graph <- dodgr_streetnet ("ilkley england") %>%
    weight_streetnet (wt_profile = "bicycle") %>%
    dodgr_contract_graph ()
#> The following highway types are present in data yet lack corresponding weight_profile values: road,
message ("graph has ", format (nrow (graph), big.mark = ","), " edges")
#> graph has 5,671 edges
igr <- dodgr_to_igraph (graph)

# -------- edge centrality --------
f_dodgr <- function (graph) dodgr_centrality (graph, contract = FALSE,
                                              edges = TRUE, parallel = FALSE)
f_dodgr_par <- function (graph) dodgr_centrality (graph, contract = FALSE,
                                                  edges = TRUE, parallel = TRUE)
f_igraph <- function (graph) igraph::edge_betweenness (graph)
rbenchmark::benchmark (
             res_d <- f_dodgr (graph),
             res_dp <- f_dodgr_par (graph),
             res_i <- f_igraph (igr),
             replications = 1, order = "relative") [, 1:4]
#>                           test replications elapsed relative
#> 2 res_dp <- f_dodgr_par(graph)            1   0.252    1.000
#> 3       res_i <- f_igraph(igr)            1   0.867    3.440
#> 1      res_d <- f_dodgr(graph)            1   0.917    3.639
identical (res_i, res_d$centrality)
#> [1] TRUE
identical (res_i, res_dp$centrality)
#> [1] TRUE

# -------- vertex centrality --------
f_dodgr <- function (graph) dodgr_centrality (graph, contract = FALSE,
                                              edges = FALSE, parallel = FALSE)
f_dodgr_par <- function (graph) dodgr_centrality (graph, contract = FALSE,
                                                  edges = FALSE, parallel = TRUE)
f_igraph <- function (graph) igraph::betweenness (graph)
rbenchmark::benchmark (
             res_d <- f_dodgr (graph),
             res_dp <- f_dodgr_par (graph),
             res_i <- f_igraph (igr),
             replications = 1, order = "relative") [, 1:4]
#>                           test replications elapsed relative
#> 2 res_dp <- f_dodgr_par(graph)            1   0.246    1.000
#> 3       res_i <- f_igraph(igr)            1   0.838    3.407
#> 1      res_d <- f_dodgr(graph)            1   0.890    3.618
identical (unname (res_i), res_d$centrality)
#> [1] TRUE
identical (unname (res_i), res_dp$centrality)
#> [1] TRUE

Created on 2019-10-16 by the reprex package (v0.3.0)

@Robinlovelace @agila5 - now we're talking - dodgr_centrality is now parallelised, and should run several times faster than igraph. This is what we need to calculate centrality of huge graphs. (Note that now that it all works, the non-parallel version and accompanying parameter shown above will be ditched, and everything will always be in parallel.)

mpadge commented 4 years ago

Re-opened to implement threshold parameter - centrality measured only within a certain distance of each point converges to value measured over entire graph (within some constant scale factor), so this enables considerable speed-ups. Next commits will implement ...

agila5 commented 4 years ago

Hi Mark, amazing, as always, and, as always, I'm super lost 😰

Next will I'm going to get back to studying these topics!

mpadge commented 4 years ago

That commit and the few ones prior implement a couple of helper functions, estimate_centrality_threshold() and estimate_centrality_time(). These are very useful for big graphs, for which true centrality measures using every pair of points are often impractical. Calculating centrality only within a given distance threshold often provides good approximations to the full values, and the estimate_centrality_threshold() function provides an estimate of these thresholds for a specified error tolerance. The estimate_centrality_time() function can then be used to estimate the likely time required for a complete centrality calculation for a given threshold.


Now re-opening for hopefully the last time, to close upon writing a vignette on all these centrality calculations and options.

Robinlovelace commented 4 years ago

As an additional point, it seems that the centrality values are lost when converting back to sf:

library(dodgr)
graph <- weight_streetnet (hampi, wt_profile = "foot")
igr <- dodgr_to_igraph (graph)
#> Loading required namespace: igraph

system.time({res_igr <- igraph::edge_betweenness(igr)})
#>    user  system elapsed 
#>   0.504   0.000   0.504
system.time({res_dodgr <- dodgr_centrality(graph, contract = FALSE, edges = TRUE)})
#>    user  system elapsed 
#>   0.771   0.040   0.160
res_dodgr_sf <- dodgr_to_sf(res_dodgr)
#> Loading required namespace: sf
names(res_dodgr_sf)
#>  [1] "geom_num"      "edge_id"       "from_id"       "from_lon"     
#>  [5] "from_lat"      "to_id"         "to_lon"        "to_lat"       
#>  [9] "d"             "d_weighted"    "highway"       "way_id"       
#> [13] "time"          "time_weighted" "component"     "geometry"

Created on 2019-10-22 by the reprex package (v0.3.0)

Speedup is impressive though, good work Mark!