Closed mem48 closed 3 years ago
@unbrother you will be intrested in this
Thanks @mem48, and you are indeed correct, but that is an intentional design decision as explained in #89. Everything is hard-coded as <double>
(in C++ terms), and not <int>
, because of the way the different heap sort algorithms are exposed within the code. However, if i do decide to pursue #117, which i suspect should be done, then that problem will no longer apply, and i'll be able to properly template the interfaces for both <int>
and <double>
. I'll keep this open as a parallel issue to that one, and keep you informed.
In the meantime, it would help if you could opine on #117 - have you ever resorted to anything but the default heap
parameter? I suspect nobody really has, and it doesn't make much difference except for network structures that are wildly different from typical spatial networks. Even then, having a proper <int>
rather than <double>
heap is likely to make at least as much difference as different heaps, because <int>
sorting is more efficient.
Ok, but if these are floating point errors then shouldn't they be very small. I could understand getting a flow of 5.000001 but a flow of 5.5 seems like a large error.
I did a few more tests
library(dodgr)
graph <- weight_streetnet (hampi)
from <- sample (graph$from_id, size = 10)
to <- sample (graph$to_id, size = 5)
to <- to [!to %in% from]
flows <- matrix(rep(1, length(from) * length(to)),
nrow = length(from))
graph1 <- dodgr_flows_aggregate(graph, from = from, to = to, flows = flows, heap = "FHeap")
graph2 <- dodgr_flows_aggregate(graph, from = from, to = to, flows = flows, heap = "BHeap")
graph3 <- dodgr_flows_aggregate(graph, from = from, to = to, flows = flows, heap = "TriHeap")
#crashes graph4 <- dodgr_flows_aggregate(graph, from = from, to = to, flows = flows, heap = "TriHeapExt")
graph5 <- dodgr_flows_aggregate(graph, from = from, to = to, flows = flows, heap = "Heap23")
identical(graph1, graph2) #TRUE
identical(graph1, graph3) #TRUE
identical(graph1, graph5) #TRUE
summary(graph1$flow[graph1$flow > 0])
Min. 1st Qu. Median Mean 3rd Qu. Max.
0.008475 0.015633 0.026527 0.033479 0.042153 0.090516
Changing the heap made no difference except for TriHeapExt
which crashed R (another issue to open). So from my perspective, they are not very useful. I also don't fully understand why and when to choose another heap, so I expect most users are using the default.
Also, notice the flow values I get. My input was a matrix of ones, so I would expect the minimum non-zero flow to be 1 and all other values to be a multiple of 1. Instead, I get some very small numbers.
As the hampi example is quite complex I tried something even simpler.
# Make a simple Tree in igraph
library(igraph)
graph_i <- make_tree(19,2,"undirected")
plot(graph_i)
# Convert to dodgr and make undirected
graph <- as_data_frame(graph_i)
graph2 <- graph[,2:1]
names(graph2) <- c("from","to")
graph <- rbind(graph, graph2)
graph$d <- 1
# Flow of 1 from all to all (same as edge betweenness)
flows <- matrix(rep(1, 19 * 19), ncol = 19)
graph <- dodgr_flows_aggregate(graph, from = 1:19, to = 1:19, flows = flows)
graph$flow
[1] 19.452381 18.452381 19.619048 12.866667 12.076190 12.076190 12.492857 12.492857 5.283333
[10] 5.283333 5.054762 5.054762 5.054762 5.054762 5.171429 5.171429 5.171429 5.171429
[19] 19.452381 18.452381 19.619048 12.866667 12.076190 12.076190 12.492857 12.492857 5.283333
[28] 5.283333 5.054762 5.054762 5.054762 5.054762 5.171429 5.171429 5.171429 5.171429
# Do edge betweeness in igraph
E(graph_i)$between <- edge.betweenness(graph_i)
E(graph_i)$between
[1] 88 84 84 48 48 48 48 48 18 18 18 18 18 18 18 18 18 18
I would have expected the dodgr
results to be half of the igraph
results as the dodgr
graph if directed but has twice the edges. But these results are very different.
Aha, i see. It's the norm_sums
parameter of the dodgr_flows_...
functions, which defaults to norm_sums = TRUE
, for reasons demonstrated in #121. As explained in all the help files,
' @note The
norm_sums
parameter should be used whenever densities at origins' and destinations are absolute values, and ensures that the sum of resultant
' flow values throughout the entire network equals the sum of densities at all
' origins. For example, with
norm_sums = TRUE
(the default), a flow from a' single origin with density one to a single destination along two edges will
' allocate flows of one half to each of those edges, such that the sum of flows
' across the network will equal one, or the sum of densities from all origins.
' The
norm_sums = TRUE
option is appropriate where densities are relative' values, and ensures that each edge maintains relative proportions. In the
' above example, flows along each of two edges would equal one, for a network
' sum of two, or greater than the sum of densities.
library (dodgr)
graph <- weight_streetnet (hampi)
from <- sample (graph$from_id, size = 10)
to <- sample (graph$to_id, size = 5)
to <- to [!to %in% from]
flows <- matrix(rep(1, length(from) * length(to)),
nrow = length(from))
graph <- dodgr_flows_aggregate(graph, from = from, to = to, flows = flows)
summary(graph$flow[graph$flow > 0])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 0.003401 0.011322 0.013446 0.023629 0.031730 0.079733
graph <- dodgr_flows_aggregate(graph, from = from, to = to, flows = flows,
norm_sums = FALSE)
#> Warning in prepare_graph(graph, from, to): graph already has a 'flow' column;
#> this will be overwritten
summary(graph$flow[graph$flow > 0])
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 1.000 3.000 4.000 4.114 5.000 10.000
# Make a simple Tree in igraph
library(igraph)
graph_i <- make_tree(19,2,"undirected")
#plot(graph_i)
# Convert to dodgr and make undirected
graph <- as_data_frame(graph_i)
graph2 <- graph[,2:1]
names(graph2) <- c("from","to")
graph <- rbind(graph, graph2)
graph$d <- 1
# Flow of 1 from all to all (same as edge betweenness)
flows <- matrix(rep(1, 19 * 19), ncol = 19)
graph <- dodgr_flows_aggregate(graph, from = 1:19, to = 1:19, flows = flows)
graph$flow
#> [1] 19.452381 18.452381 19.619048 12.866667 12.076190 12.076190 12.492857
#> [8] 12.492857 5.283333 5.283333 5.054762 5.054762 5.054762 5.054762
#> [15] 5.171429 5.171429 5.171429 5.171429 19.452381 18.452381 19.619048
#> [22] 12.866667 12.076190 12.076190 12.492857 12.492857 5.283333 5.283333
#> [29] 5.054762 5.054762 5.054762 5.054762 5.171429 5.171429 5.171429
#> [36] 5.171429
graph <- dodgr_flows_aggregate(graph, from = 1:19, to = 1:19, flows = flows,
norm_sums = FALSE)
#> Warning in prepare_graph(graph, from, to): graph already has a 'flow' column;
#> this will be overwritten
graph$flow
#> [1] 88 84 84 48 48 48 48 48 18 18 18 18 18 18 18 18 18 18 88 84 84 48 48 48 48
#> [26] 48 18 18 18 18 18 18 18 18 18 18
# Do edge betweeness in igraph
E(graph_i)$between <- edge.betweenness(graph_i)
E(graph_i)$between
#> [1] 88 84 84 48 48 48 48 48 18 18 18 18 18 18 18 18 18 18
Created on 2020-06-23 by the reprex package (v0.3.0)
... and actually, that was an additional explicit reason why i did not implement strict <int>
aggregation - the norm_sums = TRUE
option requires <double>
throughout, meaning a default of <double>
is the only sensible option. This norm_sums
business was implemented because the default value of TRUE
is really the only sensible option for aggregation of actual flows. Does that resolve this issue @mem48?
Ok, norm_sums = FALSE
gives me the results that I was expecting, so that is a solution.
But I don't understand when norm_sums = TRUE
is useful. The way I would use dodgr_flows_aggregate
is to have flows representing people traveling, so usually whole numbers. Then I want to count how many people pass over each part of the road network. In this case norm_sums = TRUE
seems to give a relative rather than absolute measure? But the help documentation suggests that norm_sums = TRUE
gives absolute values.
Say you've got the following network:
With a flow of 10 from A -> C and a flow of 10 from D -> C. Direct aggregation of flows will then give the following:
from | to | flow |
---|---|---|
A | B | 10 |
B | C | 20 |
D | B | 10 |
And the total flow aggregated throughout the entire network is 40, even though only 10 units are flowing from each of A and D. It might rightly be asserted that this total flow should be constrained to the total amounts emanating from each source/origin point, and so be constrained to 10 + 10 = 20. This is what norm_sums
does, resulting in flow values of:
from | to | flow |
---|---|---|
A | B | 5 |
B | C | 10 |
D | B | 5 |
So you can see what it effectively does is to "spread" the flow from, say A -> C, equally along each of the intervening edges. If there are N edges between any two origin -> destination points with a flow between them of f, then the flow on each of those intervening edges will be f / N if norm_flows = TRUE
and the total flow throughout the whole network will equal f, otherwise the flow along each edge will be f, and the total flow throughout the whole network will equal f N.
One context for considering the appropriateness of this whole functionality is OSM. A single way might be represented by a bunch of nodes along it simply because it is highly curved. Aggregating flows along with way using norm_sums = FALSE
will lead to an aggregate flow along that way that is simply proportional to the entirely arbitrary number of points used to define it. A more realistic interpretation of what flow aggregation is or should be might suggest that the flow between any two junction points should aggregate along the entire intervening way to the value specified between those two points, and so the flow for each segment should be the f / N
values described above. Make sense?
I suppose it depends on what you then do with the results. I'm plotting them on a map, so a highly curved road is clear. But if you did some statiticaly analysis of the network I can see the normalisation being useful.
If all the flow values input to dodgr_flows_aggregate are whole numbers I would expect all the resulting flows to also be whole numbers, as they are simple sums.
Reprex