UrbanAnalyst / dodgr

Distances on Directed Graphs in R
https://urbananalyst.github.io/dodgr/
128 stars 16 forks source link

Distance matrix is not symmetric and violates triangle inequality #178

Closed mguzmann closed 2 years ago

mguzmann commented 2 years ago

I am working with the Berlin pbf file: http://download.geofabrik.de/europe/germany/berlin-latest.osm.pbf If I calculate distances between random nodes in the graph, the resulting matrix is not symmetric. If I force that matrix to be symmetric, it violates triangle inequality.

I guess this is because of some routing decisions which make going from A to B different than going from B to A. It would be very useful to be able to ignore directionality and get a symmetric matrix that respects triangle inequalities. Is there a work around?

mpadge commented 2 years ago

The easiest way to do that is to use the `merge_directed_graph() function. That will make all edges undirected, and you'll get a symmetrical distance matrix.

Note, however, that the matrices generated from directed graphs never violate the triangle inequality. They are merely asymetric, so that d(A,B) will not necessarily equal d(B,A). But it will always be the case that d(A,B) + d(B,C) >= d(A,C), which is the triangle inequality. If potential violation of that is really your concern, you don't need to worry. But if asymmetric matrices are your actual concern, then just "merge" the directed graph.

mguzmann commented 2 years ago

Thanks for your answer. Maybe I'm doing something wrong, but it seems like my results are not respecting triangle inequality. This example works for me (to produce matrices which violate tri ineq):

library(tidyverse)
library(dodgr)
library(osmextract)
library(sf)

## Adapted from the fossil package
tri.ineq.show <- function (D) 
{
    mat <- as.matrix(D)
    n <- dim(mat)[1]
    ineq.idx <- c()
    for (i in 1:(n - 2)) {
        for (j in (i + 1):(n - 1)) {
            for (k in (j + 1):n) {
                sds <- c(mat[j, i], mat[k, i], mat[k, j])
                lng <- max(sds)
                if (lng > (sum(sds) - lng)) {
                    # cat(sprintf("Check triangle %d %d %d\n",i,j,k))
                    ineq.idx <- rbind(ineq.idx,c(i,j,k))
                }
            }
        }
    }
    return(ineq.idx)
}

dat <- oe_read("./data-berlin/berlin-latest.osm.pbf")

graph <- weight_streetnet(dat, wt_profile = "foot")

set.seed(1234)
from <- sample (graph$from_id, size = 100)

d_dist <- dodgr_dists(graph
                   , from = from
                   , to = from
                   , shortest = TRUE)

fossil::tri.ineq(d_dist[1:30,1:30])
fossil::tri.ineq(d_dist[1:10,1:10])

tri.ineq.show(d_dist[1:30,1:30])

d_dist[c(1,15,28), c(1,15,28)]

The symmetric part is less important to me, I could just copy the upper or lower triangle, but these triangle inequalities really wreck havoc in my models.

mguzmann commented 2 years ago

For the Berlin data:

wget http://download.geofabrik.de/europe/germany/berlin-latest.osm.pbf

mpadge commented 2 years ago

Yep, i can confirm that you're right here. There are several issues here, starting with dodgr generating inaccurate distance calculations to begin with, which the following code fixes by modifying all distances to the most accurate versions using an sc-format network and using geodesic calculations. But that then reveals (with a simplified triangle inequality checker) that the inequality is indeed violated.

library (dodgr)
x <- readRDS ("/<path>/<to>/berlin-sc.Rds")
graph <- weight_streetnet (x, wt_profile = "foot")
#> Loading required namespace: geodist
#> Loading required namespace: dplyr

# manually change all edge distances to geodesic
xy0 <- graph [, c (".vx0_x", ".vx0_y")]
xy1 <- graph [, c (".vx1_x", ".vx1_y")]
wt <- graph$d_weighted / graph$d
graph$d <- geodist::geodist (xy0, xy1, paired = TRUE, measure = "geodesic")
graph$d_weighted <- graph$d * wt

set.seed(1234)
from <- sample (graph$.vx0, size = 100)

d_dist <- dodgr_dists(graph
                   , from = from
                   , to = from
                   , shortest = TRUE)

tri.ineq.show <- function (D) 
{
    mat <- as.matrix(D)
    n <- dim(mat)[1]
    ineq.idx <- c()
    for (i in 1:(n - 2)) {
        for (j in (i + 1):(n - 1)) {
            for (k in (j + 1):n) {
                long <- mat [i, j] + mat [j, k]
                short <- mat [i, k]
                if (is.na (short) | is.na (long)) {
                    next
                }
                if ((short - long) > 1e-6) {
                    ineq.idx <- rbind (ineq.idx, c (i, j, k))
                }
            }
        }
    }
    return(ineq.idx)
}
ineq <- tri.ineq.show (d_dist)
nrow (ineq)
#> [1] 1035
i <- ineq [1, 1]
j <- ineq [1, 2]
k <- ineq [1, 3]
c (d_dist [i, j], d_dist [j, k], d_dist [i, j] + d_dist [j, k], d_dist [i, k])
#> [1]  4329.437 24118.779 28448.216 28514.949

Created on 2022-05-02 by the reprex package (v2.0.1)

I'll fix up the internal distance calculations first, then have a look at why the inequality is violated. Thanks for uncovering this :rocket: :heavy_check_mark:


Finally, note that one further issue with your approach is that the osmextract package relies on GDAL via sf to read the data, and that actually strips all the OSM information necessary to construct a proper dodgr network. So you'll never get accurate distances via osmextract. What you can do is use osmium to convert the .pbf to .osm/.xml format, and then use osmdata to read that in.

mpadge commented 2 years ago

Finally getting back to this issue. I realised in the meantime that this behaviour is actually perfectly normal and expected. dodgr implements weighed routing, yet returns the absolute, unweighted distances. The weighted distances A -> B plus B -> C will always be >= A -> C, but the absolute final distances may well be less. The weighting profiles reflect preferences for different kinds of ways. The best weighted path connecting A -> C may not necessarily be the shortest overall, and routing through some other mid-point B may indeed give shorter overall journeys. The only thing that can be stated with certainty is that those journeys will have longer weighted distances.

This suggests that a solution in cases where triangle inequalities should hold is to use something like wt_profile = 1, and so not to weight each edge. But note even then that if using the full-detail weighting on SC-format networks, you still get the effects of things like waiting times at traffic lights, or directional turning penalties. So even then while that will mostly produce values of d_weighted that are equal to equvalent direct distances, d, that will not always be so, and you'll still end up with some d_weighted > d. Those will generally reflect compound junctions with complicated turning rules, and then you may still see the trinagle inequality violated even when routing on an unweeighrted network.

This is a good thing, and reflects the routing engine doing what is expected. Realistic routing may always run the risk of violating tinangle inequalities. Even if you attempt to reduce as much as possible to some kind of "pure" geometric network (for example, wt_profile = 1 and turn_penalty = FALSE), you may still violate geometric constrainsts because of unknown prohibitions within the network such as one-way rules.

So all is good, and i'll close this now, but feel fre to add additional comments @mguzmann. Thanks for prompting these important insights.