UrbanAnalyst / dodgr

Distances on Directed Graphs in R
https://urbananalyst.github.io/dodgr/
128 stars 16 forks source link

Output dodgr_dists() #47

Closed chrijo closed 6 years ago

chrijo commented 6 years ago

Hallo,

I am not sure what the exact unit (kilometre, miles?) of the output of dodgr_dists() is and if it's working correctly. I tried to verify the results from dodgr_dists() with googlemaps and there are hugh differences I can't explain. Please, try the following code to reproduce my Problem.

library(dodgr) library(ggmap)

essen <- dodgr_streetnet("Essen, Germany") graph <- weight_streetnet(essen)

mainstation <- geocode("Hauptbahnhof, Essen, Germany") mall <- geocode("Limbeckerplatz 1, Essen, Germany")

names(mainstation) <- c("x","y") names(mall) <- c("x","y")

mainstation <- as.data.frame(mainstation) mall <- as.data.frame(mall)

d <- dodgr_dists(graph, from = mainstation,to = mall, wt_profile = "foot")

mapdist(from = c("Hauptbahnhof, Essen, Germany"), to = c("Limbeckerplatz 1, Essen, Germany"), mode = "walking")

address1 <- geocode("Brigittastrasse 25, Essen, Germany") address2 <- geocode("Kronprinzenstrasse 6, Essen, Germany")

names(address1) <- c("x","y") names(address2) <- c("x","y")

d <- dodgr_dists(graph, from = address1,to = address2, wt_profile = "foot")

mapdist(from = c("Brigittastrasse 25, Essen, Germany"), to = c("Kronprinzenstrasse 6, Essen, Germany"), mode = "walking")

address1 <- geocode("Engelsbecke 24, Essen, Germany") address2 <- geocode("Holunderweg 10, Essen, Germany")

names(address1) <- c("x","y") names(address2) <- c("x","y")

d <- dodgr_dists(graph, from = address1,to = address2, wt_profile = "foot")

mapdist(from = c("Engelsbecke 24, Essen, Germany"), to = c("Holunderweg 10, Essen, Germany"), mode = "walking")

mpadge commented 6 years ago

Thanks for opening this issue - it has revealed quite a few minor bugs in dodgr code. None that yet related directly to your query, but I'll have to iron them out first before addressing it directly. The distances themselves should nevertheless be accurate, although the were inaccurate in previous versions, so maybe make sure you've got the latest dev version installed (devtools::install_github("atfutures/dodgr"))? But note at the moment that doing that will likely fail to generate distances for several of these - a problem which I hope to fix asap.

Initial comments on your query prior to that:

  1. The distances are in kilometres
  2. Distances with different routing services will never be the same, and may often differ quite a bit, just as the ways two people walk between any two points will commonly differ. The routing profiles for different transport modes can be directly controlled within dodgr, and could be tweaked to closely agree with the google API, but that's not really the point of this package.
chrijo commented 6 years ago

Dear Mark, thanks for your fast and accurate reply. I installed the dev-version, however the output of dodgr_dists() now is NA. I guess it has to do with the bugs you've mentioned. So I'm looking foward for a improved version of your package! Please let me know when it's fixed. Btw do you know the python module OSMnx which seems to have a similar approach to routing on osm networks and it has a nice feature of network simplification by reducing the number of redunandant nodes. Thank you for your efforts!

mpadge commented 6 years ago

Thanks @chrijo, and thanks, yes, i'm aware of OSMnx. The diff with dodgr is that it really is intended for massive routing, so the from and to points can be long vectors, and it will return a full matrix of distances. When I solve these bugs, I'll paste modified version of your code above here to show what I mean. dodgr also does network simplification via dodgr_contract_graph(), and the routing is conducted on the simplified/contracted graph to speed things up. dodgr is also much faster: it can do like 1,000-times-1,000 points (so 1 million calculations) in a few minutes (depending on network size).

mpadge commented 6 years ago

Okay, here we go with a massive response. I've been mostly using dodgr for enormous routing tasks for which i've been able to ignore the NA values, yet did mean to examine this further. This issue has given the ideal motivation to do that, so thanks! The following code should now work:

library (dodgr) # latest version
library(ggmap)
library(magrittr)
mainstation <- geocode("Hauptbahnhof, Essen, Germany")
mall <- geocode("Limbeckerplatz 1, Essen, Germany")
address1 <- geocode("Brigittastrasse 25, Essen, Germany")
address2 <- geocode("Kronprinzenstrasse 6, Essen, Germany")
address3 <- geocode("Engelsbecke 24, Essen, Germany")
address4 <- geocode("Holunderweg 10, Essen, Germany")

xy <- rbind (mainstation, mall, address1, address2,
             address3, address4) %>%
    data.frame ()
essen <- dodgr_streetnet (pts = xy, expand = 0.2, quiet = FALSE)
graph <- weight_streetnet (essen, wt_profile = "foot")
dodgr_dists (graph, from = xy, to = xy)

All distances are there except for to and from the mall. The reason for this can be seen by zooming in on a smaller section:

library (sf)
# get street network just from a very small area
xybb <- rbind (c (7.00421, 51.457),
               c (7.00737, 51.4585)) %>%
    data.frame () %>%
    dplyr::rename (lon = X1, lat = X2)
essen_zoom <- dodgr_streetnet (pts = xybb, expand = 0, quiet = FALSE)
graph <- weight_streetnet (essen_zoom, wt_profile = "foot")
graph <- graph [which (graph$component == 1), ]

Then use just 2 routing points: the one inside the mall, and nearest street junction just outside, which has the following OSM ID:

nd <- "1590256404"
verts <- dodgr_vertices (graph)
i <- which (verts$id == nd)
xy2 <- rbind (c (7.005994, 51.45774), # the mall
             as.numeric (verts [i, 2:3])) %>%
    data.frame () %>%
    dplyr::rename (lon = X1, lat = X2)

This can be viewed with mapview to examine where the routing points are:

# convert pts to sf:
xy2sf <- function (xy)
{
    res <- xy %>%
        dplyr::select (lon, lat) %>%
        as.matrix () %>%
        split (., seq (nrow (.))) %>%
        lapply (sf::st_point) %>%
        sf::st_sfc () %>%
        sf::st_sf ()
    sf::st_crs (res) <- 4326
    return (res)
}
library (mapview)
library (sf)
g_sf <- dodgr_to_sf (graph)
verts <- dodgr_vertices (graph)
xy_match <- verts [match_pts_to_graph (verts, xy2), c ("x", "y")] %>%
    dplyr::rename (lon = x, lat = y) %>%
    data.frame ()
m <- mapview (essen_zoom) %>%
    addFeatures (xy2sf (xy2), color = "red", radius = 4) %>%
    addFeatures (xy2sf (xy_match), color = "green", radius = 4)
print (m)

Hovering over those points reveals that they are on the OSM elements with IDs of #200178285 (the rotunda) and 200178377 (the internal passage). These do not, however, meet, as can be seen by filtering the above code to plot only those elements:

indx <- which (essen_zoom$osm_id %in% c ("200178285", "200178377"))
m <- mapview (essen_zoom [indx, ]) %>%
    addFeatures (xy2sf (xy2), color = "red", radius = 4) %>%
    addFeatures (xy2sf (xy_match), color = "green", radius = 4)
print (m)

Inspecting those ways on OSM reveals that the first one is UG (level = -1). The routing point has been matched there simply because it's the closest (x, y), but it can't actually be reached. (The other relevant ways are rotunda, more rotunda, the internal way with the routing point (on EG), the internal way on 1OG, and one on UG.)

The solution in this case is simply to filter out the ways that are not on the ground level. This is a bit fiddly because this seems to work:

levs <- paste0 (essen$level) # sf data cols are factors
indx <- which (!levs %in% c ("-1", "1"))
graph <- weight_streetnet (essen [indx, ], wt_profile = "foot")
d <- dodgr_dists (graph, from = xy, to = xy)

while this more generic approach does not:

indx <- which (!(is.na (levs) | grepl ("0", levs)))
graph <- weight_streetnet (essen [indx, ], wt_profile = "foot")
d <- dodgr_dists (graph, from = xy, to = xy)

The exact kind of level-filtering will depend on local data. (And if you run into any problems, you might need to re-generate the street network after having loaded sf.) Moreover, note that this neither can nor should be done automatically, because it would then automatically disable any routing through any level != 0. It is neverthelss obviously very important to highlight this, which I will do by extending the examples of dodgr_dists.

Finally, compare these to the google ones

library (ggmap)
dg <- d

dg [1, 2] <- dg [2, 1] <- 
    mapdist(from = c("Hauptbahnhof, Essen, Germany"),
            to = c("Limbeckerplatz 1, Essen, Germany"),
            mode = "walking")$m
dg [1, 3] <- dg [3, 1] <- 
    mapdist(from = c("Hauptbahnhof, Essen, Germany"),
            to = c("Brigittastrasse 25, Essen, Germany"),
            mode = "walking")$m
dg [1, 4] <- dg [4, 1] <- 
    mapdist(from = c("Hauptbahnhof, Essen, Germany"),
            to = c("Kronprinzenstrasse 6, Essen, Germany"),
            mode = "walking")$m
dg [1, 5] <- dg [5, 1] <- 
    mapdist(from = c("Hauptbahnhof, Essen, Germany"),
            to = c("Engelsbecke 24, Essen, Germany"),
            mode = "walking")$m
dg [1, 6] <- dg [6, 1] <- 
    mapdist(from = c("Hauptbahnhof, Essen, Germany"),
            to = c("Holunderweg 10, Essen, Germany"),
            mode = "walking")$m

dg [2, 3] <- dg [3, 2] <- 
    mapdist(from = c("Limbeckerplatz 1, Essen, Germany"),
            to = c("Brigittastrasse 25, Essen, Germany"),
            mode = "walking")$m
dg [2, 4] <- dg [4, 2] <- 
    mapdist(from = c("Limbeckerplatz 1, Essen, Germany"),
            to = c("Kronprinzenstrasse 6, Essen, Germany"),
            mode = "walking")$m
dg [2, 5] <- dg [5, 2] <- 
    mapdist(from = c("Limbeckerplatz 1, Essen, Germany"),
            to = c("Engelsbecke 24, Essen, Germany"),
            mode = "walking")$m
dg [2, 6] <- dg [6, 2] <- 
    mapdist(from = c("Limbeckerplatz 1, Essen, Germany"),
            to = c("Holunderweg 10, Essen, Germany"),
            mode = "walking")$m

dg [3, 4] <- dg [4, 3] <- 
    mapdist(from = c("Brigittastrasse 25, Essen, Germany"),
            to = c("Kronprinzenstrasse 6, Essen, Germany"),
            mode = "walking")$m
dg [3, 5] <- dg [5, 3] <- 
    mapdist(from = c("Brigittastrasse 25, Essen, Germany"),
            to = c("Engelsbecke 24, Essen, Germany"),
            mode = "walking")$m
dg [3, 6] <- dg [6, 3] <- 
    mapdist(from = c("Brigittastrasse 25, Essen, Germany"),
            to = c("Holunderweg 10, Essen, Germany"),
            mode = "walking")$m

dg [4, 5] <- dg [5, 4] <- 
    mapdist(from = c("Kronprinzenstrasse 6, Essen, Germany"),
            to = c("Engelsbecke 24, Essen, Germany"),
            mode = "walking")$m
dg [4, 6] <- dg [6, 4] <- 
    mapdist(from = c("Kronprinzenstrasse 6, Essen, Germany"),
            to = c("Holunderweg 10, Essen, Germany"),
            mode = "walking")$m

dg [5, 6] <- dg [6, 5] <- 
    mapdist(from = c("Engelsbecke 24, Essen, Germany"),
            to = c("Holunderweg 10, Essen, Germany"),
            mode = "walking")$m
dg <- dg / 1000

plot (as.vector (d), as.vector (dg),
      xlab = "dodgr", ylab = "google")
lines (0:10, 0:10, lty = 2)

dists

The two are almost perfectly correlated, yet dodgr gives generally much shorter distances. I've not examined what the google router is doing there, but since most of the central area of Essen is pedestrianised, shorter distances can generally assume to be simply better here, I'd say. (And coincidentally, I actually live in Muenster, so know the whole area we're talking about here quite well, which made this a bit more fun than it would have been.)

This also demonstrates what I mean in the previous comment about dodgr and large routing tasks - constructing that matrix from the google API is a pain in the butt. I'll leave this issue open until the example code has been updated.

chrijo commented 6 years ago

Dear Mark, thank you very much for your detailed reply. It looks like I really have to get into osm to understand the pros and cons of the data. The differences in distances of Google API and dodgr is intressting. I didn't looked into your source code or Google API, however when matching of coordinates and neareast node is done, do you add this distance to the calculated path-distance?

mpadge commented 6 years ago

nope, that's not added, but is generally only a couple of metres. (Adding would be somewhat dependent on the context in which dodgr_dists was called, so would be slightly non-trivial, and not really advantageous. The large discrepancy between dodgr and google makes this kind of difference seem even less important.)

A short summary for you:

method pro con
google will always give a result total lack of flexibility, including no ability to route within, for example, different levels of a shopping mall
dodgr may fail if input data not adequately prepared much more flexible
chrijo commented 6 years ago

Concerning distances: dodgrdist() calculates about 637m from Kronprinzenstraße 6 to Brigittastraße 25 (4 -> 3), however linear distance is about 900 m.

mpadge commented 6 years ago
library(dodgr)
library(ggmap)
address1 <- geocode("Brigittastrasse 25, Essen, Germany")
address2 <- geocode("Kronprinzenstrasse 6, Essen, Germany")

xy <- rbind (address1, address2) %>% data.frame ()
essen <- dodgr_streetnet (pts = xy, expand = 0.2, quiet = FALSE)
graph <- weight_streetnet (essen, wt_profile = "foot")
dodgr_dists (graph, from = xy, to = xy)

That gives abut 530m one way, 560m the other, and

haversine <- function (xy1, xy2)
{
    xd <- (xy2 [1] - xy1 [1]) * pi / 180
    yd <- (xy2 [2] - xy1 [2]) * pi / 180
    d <- sin (yd / 2) * sin (yd / 2) +
        cos (xy2 [2] * pi / 180) *
        cos (xy1 [2] * pi / 180) * sin (xd / 2) * sin (xd / 2)
    as.numeric (2 * 3671 * asin (sqrt (d)))
}
haversine (address1, address2)

gives a straight line dist of 410m. Everything seems okay.

chrijo commented 6 years ago

Thanks! I tried to reproduce your example, but I got totally different results. However, I upgrade R to the latest version and install the latest dev-version of dodgr. Finally I got the same results . However, if I calculate the linear distance with library geospehere or sp it is about 714 m.

library(geosphere)
distm(x = xy[1,],y = xy[2,],distHaversine)
library(sp)
spDistsN1(as.matrix(xy),as.matrix(xy[2,]), longlat = T)
mpadge commented 6 years ago

okay, now I really owe you a debt of gratitude for persisting with this - the previous version had the Earth's radius mistakenly entered as 3671 (and the above code was copied straight from there!), when it should have been 6371 - one small typo with huge implications! I've now updated it to the (overly-)precise value used in geosphere, with the following results:


```{r}
library(dodgr)
library(ggmap)
library(magrittr)
address1 <- geocode("Brigittastrasse 25, Essen, Germany")
address2 <- geocode("Kronprinzenstrasse 6, Essen, Germany")

xy <- rbind (address1, address2) %>% data.frame ()
essen <- dodgr_streetnet (pts = xy, expand = 0.2, quiet = FALSE)
graph <- weight_streetnet (essen, wt_profile = "foot")
dodgr_dists (graph, from = xy, to = xy)

That now gives around 930m in one direction; 980 in the other, compared with

geosphere::distm (xy, fun = distHaversine)

giving 714m. The previous differences between dodgr and google distances were also obviously influenced by this typo. Updated version: plot

And the dodgr distances are now slightly longer than the google ones (by around 10%), which is the kind of result that should arise if the pedestrian routing model for dodgr actually chooses nicer but slightly longer routes. google really does just do a pretty dumb default route in most cases, as can be seen by the fact that it gives exactly the same result regardless of which direction you travel: Both of the following give 863m:

mapdist(from = c("Brigittastrasse 25, Essen, Germany"),
        to = c("Kronprinzenstrasse 6, Essen, Germany"),
        mode = "walking")$m
mapdist(from = c("Kronprinzenstrasse 6, Essen, Germany"),
        to = c("Brigittastrasse 25, Essen, Germany"),
        mode = "walking")$m

The (now finally correct!) dodgr equivalents are the aforementioned 930 and 980m, because direction should and does matter. This in turn means that dodgr should give generally slightly higher values than google, which is what the graph reveals.

See the above commit for my expression of gratitude to you. I'll have to rush out a new version asap to fix this!


@RobinLovelace check this: dodgr_dists has been wrong all along coz I typed "3671" instead of "6371" for the Earth's radius! This won't have affected any of our work, and has no effect on dodgr_flows, but has to chalk up as one of the most consequential typos of my coding career. Discovered thanks to @chrijo

chrijo commented 6 years ago

Yeah, It's been a bit awkward for me to insist so much ;) Many thanks for developing this very help full package!!!

mpadge commented 6 years ago

It was only your insistence and persistence that revealed my stupid mistake, so this is definitely a case where it's been supremely helpful. Thanks!

Robinlovelace commented 6 years ago

Great example of the importance of open and transparent development workflow. I'd just like to echo @chrijo's gratitude for flagging it and to Mark for acting on it!

mpadge commented 6 years ago

To reiterate: this whole issue has been "supremely helpful"!