UrbanAnalyst / dodgr

Distances on Directed Graphs in R
https://urbananalyst.github.io/dodgr/

Necessary shapefile? Navigation in wrong direction with Geofabric shapefile #96

Closed: mkvasnicka closed this issue 5 years ago

mkvasnicka commented 5 years ago

Hi!

This is more of a question than an issue. However, I believe others might benefit from it as well.

I need to have the shapefile stored locally for reproducibility reasons, so I can't use the dodgr_streetnet() function every time. Moreover, I don't know how to make dodgr_streetnet() download the data for a whole country (the Czech Republic or Austria) at once. Therefore, I downloaded the shapefile for my country from Geofabrik. Even though the dataset has a somewhat different structure (many fields are missing and some are named differently), I was able to construct a graph and navigate with it. However, I found a huge problem: on some routes, dodgr navigates jolly well in the opposite direction. This does not happen (at least in my test cases) when I download the data with the dodgr_streetnet() function, so I guess that something important is missing from the Geofabrik shapefiles. Hence my questions:

Where can I get a shapefile for a whole country that dodgr can navigate correctly? Which fields does dodgr use to learn that a road has separate one-way carriageways?

Many thanks for your answer. Many thanks for providing dodgr as well.

Best wishes, Michal

mdsumner commented 5 years ago

fwiw, and this is no help whatsoever but might provide some explanation: no shapefile or any other Simple Features line network can represent direction along a line. There's nothing in the specification to provide that ordering of vertices, and no way to store it without going outside of what the standard dictates. (That is why absolutely everyone goes outside it: literally no software aligns to the standard, because it's simply not suitable for real work. It's mainly a contract for different systems to communicate a lowest common denominator to each other.) Many formats you see in SF contexts can store order, but that isn't carried into the standard itself.

So SF can't store or represent line order, and what we see is that many algorithms work in a particular scan order, often reversing the native order (because that's how they scanned through the coordinates).

Even if you did have a shapefile augmented with extra information about line order, that information would be fragile under many basic geometry operations, so it would have to be managed very carefully.

mpadge commented 5 years ago

Yeah, thanks @mdsumner. Geofabrik offers three primary formats: .shp, .pbf, and .osm.bz2. Only the last of these is currently compatible with osmdata->dodgr. Support for .pbf is planned but not yet implemented; .shp will never work. So even if you get a .shp file to somehow magically appear to "work" within dodgr, it should not be expected to give appropriate output.

The only current way to use dodgr with geofabrik dumps is:

  1. Cut the desired geography with osmium extract -b xmin,ymin,xmax,ymax myfile.osm.bz2 -o myfile-extract.osm.bz2;
  2. Reduce to highways only with osmium tags-filter myfile-extract.osm.bz2 w/highway -o myfile-highways.osm.bz2;
  3. Decompress to plain .osm (XML) with bzip2 -dk myfile-highways.osm.bz2 # or "-d" to overwrite
  4. Import that using osmdata / dodgr

It's a workflow that I'd been meaning to verify for a while but hadn't actually done, and so ... I can now confirm that it works. Having followed the above steps, this code is reproducible:

library (dodgr)
library (osmdata)
#> Data (c) OpenStreetMap contributors, ODbL 1.0. http://www.openstreetmap.org/copyright
l <- list.files (".", pattern = "\\.osm$", full.names = TRUE)
x <- osmdata_sf (doc = l) %>%
    osm_poly2line ()
net <- weight_streetnet (x$osm_lines, wt_profile = "bicycle")
#> The following highway types are present in data yet lack corresponding weight_profile values: platform, corridor,
head (net)
#>   geom_num edge_id    from_id from_lon from_lat      to_id   to_lon
#> 1        1       1   25832081 8.793655 53.06641 2576500505 8.793675
#> 2        1       2 2576500505 8.793675 53.06640   25832081 8.793655
#> 3        1       3 2576500505 8.793675 53.06640   21030859 8.793691
#> 4        1       4   21030859 8.793691 53.06639 2576500505 8.793675
#> 5        1       5   21030859 8.793691 53.06639 2576500504 8.793708
#> 6        1       6 2576500504 8.793708 53.06638   21030859 8.793691
#>     to_lat        d d_weighted  highway  way_id component      time
#> 1 53.06640 1.823666   2.431554 tertiary 4003711         1 0.4376798
#> 2 53.06641 1.823666   2.431554 tertiary 4003711         1 0.4376798
#> 3 53.06639 1.356648   1.808864 tertiary 4003711         1 0.3255955
#> 4 53.06640 1.356648   1.808864 tertiary 4003711         1 0.3255955
#> 5 53.06638 1.523386   2.031181 tertiary 4003711         1 0.3656127
#> 6 53.06639 1.523386   2.031181 tertiary 4003711         1 0.3656127
#>   time_weighted
#> 1     0.5835730
#> 2     0.5835730
#> 3     0.4341273
#> 4     0.4341273
#> 5     0.4874835
#> 6     0.4874835

Created on 2019-06-27 by the reprex package (v0.3.0)
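The paired rows above (edges 1 and 2, 3 and 4, 5 and 6) are the same street segment in both directions, which is how the weighted graph encodes two-way travel; with a profile that respects oneway tags, a one-way way should contribute only a single directed edge. As a minimal sketch, assuming the small hampi network bundled with dodgr as a stand-in for the Geofabrik extract, the following counts how many directed edges have a reverse partner:

```r
library (dodgr)

# 'hampi' is a small sf street network shipped with dodgr; it stands in here
# for a street network read from a Geofabrik .osm extract.
graph <- weight_streetnet (hampi, wt_profile = "bicycle")

# Key every directed edge as "from to", and also as its reversed "to from".
fwd <- paste (graph$from_id, graph$to_id)
bwd <- paste (graph$to_id, graph$from_id)

# An edge is traversable both ways if its reversed key is also present.
two_way <- fwd %in% bwd
table (two_way)
```

Edges with no reverse partner should be those that dodgr treats as one-way for the chosen profile, which is exactly the information a shapefile cannot carry.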

@mkvasnicka Please let me know if this works for you. Thanks again for pushing the package into untested domains!

mkvasnicka commented 5 years ago

Many thanks for your fast answer. A few more questions:

  1. Is restricting with osmium extract necessary (I can download just one country from Geofabrik)?
  2. Can I read it with sf via st_read("my_map.osm", "lines") and supply it to dodgr?
  3. If I can do (2), can I filter out some types of roads with sf/dplyr filter(), and then supply it to dodgr?
  4. If I could do (2), how can I filter out some types of roads? (I only want to keep motorway to tertiary.)

Many thanks once more.

mpadge commented 5 years ago

It is not necessary to use osmium, but it makes it much easier to work with Geofabrik files. osmium will do all of your desired pre-processing; see the second example on "filtering by tags" in the osmium manual.

Specific answers to your questions:

  1. No, not necessary, but often helpful if you're only interested in a sub-area.
  2. No, dodgr is designed to accept street networks from osmdata and will generally not work as well with networks read in with sf::st_read(). (sf uses GDAL, which strips a lot of useful data out of OSM data, while the osmdata package retains it.)
  3. No, but yes: you can't do (2), but you can nevertheless filter ways in the dodgr network by filtering on the "highway" column.
  4. As above, but note that it's still more efficient to do what I said at the outset: use osmium to filter your extract by keys and values, then use osmdata and dodgr to work with that.

Summary: either (3) or (4) will work, but (2) won't.
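Answers (3) and (4) can be sketched in R as below. This is a minimal example, assuming the bundled hampi network in place of a real country extract and a hand-picked set of highway values as the meaning of "motorway to tertiary"; filtering the osmium extract beforehand remains the more memory-efficient route.

```r
library (dodgr)

# "bicycle" (the default profile) is used here only for the bundled example;
# for car routing on a real extract, use wt_profile = "motorcar".
graph <- weight_streetnet (hampi, wt_profile = "bicycle")

# Assumed definition of "motorway to tertiary", including the *_link ramps.
keep <- c ("motorway", "motorway_link", "trunk", "trunk_link",
           "primary", "primary_link", "secondary", "secondary_link",
           "tertiary", "tertiary_link")

# Drop every edge whose OSM highway class is not in 'keep'.
graph_main <- graph [graph$highway %in% keep, ]
```

Removing road classes can split the network into disconnected pieces, so it is worth recomputing components after filtering.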

mkvasnicka commented 5 years ago

OK, many thanks. I'm going to try it.

mkvasnicka commented 5 years ago

Hi!

Your approach does indeed do the trick.

May I ask two more questions?

  1. Is there a better way to read the OSM file than

    read_osm <- function(file) {
        osm <- osmdata::osmdata_sf(doc = file) %>%
            osmdata::osm_poly2line()
        osm$osm_lines
    }

    This approach first loads all the useless stuff like points and then throws it away, which costs a lot of time and memory.

  2. The resulting sf data.frame is enormous. I guess that most fields are not necessary for navigation. Which fields do I need to keep, and which can I filter out? (I have to conserve as much memory as possible. Currently, I can produce a graph for one city but not for the whole country. I have 16 GB of RAM.)

Many thanks once more. Michal

mpadge commented 5 years ago

Yes, in fact there is a solution, largely thanks to @mdsumner via silicate. osmdata now has an osmdata_sc() function, with an interface programmed directly through dodgr as dodgr_streetnet_sc(). You don't need to worry about the structure of its results, but they plug straight into weight_streetnet() and do the whole thing with much less memory usage than the sf way.

In your case, you can't use dodgr_streetnet_sc(), because that directly calls the overpass server through osmdata, but you can just replace your function above with

read_osm <- function (file) {
   osmdata::osmdata_sc (doc = file)
}

Let me know how you go.

mkvasnicka commented 5 years ago

The _sc versions are indeed much faster and use much less memory. However, I don't understand what's going on inside. I used this code:

library(osmdata)
library(dodgr)
library(dplyr)

read_osm <- function(file) {
    osm <- osmdata::osmdata_sf(doc = file) %>%
        osmdata::osm_poly2line()
    osm$osm_lines
}

prague <- read_osm("~/work/maps/prague-filtered.osm")
q <- opq(c(13, 48, 15, 51))
prague_sc <- osmdata_sc(q, doc = "~/work/maps/prague-filtered.osm")

graph <- weight_streetnet(prague, type_col = "highway", wt_profile = "motorcar") %>%
    filter(component == 1)
graph_sc <- weight_streetnet(prague_sc, wt_profile = "motorcar")

The resulting graphs have a different number of rows (both before and after graph contraction). Moreover, I can't see components in graph_sc (even though there are seven components in the "normal" graph). What's going on? Are the networks the same? Wouldn't some points get stuck in a small disconnected loop in the _sc version?

And one more thing. Is there any conversion to simple features and back? I have written an sf function that adds new vertices to lines such that each new vertex is the nearest point to some GPS location. Could similar functionality be implemented for SC? How? (I want to match my navigation points as closely as possible to the real road network. Matching to existing vertices may be problematic, e.g. on highways where there are long straight segments.)

Many thanks for your help (and apologies that this is more off-topic than a dodgr issue). And thanks once more for dodgr.

mpadge commented 5 years ago

SC is complex and highly experimental, so there's no easy answer to your question of "What's going on?". The different numbers of rows arise because SC graphs enable accurate estimates of routing times as well as distances, and include the effects of delays for turning against oncoming traffic. This in turn requires new edges in the graph to represent the compound effects of different turn angles. In short: SC networks processed through dodgr will not be the same as equivalent sf networks, and will generally not give identical results. If in doubt, trust the SC ones, because routing with those is more detailed and precise.

If there are no components, you can simply call dodgr_components() to get them back again.
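A minimal sketch of that step, assuming the bundled hampi network (an sf-derived graph, not an SC one) as a stand-in; the component column is deleted first only to mimic a graph that lacks it:

```r
library (dodgr)

graph <- weight_streetnet (hampi, wt_profile = "foot")

# Mimic a graph without component information, as observed for graph_sc.
graph$component <- NULL

# Recompute connected components, then keep component 1 (the largest),
# matching the filter (component == 1) used earlier in this thread.
graph <- dodgr_components (graph)
graph_main <- graph [graph$component == 1, ]
```

Restricting to one component avoids routing from points that are stuck in a small disconnected loop.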

As for sf, you should still be able to use dodgr_to_sf() on the SC-generated network to derive sf equivalents. Vertex matching should just work regardless; just use match_points_to_graph().
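A minimal sketch of both steps, again assuming the bundled hampi network and five random, hypothetical GPS points. It uses match_points_to_verts(), which snaps each point to its nearest graph vertex; match_points_to_graph() matches to edges instead, which better suits long straight highway segments, but check its arguments in the documentation for your installed dodgr version. dodgr_to_sf() needs the sf package.

```r
library (dodgr)

graph <- weight_streetnet (hampi, wt_profile = "foot")

# Convert the weighted graph back to an sf data.frame of linestrings.
graph_sf <- dodgr_to_sf (graph)

# Hypothetical GPS fixes drawn uniformly within the network's bounding box.
verts <- dodgr_vertices (graph)
set.seed (1)
gps <- data.frame (x = runif (5, min (verts$x), max (verts$x)),
                   y = runif (5, min (verts$y), max (verts$y)))

# Index (into 'verts') of the nearest vertex for each GPS point.
idx <- match_points_to_verts (verts, gps)
snapped <- verts [idx, c ("id", "x", "y")]
```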