JeremyGelb / spNetwork

An R package to perform spatial analysis on networks.
GNU General Public License v2.0
34 stars 2 forks source link

Problem with calculating isochrones for roads crossing at different levels. #7

Closed wacekk closed 1 year ago

wacekk commented 1 year ago

First of all, thank you for developing such a great package!

I have a problem with the calc_isochrones algorithm. Is it possible to tell the algortim that it is not possible to change the road at every road crossing point - e.g. a road crosses another road on an overpass? This causes distortions in the calculations, e.g. in the vicinity of expressways that do not have a connection with the surrounding area. I have a road level marking in my data (e.g. -1 for a tunnel, 0 for a road on the ground, 1 for an overpass). In this case, only for roads with the same elevation coefficient there would be a link.

Is it possible or is it possible to introduce such a possibility?

JeremyGelb commented 1 year ago

Hello There! First of all, thank you for your interest in spNetwork!

This is a difficult one. Currently, the graph is built by using the topology of the network. If two edges share an end or start node, they will be connected. If the network has a good topology, then a bridge should not share an end node or a start node with the links passing under it. In other words, it is possible to go from a line to another only if the lines intersects at their extremities. Have you observed another behavior of the algorithm? If so, could you provide a reproducible example? In that case I would correct the bug as soon as possible.

All the best.

wacekk commented 1 year ago

Thank you for the quick reply.

You are absolutely right. I used the Polish data of the Head Office of Geodesy and Cartography (GUGiK) for the area of ​​Warsaw. You can download them into R using rgugik::topodb_download() After reading the description of individual layers: https://isap.sejm.gov.pl/isap.nsf/download.xsp/WDU20210001412/O/D20211412.pdf road layer (SKDR) has the description "in the place of single-level and multi-level road intersections, segmentation of all road sections leading to the intersection is made", and the layer with roadways (SKJZ) "in the place of collision-free multi-level road intersections, road sections are not segmented". As I used to analyze the layer with roadways, the isochrones analysis performed correctly (but slower, due to more features in the layer). I also saw the OSM layer of roads - it is built similarly to the SKJZ layer.

As a proposal for the development of the package in the future, I suggest adding an additional parameter - the name of the attribute, so that when searching for network nodes, the algorithm takes into account the equality of the selected attribute (e.g. road position level). Such a parameter is present both in OSM data and in GUGiK data and this solution may be useful during analyzes.

I hope to use other functions from the spNetwork package soon - in particular to analyze the distribution of road crashes.

I wish you all the best!

JeremyGelb commented 1 year ago

So everything is fine! I could add a new feature to have multilayered network, but we have to think about how the user will have to specify it. Maybe, in the layer of the lines, two fields could indicate the elevation of the start and the end node of each line. What do you think about this solution?

wacekk commented 1 year ago

From what I was looking at, the level of the route given by me is defined as 0 (on the surface), -1 (underground) or 1, 2 for the next levels of the viaducts. I think treating it as elevation would have the same effect (assuming it could be negative - 0 is greater then -1). Please take a look at the OSM data from Geofabrik or the data from Poland shown by me (downloadable with the rgugik package, SKJZ and SKDR layers).

Thank you for your interest in the problem I described.

JeremyGelb commented 1 year ago

I am puzzlde by the fact that an index can be used for the full line. If a part of the lines of a network have the index -1, they will never connect with lines at another level. We need a way to "jump" from one level to another at some entry points. This is why I suggested to have a level at the start and end points of each line.

Another option would be to allow to jump from one level to another at intersections if there is only a difference of 1 between the two levels. So when on a road with index 1, one could reach roads at levels 0,1 and 2. In other words, 0 could represent ground, -1 a connexion line before a tunnel, -2 the tunnel level. One can not jump from 0 to -2 without a connexion line at the level -1.

wacekk commented 1 year ago

I'm sorry for the lack of response. I was on a holiday.

You're right, it is indeed complicated. I guess you need to use networks that have the right topology.

However, I have 2 other issues while using the package. I carry out analyzes for Warsaw, it is the capital city of Poland, with approx. 2 million inhabitants. The road network at the input to the algorithm contains approx. 65,000 features. After the analysis, I get about 575,000 features.

The calc_isochrones() algorithm determines de facto separate road networks for each of the ranges (e.g. 200, 400, 600 m ...). The individual features overlap. This is a problem when visualizing with other software (eg QGIS). Is it possible to obtain one net, where i.e. under the section with the attribute of 200 m, there are no sections with the attribute of 400 m, 600 m, etc.? I tried iteratively using st_difference, but with this number of objects my computer doesn't do it. :(

The second problem is the use of OSM data from http://download.geofabrik.de/europe/poland.html. In this data (file: gis_osm_roads_free_1.shp), the road network often lacks end of road sections at intersections. Is there any method to efficiently prepare such data for analysis? You cannot automatically divide the sections at each intersection, because then there shouldn't be conjuction on viaducts (as I wrote about earlier). I think that solving this problem will be valuable, because the data from OSM are worldwide and contain information about the direction of a given road.

In the figures below, the marked road is one feature and, as you can see, it has several intersections. Although there are nodes at this point, the algorithm does not see the road connection because it is not the end or beginning of the object.

Both map data: © autorzy OpenStreetMap

JeremyGelb commented 1 year ago

Thank you for the feed back and no problem about the delay.

Concerning your first demand, I implemented this weekend a new parameter in the function calc_isochrones : donught. It should cut the lines to avoid the overlap. It is available in the development version on github if you want to try it. I tested it a bit with a toy dataset, but your comments would be appreciated.

Concerning the second point, I have a method I would suggest with your osm data:

It should do the trick in the case you described.

I hope it will help !

wacekk commented 1 year ago

Thank you for the quick reply!

The new donught parameter is exactly what I mean. Thank you very much! :) I tested it and it works fairly well. Below is a net with isochrones exported from R and loaded into QGIS (for further analysis).

Besides, the .gpkg file is more than seven times smaller.

However, there are cases where the new algorithm removes network fragments with isochrones. This happens after the calculations are made. Below is an example calculated using the package in the CRAN version:

In the following, the network calculated with the new algorithm with donught = TRUE. You can see the gray section - this is the complete network rendered in grey under the layer with isochroned network. The segment is missing here as a result of the calc_isochrones algorithm. The problem occurs after performing the calculations, because you can see that the cut-off part of the network has a properly calculated distance attribute (properly colors).

Both pictures are png rendered with the tmap package.

I will try to develop a reprex for this problem but I don't promise I will succeed. The problem occurs in quite a few places, but throughout the city. Both versions were computed from the same input files. The whole analyzed network is very large, so the calculations take quite a long time.

I was able to analyze the data from OSM, although the calculations took several hours. I implemented the simple_lines algorithm only for roads on the '0' level. At other levels, connections are few and the vast majority have the correct topology. When I watch them carefully, I will let you know what the result is. Maybe using simplify_network aftersimple_lines would save computation time.

Thank you so much for your great help!

JeremyGelb commented 1 year ago

It is a pleasure !

Could you send a me a small shp file of the area you are showing above ? It could help me to understand why some parts of the lines are missing.

wacekk commented 1 year ago

I have prepared a reprex. I managed to get this error again (and by the way, another one, how I wanted to analyze a small fragment of the network - described in the code comment). Sorry for the variable names in Polish, but I used parts of my earlier code. The area with isochrones is in the eastern part of the city.

library(tidyverse)
library(sf)
library(tmap)
library(RColorBrewer)
library(spNetwork)
library(rgugik)

topodb_download(TERYT = 1465) # approx. 76 MB of data from GUGiK (Head Office of Geodesy and Cartography), run only first time
# 1465 is a code of Warsaw, the capital city of Poland

jezdnie_warszawa <- read_sf("./PL.PZGiK.330.1465/BDOT10k/PL.PZGiK.330.1465__OT_SKJZ_L.xml")
# SKJZ is an abbreviation of transport net - carriageways (in Polish: Sieć komunikacyjna jezdnie)

st_geometry(jezdnie_warszawa) <- "geometry"

punkt_startowy <- tibble("OID" = 1, 
                          "x" = 652437.141,
                          "y" = 489907.350)

punkt_startowy <- st_as_sf(punkt_startowy, coords = c("x","y"), crs = 2180)

boundary <- st_bbox(c(xmin = 651505.4, ymin = 489798.9, xmax = 653724.5, ymax = 490956.9), crs = 2180) %>% 
  st_as_sfc()

# It's a problem - the `calc_isochrones` function won't run on the network segment bounded by the `boundary` polygon.
# I wanted to speed up the analysis by reducing the area. 
# I get the error 'Error in eval (bysub, parent.frame (), parent.frame ()):
# object 'L1' not found`

# Therefore, I omitted the code snippet below and used the whole city network.

# jezdnie_analiza <- jezdnie_warszawa %>% 
#   st_filter(boundary, .predicate = st_intersects)

jezdnie_analiza <- jezdnie_warszawa # network for whole city, because mentioned above error

jezdnie_analiza <- jezdnie_analiza %>% 
  mutate(dlugosc = as.numeric(st_length(geometry))) %>% 
  select(dlugosc)

iso_results <- calc_isochrones(lines = jezdnie_analiza,
                               start_points = punkt_startowy,
                               dists = c(200, 400, 600, 800, 1000, 1300, 1600, 2000, 2500, 3000),
                               weight = "dlugosc",
                               donught = TRUE
)

iso_results$fac_dist <- as.factor(iso_results$distance)
iso_results <- iso_results[order(-1*iso_results$distance),]

tm2 <- tm_shape(jezdnie_warszawa) + 
  tm_lines(col = "gray50") +
  tm_shape(iso_results) + 
  tm_lines(col = "fac_dist",title.col = "Odległość [m]",
           palette = rev(brewer.pal(n=10,"RdYlGn")))+
  tm_layout(legend.outside = TRUE)+
  tm_scale_bar()

tmap_save(tm2, "iso_bug1.png", width = 5000, height = 4000, dpi = 300)
#tmap_save(tm2, "iso_bug1.png", width = 800, height = 500, dpi = 300) # for small area
wacekk commented 1 year ago

Other minor issues noticed:

  1. When by mistake I entered the coordinates of the starting point for the calculations in the wrong CRS (so it did not adhere to the network at all), I got the error "Error in coordsLines [[nearest_line_index [x]]]: attempt to select less than one element in get1index "- perhaps it is worth informing the user what the problem is in this case

  2. The problem with "gray lines" (gaps in isochrones) also occurs in other cities for which I have made analyzes.

JeremyGelb commented 1 year ago

Thank you for your comments, I will try to check this as soon as possible !

JeremyGelb commented 1 year ago

I tried to use the code you provided but I can't download the data with the function topodb_download, status was 'Couldn't resolve host name'Conection error.

Could you send me the data directly ? I will try with another dataset this weekend.

wacekk commented 1 year ago

I just checked. Data is downloading on my computer without any problem. Try to download the file directly from the link: https://opendata.geoportal.gov.pl/bdot10k/14/1465_GML.zip Then you need to unpack the file: PL.PZGiK.330.1465__OT_SKJZ_L.xml I can also send a layer with carriageways (approx. 8 MB ) on email. In that case, please provide the address.

JeremyGelb commented 1 year ago

I have been able to reproduce the bug with a small sample of other data. It is not an easy problem and I am working on it. I will keep you updated when the problem is solved.

wacekk commented 1 year ago

Dear Jeremy!

I found one more point. Apart from the situation when there are breaks in road sections, there are places where sections of different distances overlap. This cannot be seen when a static map is made with tmap, but after exporting to .gpkg and using it in QGIS it is visible - print screens below:

This creates a problem for example when calculating the statistics of the percentage of roads in a given distance from the starting point. Do you know how long it will take to improve the package?

Thank you in advance for your help!

JeremyGelb commented 1 year ago

Hello ! Thank you again for sharing the bug you found. Definitively, the feature donugth is not mature for real case uses. I do not have much time currently to work on spNetwork. I really cannot tell when I will have the time to do the corrections. Sorry for the delay.

wacekk commented 1 year ago

Thank you Jeremy!

I went back to the version of the package available in CRAN.

However, I have another problem. I do a before-and-after analysis of a certain road network. After adding the designed roads to my network, I get the error: Error in igraph :: distances (graph_result $ graph, to = row $ node_idx, v = graph_result $ spvertices $ id,: At core / paths / dijkstra.c: 124: Weight vector must not contain NaN values, Invalid value

Earlier, this error appeared to me when I did not have a variable weight defined for some segment of the network. But in this case the weight of each segment is finite (checked with the is.finite function). I added roads in QGIS, using topological editing and checking if the segments had common nodes. What could be the problem? Or at least how to find which feature is affected by the problem?

JeremyGelb commented 1 year ago

Hello !

This is a weird case. Are you sure that all your geometry are valid (st_is_valid) and that your geometries have a length above 0 (st_length). If you have very small lines, the parameter digits could cause a problem. For example, if you have an edge of 5cm and digit = 1, the truncation of the coordinates will reduce the length of the line to 0. If you have trouble to find the cause of the bug, feel free to share your network once again, I will have a quick check.

wacekk commented 1 year ago

Hello,

Thanks for the hint. I checked if the geometry was not empty st_is_empty, checked that it was simple st_is_simple, checked that none of the weights were NA (which I had a problem with before connecting networks from different sources), but I did not check that all objects are valid st_is_valid. After filtering out an object that was not valid, the algorithm works correctly! I would like to add that the segment weight equal to 0 is not a problem in the operation of the algorithm. If you want to define objects within a range of e.g. 10 km by bike with the possibility of traveling by train (e.g. weekend trip planning), you can set the railway network with a weight of 0 and the function will calculate the distance traveled on a bike well, and will not take the train into account.

I know that you do not have time to develop spNetwork, but in my free time I will send you some ideas for new, interesting functions.

JeremyGelb commented 1 year ago

I am glad that you could solve this problem! Is is also good to know that the parameter weight has this behaviour. But I would suggest to filter the lines with a good old subset instead.

I hope I will have more time to work on spNetwork in a close future. It is not a dead project and ideas are welcome!

I could not work on the feature donught, but I have a workaround to propose for the moment. If you define buffers with FLAT borders, you could use the functions st_union and st_difference of the package sf. The results will be valid as long as a bridge or a tunnel is not overlapping with other lines.

Here is an example :

library(spNetwork)
library(tmap)
library(sf)

data("mtl_network")
data("bike_accidents")

# loading a small dataset for the example
st_geometry(bike_accidents) <- "geom"
acc1 <- bike_accidents[300,]

# A quick map of the situation
tm_shape(mtl_network) +
  tm_lines(col = "black") +
  tm_shape(acc1) +
  tm_dots(col = "red", size = 0.5)

# Calculating isochrones (old-school)
isos2 <- calc_isochrones(lines = mtl_network,
                         dists = c(150, 500, 888),
                         start_points = acc1,
                         donught = FALSE
)

# Starting to cut the larger isochrones within a loop
dists <- c(500, 888)

list_of_cut_lines <- lapply(dists, function(high_lim){
  # isolating the lines that must be used as a mask
  base_lines <- subset(isos2, isos2$distance < high_lim)
  # isolating the lines that must be cut by the mask
  to_cut <- subset(isos2, isos2$distance == high_lim)
  # creating a small buffers and be sure that their ends are FLAT
  buffs <- st_buffer(base_lines, dist = 1, endCapStyle = "FLAT", joinStyle = "BEVEL")
  # merging all the buffers in a big mask
  poly <- st_union(buffs, by_feature = FALSE)
  # cutting the lines with the mask
  new_lines <- st_difference(to_cut, poly)
  return(new_lines)
})

# merging all the lines in one layer
new_iso <- rbind(
  subset(isos2, isos2$distance == 150),
  do.call(rbind, list_of_cut_lines)
)

I you have more than one starting point, you could adjust the code by adding another loop for each starting point. If this solution could help you, I will provide another chunk of code for three starting points.

wacekk commented 1 year ago

Thank you for your response. Unfortunately, I have many bridges and tunnels in my network. I work on a network for a city with a population of two million.

And did you manage to reproduce the error from my example from 08/17/2022? The error occurred after spatially filtering out by st_filter of a fragment of the network.

Error in eval (bysub, parent.frame (), parent.frame ()):
 object 'L1' not found

The analysis for the entire city and the rendering of the map takes over an hour on my computer. If I could do this in fragments, it would be easier for me to test different functions and options.

JeremyGelb commented 1 year ago

Hello ! I have not tried to reproduce the error of the 08/17/2022, it is on my todo list.

However, I reworked the code of the donught parameter. I just pushed a new version on github. It worked well on the case I spotted in my data. Could you check if you get the expected results on the cases you presented?

For performance issue, if you have a computer with several cores, we could try to split the work between them.

wacekk commented 1 year ago

Hello Jeremy!

I have checked the new version of the package with the donught parameter. The .png map generated by the tmap package for the spNetwork version from CRAN (without donught) and the github version (with donught) are identical. I saved the layer in .gpkg and analyzed it in QGIS. Wherever there has been a problem, everything is fine now. Good job! :)

I have never used multiple CPU cores in R. But reducing the number of features with isochrones speeds up the analysis. Cropping isochrones to the city border (in the analysis I use roads from neighboring counties near the city border) and generating the map in tmap is now much faster.

The idea I wrote about earlier is what I call "isochrones algebra". Application examples: I have 2 variants of the road network and I would like to compare them. Currently, I can only visually compare the 2 maps. However, I cannot generate a network with visible differences - where has it improved and where is it worse. Another example is the analysis of networks with one-way roads. The distance to one direction is then different from the other. I can now do "from a point" analysis. Then, artificially change the direction of the one-way traffic, which gives an analysis in the direction "to the point". But I would like to calculate the maximum or sum of these distances and plot them on the network, this is not possible. This is my idea for the further development of the package.

JeremyGelb commented 1 year ago

I am happy that the feature donught is now working as expected. Concerning the suggestion about "isochrones algebra", it sounds more like isochrones comparison. We are interested into differences between isochrones, we cannot multiply them for example. The main issue is to compare efficiently two sets of linestrings without using buffers (problem caused by tunnels and bridges). If we suppose that both sets are composed of the same linestrings (sharing an ID by example), we could encode them and first removing all the perfect matches. Then we could work on differences, line by line, based on the IDs. If you are ok, I will open a new issue about it and close this one because the donught feature is now working.

wacekk commented 1 year ago

Of course. Thank you very much, I am closing the issue. Sorry to write in one issue about a few problems.