Closed bright1993ff66 closed 1 year ago
Hello there! First of all, thank you for your interest in spNetwork. I would be happy to help you with this bug. I think it could be some kind of edge case specific to your dataset, but you could check the following first:
- Check which shapefile is causing the error (it will help to reproduce the error).
- Have you checked that the events are not too far from the road network?
- Are the samples_on_roads point geometries?
If this does not help, could you send me the data required to reproduce the bug (one of the shapefiles with the points, the network and the border)?
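The last two checks in the list above could be sketched with sf roughly as follows. This is only a sketch on toy data: the objects `roads`, `events` and `samples`, the coordinates, and the 50 m threshold are assumptions made for illustration and do not come from the thread. No CRS is set, so the coordinates are simply treated as planar metres.

```r
library(sf)

# Hypothetical toy layers standing in for the real shapefiles
roads <- st_sf(geometry = st_sfc(
  st_linestring(rbind(c(0, 0), c(100, 0))),
  st_linestring(rbind(c(100, 0), c(100, 100)))
))
events <- st_sf(geometry = st_sfc(
  st_point(c(50, 2)), st_point(c(98, 40)), st_point(c(500, 500))
))
samples <- st_sf(geometry = st_sfc(
  st_point(c(25, 0)), st_point(c(100, 50))
))

# Distance from each event to its nearest road segment
nearest <- st_nearest_feature(events, roads)
dist_to_road <- as.numeric(
  st_distance(events, roads[nearest, ], by_element = TRUE)
)

# Indices of events farther than an (arbitrary) 50 m threshold
far_events <- which(dist_to_road > 50)

# TRUE only if every sample is a POINT geometry
all_points <- all(st_geometry_type(samples) == "POINT")
```

Events flagged in `far_events` are candidates to inspect or drop before running the NKDE; a sensible threshold depends on the data.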
Dear @JeremyGelb ,
Thank you for your response! I double-checked the points you mentioned. The lixels and sample points were created with lixelize_lines.mc in the spNetwork package as follows:
lixels <- lixelize_lines.mc(ny_roads_select, lx_length = 200,
                            verbose = TRUE, mindist = 100)
samples <- lines_center(lixels)
st_write(samples, 'lixel_samples.shp', append=FALSE)
st_write(lixels, 'lixels.shp', append=FALSE)
Here, I uploaded the shapefiles to reproduce the error message. The files are named as in the code. Thank you for your time and support!
@JeremyGelb Sorry for not uploading the border shapefile. Here is the shapefile for the border of New York City, in the projected coordinate system EPSG:32118.
Great! I will have a look this week.
I am not sure if I have worked this out...
I think the problem is with the input. Many of the road segments I used previously have no accident records nearby, which causes a significant zero-inflation problem. Hence, I removed the road segments with no collision record within a 5-meter zone (done with the ArcGIS "Select By Location" function). About 20% of the road segments were kept for the following network kernel density calculation. Then I computed the network kernel density using the following code:
adapt_densities <- nkde.mc(
  lines = ny_roads_select,
  events = accs_select,
  w = rep(1, nrow(accs_select)),
  samples = samples_on_roads,
  kernel_name = "gaussian",
  bw = 500,               # set a reference bandwidth
  adaptive = TRUE,
  trim_bw = 1000,         # set the maximum local value of bandwidth
  div = "bw",
  method = "discontinuous",
  digits = 2,
  tol = 1,
  grid_shape = c(2, 2),   # for fast computation
  max_depth = 8,
  agg = 100,              # we aggregate events within a 100m radius (faster calculation)
  sparse = TRUE,
  verbose = TRUE,
  check = TRUE)
Here I attached the shapefiles I used to run the above code.
I am working on it, but I have found a few things so far:
- some events are outside the limits of the study_area
- the study area has distinct islands
- some islands have internal borders
The second point may not be the cause of the problem, but it is the first time I have seen this configuration. I think that the best practice here would be to edit the polygons to have only one big polygon (by drawing bridges in QGIS) or to split the analysis for each island.
The last point is the most concerning. It produces a case where the density of a point crosses the border but remains in the study area. I suggest here to union the geometries in the same island:
border <- st_union(st_combine(border), by_feature = TRUE)
I am still working on it, but I am 98% sure that this is the cause of the problem.
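To illustrate what dissolving internal borders means, here is a small sketch on toy polygons. The geometries are invented for the example, and it uses a plain `st_union()` on the whole layer rather than the exact call above, so it should be read as an illustration of the idea and not as the fix applied to the real data.

```r
library(sf)

# Two adjacent squares sharing an edge (an "internal border")
# and one separate island; all coordinates are hypothetical
sq1 <- st_polygon(list(rbind(c(0, 0), c(1, 0), c(1, 1), c(0, 1), c(0, 0))))
sq2 <- st_polygon(list(rbind(c(1, 0), c(2, 0), c(2, 1), c(1, 1), c(1, 0))))
island <- st_polygon(list(rbind(c(5, 0), c(6, 0), c(6, 1), c(5, 1), c(5, 0))))
border <- st_sf(geometry = st_sfc(sq1, sq2, island))

# Union all features: the shared edge between sq1 and sq2 disappears
dissolved <- st_union(border)

# Split the result back into one polygon per island
parts <- st_cast(dissolved, "POLYGON")
n_parts <- length(parts)
total_area <- sum(as.numeric(st_area(parts)))
```

Three input polygons become two parts, one per island, while the total area is unchanged.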
Dear @JeremyGelb ,
Thank you for your suggestions! I will check the three points you mentioned so far and report soon~
Hello! It is good to know that the error is almost certainly caused by the data used. However, removing many roads can cause a problem. The densities of the events are spread along the roads. If a road is missing, the local densities will be overestimated (before the missing link) and underestimated (after the missing link). I would suggest two things here:
- Checking that all the roads have a valid geometry (st_is_valid, st_make_valid) and removing the roads with a length equal to 0.
- Changing the parameter grid_shape to other values (like c(10,10) or c(9,9)). It is not impossible that the splitting operation has created an edge case.
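The first suggestion could look roughly like the sketch below. The `roads` object here is toy data invented for the example; with real data it would be the road layer read with `st_read()`.

```r
library(sf)

# Hypothetical road layer containing one degenerate, zero-length segment
roads <- st_sf(id = 1:3, geometry = st_sfc(
  st_linestring(rbind(c(0, 0), c(10, 0))),
  st_linestring(rbind(c(10, 0), c(10, 0))),  # zero-length segment
  st_linestring(rbind(c(10, 0), c(10, 5)))
))

# Drop segments whose length is 0
lens <- as.numeric(st_length(roads))
roads_clean <- roads[lens > 0, ]

# Count remaining invalid geometries and repair them only if needed
n_invalid <- sum(!st_is_valid(roads_clean))
if (n_invalid > 0) {
  roads_clean <- st_make_valid(roads_clean)
}
```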
@JeremyGelb Thank you for your reply! But how should I cope with this kind of zero-inflation problem in kernel density estimation? The collision records are sparsely distributed in space, and the majority of roads do not have collision records nearby. This can result in a density estimate that is biased towards the areas with accidents, while the areas with no accidents are not represented accurately.
Do you have any suggestions for dealing with the zero-inflation problem in this case? Thanks!
Hello !
I have been able to run the analysis with the following code on the data you sent me. My guess about the border was right.
library(sf)
library(spNetwork)
roads <- st_read("C:/Users/Gelb/Desktop/TEMP/spNetwork_issue15/data/ny_roads_select.shp")
events <- st_read("C:/Users/Gelb/Desktop/TEMP/spNetwork_issue15/data/acc_select.shp")
border <- st_read("C:/Users/Gelb/Desktop/TEMP/spNetwork_issue15/data/ny_border.shp")
border <- st_buffer(border, 5)
border <- st_union(st_combine(border), by_feature = TRUE)
sample_lines <- st_read("C:/Users/Gelb/Desktop/TEMP/spNetwork_issue15/data/lixels.shp")
sample_points <- lines_center(sample_lines)
# let me remove points outside the border
test <- lengths(st_intersects(events, border)) > 0
events <- subset(events, test)
events$w <- 1
# length check
sum(as.numeric(st_length(roads)) == 0)
# OK
# validity test
sum(!st_is_valid(roads))
# OK
# let me calculate the density first with an arbitrary BW
results1 <- nkde(
  lines = roads,
  events = events,
  w = events$w,
  samples = sample_points,
  kernel_name = "gaussian",
  bw = 500,
  trim_bw = 1200,
  adaptive = TRUE,
  method = "discontinuous",
  diggle_correction = TRUE,
  study_area = border,
  digits = 2,
  tol = 1,
  grid_shape = c(8, 8),   # for fast computation
  max_depth = 8,
  agg = 50,               # we aggregate events within a 50m radius (faster calculation)
  sparse = TRUE,
  verbose = TRUE,
  check = FALSE,
  div = "bw"
)
sample_points$dens <- results1$k * 1000
library(tmap)
tm_shape(sample_points) +
tm_dots(col = "dens", style = "kmeans", n = 8)
Here is a little map of the results
About zero inflation, I think that it is not a problem for the NKDE. Kernel density methods are only descriptive methods and not modeling methods. The presence or absence of links without events does not affect the calculation for the other links. However, if you plan to use the obtained densities in a regression model (to understand the impact of some predictors on the densities), then you might have a zero-inflation problem.
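A tiny numerical illustration of this point, using a plain one-dimensional Gaussian kernel in base R rather than the network kernel from spNetwork. The event positions, the bandwidth and the helper `kde_at` are all hypothetical and only serve to show that adding sample locations with no events nearby leaves the other estimates unchanged.

```r
# Hypothetical event positions along a line (in metres) and a bandwidth
events <- c(10, 12, 15)
bw <- 5

# Simple (unnormalised by event count) Gaussian kernel density
# evaluated at each sample location
kde_at <- function(samples, events, bw) {
  sapply(samples, function(s) sum(dnorm((s - events) / bw)) / bw)
}

# Samples close to the events
near <- c(9, 13)
dens_small <- kde_at(near, events, bw)

# Same samples plus several far-away samples with no event nearby
dens_large <- kde_at(c(near, 200, 400, 600), events, bw)

# The densities at the two near samples are the same in both runs
same_near <- isTRUE(all.equal(dens_small, dens_large[1:2]))
```

The far-away samples simply receive densities close to zero; they become an issue only if those values are later used as the response in a regression model.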
@JeremyGelb Thank you for your code, and many thanks for your feedback about the zero-inflation problem!
I will revise my code later. Thanks again for your time and contribution to this open question. I will close this question then~
Great ! Thank you again for your interest, let me know if you encounter another problem later.
Thanks to the author for creating such a convenient package for spatial analysis!
I am currently working on analyzing the spatial distribution of accidents in the road network. I want to use the network KDE provided by the nkde.mc function in spNetwork to complete this task. The code I used is presented as follows:
But I repeatedly got the following error in the console while the verbose output read "calculating the local bandwidth...":
Can anyone give me some hints about this error message? It seems to be due to a dimension mismatch problem, as suggested by some questions on Stack Overflow. But I cannot figure it out...