JeremyGelb / spNetwork

An R package to perform spatial analysis on networks.
GNU General Public License v2.0

nkde.mc output values too low #21

Closed · armitakar closed this issue 5 months ago

armitakar commented 5 months ago

Hi! I have been running network kernel density estimation on pedestrian and bike crashes in Columbus, OH, using a lixel size of 200 m. The output probability density estimates are very low, almost zero, and show some inconsistencies.

Here is the code I have been using:

densities <- nkde.mc(network,
                     events = pedbike1,
                     w = rep(1, nrow(pedbike1)),
                     samples = samples,
                     kernel_name = "quartic",
                     bw = 300, div = "bw",
                     adaptive = TRUE, trim_bw = 800,
                     method = "simple",
                     digits = 1, tol = 1,
                     diggle_correction = FALSE,
                     grid_shape = c(3,3),
                     max_depth = 10,
                     agg = 10, # aggregate events within a 10 m radius (faster calculation)
                     sparse = TRUE,
                     verbose = TRUE)

Please let me know how to fix or properly interpret these estimations. I appreciate your feedback.

Best regards, Armita

JeremyGelb commented 5 months ago

Hello!

There is a little misunderstanding. KDE and NKDE do not produce probabilities but densities (sometimes called intensities). Each accident in your dataset has a weight of 1, and this weight is "melted" over the network. The integral of the density over the network sums to one, but the values estimated at the sampling points are densities, not probabilities. Densities tend to be very small values.

As stated in the R Journal article, NKDE "estimates the density over a linear unit". You could multiply the values by 1000 to obtain the density per kilometer. Also, the absolute scale of the intensity is not of much interest for this type of method, because we tend to compare densities to identify hot spots and cold spots.
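For example, a minimal sketch of this rescaling (assuming densities is the numeric vector returned by nkde.mc and the network CRS is in meters):

# nkde.mc densities are expressed per linear unit of the CRS (here, per meter)
densities_km <- densities * 1000  # events per kilometer

# rescaling does not change the relative pattern used to find hot and cold spots
summary(densities_km)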

You could also calculate the integral of the intensity over a complete lixel, but it would require a lot of calculation time. This value could be interpreted as a number of events over a linear unit.

armitakar commented 5 months ago

Hi Jeremy, Thanks so much for the clarification. It helps a lot. I have some follow-up questions.

First, is there any quick code to check whether the integral of the density values sums to one? I performed a summation of the density values estimated for all lixels, and it does not add up to 1. Please see the code:

sum(crash$yngpd)
[1] 0.009457152

Second, I followed the R Journal code on page 569: "To obtain more readable results, one can multiply the obtained densities by the total number of accidents (to make the spatial integral equal to the number of events)." Based on my understanding, the following code should return the total number of events, which is 416, but I am getting far less than that.

sum(crash$yngpd * 416)
[1] 3.934175

Lastly, my goal with this analysis is to estimate crash risk on different road segments. Can the intensity value (crash density per kilometer) be interpreted as crash risk (high intensity indicating high crash risk)? Or do you have any better suggestions for such an analysis?

Thanks so much for your help.

Best regards, Armita

JeremyGelb commented 5 months ago

Hi armitakar,

I am not sure I understand your first question. It would be very difficult to calculate the integral of all your observations on the network. The unit tests of the package (in test_border_correction_sf.R) include functions that check that the integral is valid when I apply border corrections.

Also note that adding up the estimated densities at the sampling points is not the same thing as calculating the integral over the network. To calculate the integral exactly, we would need to sum the densities at every possible point (an infinity of locations) along the lines of the network. The best practical approach is to estimate the densities at many points along the lines and then approximate the integral from these sampling points.
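As an illustration, here is a rough sketch of that approximation (assumptions: the CRS is in meters, lixels is the sf object holding the network lines, and nkde.mc is re-run with the same settings as in the original call):

library(sf)

step <- 10 # spacing between sample points, in meters (an arbitrary choice)

# sample points regularly along every line of the network
pts <- st_line_sample(lixels, density = 1 / step)
samples_dense <- st_as_sf(st_cast(pts, "POINT"))

# estimate the densities at these dense sample points
dens_dense <- nkde.mc(network, events = pedbike1,
                      w = rep(1, nrow(pedbike1)),
                      samples = samples_dense,
                      kernel_name = "quartic", bw = 300, div = "bw",
                      adaptive = TRUE, trim_bw = 800,
                      method = "simple", digits = 1, tol = 1,
                      diggle_correction = FALSE, grid_shape = c(3,3),
                      max_depth = 10, agg = 10,
                      sparse = TRUE, verbose = TRUE)

# Riemann-sum approximation: each sample point represents about `step`
# meters of line, so the integral is roughly the sum of density * step
integral_approx <- sum(dens_dense * step)
integral_approx # should be roughly 1 (up to border effects)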

Multiplying the density values by a constant will just make them easier to map; it will not change the results.

Crash density per kilometer cannot by itself be considered a measure of crash risk, because the number of crashes is influenced by the total traffic. For example, on a bicycle infrastructure with a lot of traffic, you will observe more accidents and a higher density; it would be an error to interpret that as a more dangerous infrastructure. The best thing to do here would be to calculate the ratio between your accident density and the volume of cyclists passing along each line of the network. If you can estimate the density of cyclists over the network (with counts at intersections, for example), you can calculate the ratio of the two densities, which can be understood as a measure of relative risk.
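For instance, a minimal sketch of such a ratio (hypothetical column names; it assumes crash and cyclist densities were estimated at the same sampling points and stored on the lixels object):

eps <- 1e-12 # guard against division by zero on lines with no cyclist traffic
lixels$relative_risk <- lixels$crash_dens / (lixels$cyclist_dens + eps)

# lines with a high relative_risk have more crashes than their cyclist
# volume alone would suggest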

armitakar commented 5 months ago

Hi Jeremy, Thank you so much for the explanation. It was very helpful.

Best regards, Armita