JeremyGelb / spNetwork

An R package to perform spatial analysis on networks.
GNU General Public License v2.0

The Result of the cv_scores being Negative in the Cross-Validation Calculation to Determine the Bandwidth #22

Closed by citrahadisaputri 2 weeks ago

citrahadisaputri commented 2 months ago

This is my first time trying out this package. Thank you for creating such an amazing package. However, I have run into a difficulty. I want to find the optimal bandwidth for my data using likelihood cross-validation, but the cv_scores (the scores calculated for each bandwidth) are all negative. I have tried a wide range of values, from small to large (bw = 200, 500, 2000, 100,000, etc.).

My research aims to estimate the density of traffic accidents from 2021 to 2023 along primary roads. Before running the cross-validation, I had to simplify the road data because of an error indicating that my data were too complex. Here is the code I'm using for this operation; "simple_lines_valid" is the road data after simplification. I'm using a grid of (200,200) because the study area is large, approximately 373.78 km².

[Image: screenshot of the code, not reproduced here]

I hope to find a solution to this issue. If I've made any mistakes, I apologize, and I'm open to feedback. Thank you.

JeremyGelb commented 1 month ago

Hello! Thank you for your interest in spNetwork!

It is absolutely normal to obtain negative scores for bandwidth cross-validation. If you look at the formula, the score is simply the sum of the logs of the estimated density at each event location when that event is removed (leave-one-out). The log of a value smaller than 1 is negative, and estimated densities are usually very small, so the sum is negative. This does not change how the results are interpreted: the bandwidth with the highest (least negative) score is still the preferred one. Please check the vignette called introduction. The example provided there also yields negative values.
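To illustrate why negative scores are expected, here is a minimal sketch of a leave-one-out log-likelihood score. It is not spNetwork's implementation: it assumes a plain one-dimensional Gaussian kernel on made-up positions along a single line (in metres), and the names `gaussian_kernel`, `loo_log_likelihood`, `events`, and `bandwidths` are hypothetical.

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel (an assumption; other kernels work the same way)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def loo_log_likelihood(events, bw):
    """Sum of log densities at each event, estimated without that event."""
    n = len(events)
    total = 0.0
    for i, xi in enumerate(events):
        # Density at event i, using every event except i (leave-one-out).
        dens = sum(
            gaussian_kernel((xi - xj) / bw)
            for j, xj in enumerate(events)
            if j != i
        ) / ((n - 1) * bw)
        if dens <= 0.0:
            # The bandwidth is too small to reach any other event.
            return float("-inf")
        total += math.log(dens)
    return total

# Made-up event positions in metres along one line (illustration only).
events = [120.0, 340.0, 410.0, 980.0, 1500.0, 1620.0, 2400.0]
bandwidths = [200.0, 500.0, 2000.0]

scores = {bw: loo_log_likelihood(events, bw) for bw in bandwidths}
best_bw = max(scores, key=scores.get)  # highest score, i.e. least negative

for bw in bandwidths:
    print(f"bw={bw:>7.1f}  score={scores[bw]:.3f}")
print("selected bandwidth:", best_bw)
```

Because distances are in metres, each density is far below 1, so every log term is negative and so is every score. The comparison between bandwidths still works the same way: the one with the highest score is retained.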

Using grid_shape = c(200,200) seems excessive, but I cannot really tell without seeing the study area. When I work with data for a city the size of Montreal, a 15 × 15 grid is usually enough. Also, max_depth = 15 is likely very high; you could greatly reduce calculation time, with only a small loss of precision, by using max_depth = 10.

Finally, I just uploaded a new version of the package to GitHub. It is much faster than the previous one, and I recommend using it. Note, however, that the parameters of the bandwidth selection functions have been slightly modified.

Let me know if it helps!

citrahadisaputri commented 1 month ago

Thank you, it was very helpful. I will follow some of your suggestions. Thank you for the new information you provided! Have a great day!