JeremyGelb / spNetwork

An R package to perform spatial analysis on networks.
GNU General Public License v2.0

Question: statistical summarization about the NKDE #18

Closed adhamenaya closed 5 months ago

adhamenaya commented 8 months ago

Hi, can I use this package to generate statistical summaries of the NKDE, such as central tendency and dispersion measures?

JeremyGelb commented 8 months ago

Hello! Could you please give me more details about your question? The NKDE is calculated at sampling points along the network. Each sampling point has a specific density value. Are you looking for a statistical summary along the lines of the network? For the whole network? Or are you looking for the uncertainty of the NKDE estimated at each sampling point?

adhamenaya commented 8 months ago

@JeremyGelb Yes, I am looking for a summary over the whole network. I tried to calculate the central tendency point as the weighted mean of the lines in the entire network, as in the following code:

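# Densities estimated by the NKDE at each sampling point
nkde_values <- samples$density
# `points` is assumed to be a two-column matrix of sampling point coordinates (x, y)
# Density-weighted mean of the x and y coordinates
weighted_mean_x <- weighted.mean(points[, 1], nkde_values)
weighted_mean_y <- weighted.mean(points[, 2], nkde_values)
print(paste("Weighted mean (x):", weighted_mean_x))
print(paste("Weighted mean (y):", weighted_mean_y))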

where nkde_values contains the NKDE estimated for each point/line. Do you think this is a valid approach?

JeremyGelb commented 8 months ago

@adhamenaya, it is still unclear to me what you are trying to obtain. Are you calculating the weighted mean of the coordinates of the sampling points, weighted by the estimated densities?

There is a difference between the sampling points and the events. The events are the locations of real data occurring on your network. Sampling points are arbitrary locations along the network where we estimate the density of the events (based on kernel functions that "melt" the density of the events).

What is the question you are trying to answer with your analysis? Are you trying to measure the clustering of your events? Are you looking for the center of your events?

If you are interested in the center of your events, note that classical methods of point pattern analysis do not work well on a network. The mean center of the events cannot be calculated because we are not in a Euclidean space. However, you could, for example, find the point on the network that minimizes the total network distance to all the events.
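A minimal sketch of that last idea, in base R only. Everything here is an assumption made for illustration: the toy network, the edge lengths, the events snapped to nodes, and the restriction of candidate centers to the nodes themselves. It is not a function of spNetwork.

```r
# Toy network: 5 nodes in a chain, edge lengths in metres (hypothetical values)
n <- 5
d <- matrix(Inf, n, n)
diag(d) <- 0
edges <- data.frame(from = c(1, 2, 3, 4),
                    to   = c(2, 3, 4, 5),
                    len  = c(100, 150, 200, 50))
for (i in seq_len(nrow(edges))) {
  d[edges$from[i], edges$to[i]] <- edges$len[i]
  d[edges$to[i], edges$from[i]] <- edges$len[i]
}

# Floyd-Warshall: all-pairs shortest path lengths along the network
for (k in seq_len(n)) {
  d <- pmin(d, outer(d[, k], d[k, ], "+"))
}

# Events snapped to their nearest node (hypothetical; node 2 holds two events)
event_nodes <- c(1, 2, 2, 5)

# Total network distance from each candidate node to all the events
total_dist <- rowSums(d[, event_nodes, drop = FALSE])

# The node with the smallest total distance plays the role of a network "center"
network_center <- which.min(total_dist)
```

On a real network the candidates could also be points sampled along the lines rather than only the nodes, and the shortest paths would come from a graph library instead of this small loop.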

adhamenaya commented 8 months ago

Thank you for your detailed response. The question I am trying to investigate is how to calculate the distance between two different NKDEs. I was wondering whether an aggregate summary like central tendency could be helpful to understand the distance/dissimilarity between two different distributions. Otherwise, what would you suggest to explore the distance between two NKDEs, or KDEs in general?

JeremyGelb commented 8 months ago

I am not sure I understand what the distance between two NKDEs would be. I guess that you are interested in the difference in the spatial pattern of two sets of events on the same network.

If you have two sets of events, you could consider simply calculating the difference between the two NKDEs and mapping it. You just need to ensure that the sampling points are the same for both NKDEs. If the number of events is very different between the two sets, you could scale the NKDEs first to get a more meaningful comparison.
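As a rough sketch of this comparison, assuming the two NKDEs have already been estimated at the same sampling points: the density values and event counts below are made up, and dividing by the number of events is only one possible way to scale.

```r
# Hypothetical densities estimated at the SAME five sampling points
dens_a <- c(0.8, 1.2, 0.4, 0.0, 1.6)  # NKDE of event set A
dens_b <- c(0.1, 0.3, 0.2, 0.1, 0.3)  # NKDE of event set B
n_a <- 40                             # number of events in set A (assumed)
n_b <- 10                             # number of events in set B (assumed)

# The comparison only makes sense if both vectors refer to the same points
stopifnot(length(dens_a) == length(dens_b))

# Scale each NKDE by its number of events so that the larger set does not dominate
scaled_a <- dens_a / n_a
scaled_b <- dens_b / n_b

# Positive values: relatively more of A; negative values: relatively more of B
diff_ab <- scaled_a - scaled_b
```

The resulting vector can then be attached to the sampling points and mapped like any other attribute.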

adhamenaya commented 8 months ago

Thank you very much. Actually, my comparison is between two different networks. But yes, I am interested in finding the differences in the spatial pattern of the same type of event: I have two different event datasets on two different networks. Do you think my question is flawed?

JeremyGelb commented 8 months ago

I understand your problem a little better now.

You are interested in the differences between similar types of events on two different networks.

The NKDE could be used in combination with other spatial methods, like the global Moran's I, to see how spatial autocorrelation / clustering differs between the two networks.

If you work directly with your events instead of the NKDE, you could also use metrics like the distance to the k nearest neighbours. It will give you a good idea of the spatial dispersion of your events on the two networks. In a similar fashion, you could use the G and K statistics (https://jeremygelb.github.io/spNetwork/articles/KNetworkFunctions.html)

adhamenaya commented 8 months ago

Definitely, this looks interesting. To give more context, I'm trying to calculate the spatial distribution/pattern/dispersion of each POI type, and I want to represent it as a single value.

For example: Area 1: Restaurant type: 0.19 Transport type: 0.34 Business type: 0.53

Area 2: Restaurant type: 0.23 Transport type: 0.72 Business type: 0.62 ...

In summary, I'm trying to capture the spatial dispersion of events on each of the two networks as a single value, and I will then use these values to calculate the difference/dissimilarity between Area 1 and Area 2...

Thank you so much for your work and replies. Really helpful.

JeremyGelb commented 8 months ago

Well, if you want to use only one metric to characterize the dispersion of your POI on a network, I would recommend using the distance to a specific neighbour (like the 1st, 2nd, 3rd, etc.) and reporting the median of this value over all the POI on the network.

For example, a value of 500 meters for the first neighbour would mean that 50% of your POI are located at most 500 m from their closest other POI. This is a nicely interpretable measure of dispersion. You could also present the 5% and 90% percentiles of the distribution; this would help to compare the variation of this dispersion measure among several networks.

The only question would be which neighbour to use (1, 2, 3, ...?). It must be the same for all the networks if you want to compare the results. You could try several values and see which one gives the most pertinent results.
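A small sketch of this measure, under the assumption that a matrix of network distances between the POI is already available. The helper `kth_neighbour_dist` and the toy positions are hypothetical: here the POI sit along a single street, so the network distance is simply the absolute difference of their positions.

```r
# Hypothetical positions (in metres) of five POI along one street
pos <- c(0, 100, 250, 700, 750)

# Network distance matrix between the POI (assumed to be precomputed in practice)
dist_mat <- abs(outer(pos, pos, "-"))

# Distance from each POI to its k-th nearest other POI
kth_neighbour_dist <- function(dist_mat, k) {
  vapply(seq_len(nrow(dist_mat)), function(i) {
    sort(dist_mat[i, -i])[k]  # drop the POI itself, then take the k-th smallest
  }, numeric(1))
}

k <- 1  # must be the same for every network being compared
nn_dist <- kth_neighbour_dist(dist_mat, k)

# Single-value summary (median) plus the percentiles mentioned above
dispersion <- median(nn_dist)
spread <- quantile(nn_dist, probs = c(0.05, 0.90))
```

Running the same steps with the same k on each network gives one comparable value per network and POI type.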

JeremyGelb commented 5 months ago

Closed because of a long time without activity.