TorchSpatiotemporal / tsl

tsl: a PyTorch library for processing spatiotemporal data.
https://torch-spatiotemporal.readthedocs.io/
MIT License

Is the definition of connectivity in the AirQuality dataset wrong? #29

Closed javiersgjavi closed 9 months ago

javiersgjavi commented 9 months ago

As far as I understand from Table 5 in Appendix B of the original GRIN article, this dataset is structured as an undirected graph with 2699 edges.

However, I have executed the following code in a Jupyter Notebook:

from tsl.datasets import AirQuality

x = AirQuality()
index, weight = x.get_connectivity()
print(index.shape, weight.shape)

And I saw this output: (2, 66661) (66661,)

I don't understand how the edge index can have this shape. At first I thought that you might have implemented this dataset as a directed graph, but even then, shouldn't the number of edges be at most 2 * undirected_edges?

marshka commented 9 months ago

The get_connectivity() method is inherited from the tsl.datasets.prototypes.Dataset class and is not overridden in any of the datasets in tsl. We chose to do so in order to have the flexibility of using different adjacency matrices than those in the literature.

Under the hood, this function calls get_similarity(), which returns an N-by-N matrix with pairwise similarity scores (the higher, the more similar). Once this affinity matrix is obtained, it is post-processed and returned as a (weighted) adjacency matrix. If no parameters are specified when calling get_connectivity(), basically no post-processing is done.
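To make the post-processing step concrete, here is a minimal NumPy sketch on a toy 4-node similarity matrix. It is not the tsl implementation: the helper `to_edge_list` and the matrix values are hypothetical, and only the parameter names `threshold` and `include_self` mirror the ones accepted by get_connectivity().

```python
import numpy as np


def to_edge_list(sim, threshold=None, include_self=True):
    """Hypothetical helper: turn an N-by-N similarity matrix into
    (edge_index, edge_weight). Not part of tsl."""
    adj = np.array(sim, dtype=float)  # work on a copy
    if threshold is not None:
        adj[adj < threshold] = 0.0    # drop weak similarities
    if not include_self:
        np.fill_diagonal(adj, 0.0)    # drop self-loops
    rows, cols = np.nonzero(adj)      # every remaining nonzero entry is an edge
    return np.stack([rows, cols]), adj[rows, cols]


# Toy symmetric similarity matrix: 1 on the diagonal, lower for distant nodes.
sim = np.array([
    [1.00, 0.80, 0.05, 0.00],
    [0.80, 1.00, 0.30, 0.02],
    [0.05, 0.30, 1.00, 0.60],
    [0.00, 0.02, 0.60, 1.00],
])

raw_index, raw_weight = to_edge_list(sim)
print(raw_index.shape, raw_weight.shape)        # (2, 14) (14,)

index, weight = to_edge_list(sim, threshold=0.1, include_self=False)
print(index.shape, weight.shape)                # (2, 6) (6,)
```

Without any arguments, every nonzero entry (self-loops included) becomes an edge, which is why the edge count can be far larger than the number of edges reported in the literature.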

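For AirQuality, the get_similarity() function returns the distance matrix after applying a Gaussian kernel: the diagonal is all 1, and the other values decrease as the distance between the two sensors grows. To get the same adjacency matrix as in previous works, you should exclude self-loops (include_self=False) and apply a threshold of 0.1 to the weights (threshold=0.1).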

Here is the demo:

from tsl.datasets import AirQuality

x = AirQuality()
index, weight = x.get_connectivity(include_self=False, threshold=0.1)
print(index.shape, weight.shape)

>>> (2, 5398) (5398,)

This is exactly 2699 * 2: the adjacency matrix is undirected, so each edge appears once per direction in the edge index.
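For a quick sanity check of that factor of two, a small NumPy sketch can count the undirected edges behind a directed edge index. The toy `edge_index` and the helper `count_undirected_edges` below are hypothetical and only illustrate the counting; the same check should apply to the `index` array returned above if it is first converted to a NumPy array.

```python
import numpy as np


def count_undirected_edges(edge_index):
    """Hypothetical helper: count unique unordered node pairs in a (2, E) edge index."""
    pairs = np.sort(np.asarray(edge_index), axis=0)  # order each pair as (min, max)
    return np.unique(pairs, axis=1).shape[1]         # deduplicate columns


# Toy undirected path graph 0-1-2-3, stored with both directions of every edge.
edge_index = np.array([
    [0, 1, 1, 2, 2, 3],
    [1, 0, 2, 1, 3, 2],
])

n_undirected = count_undirected_edges(edge_index)
print(edge_index.shape[1], n_undirected)  # 6 3
```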

javiersgjavi commented 9 months ago

Thank you very much for the answer! This clears up all my doubts