Is the definition of connectivity in the AirQuality dataset wrong?

javiersgjavi commented 1 year ago

As far as I understand reading Table 5 of Appendix B from the original GRIN article, this dataset has the structure of an undirected graph with 2699 edges.

However, I have executed the following code in a Jupyter Notebook:

from tsl.datasets import AirQuality

x = AirQuality()
index, weight = x.get_connectivity()
print(index.shape, weight.shape)

And I saw this output: (2, 66661) (66661,)

I don't understand how the edge index can have this shape. At first I thought you might already provide this dataset as a directed graph, but then shouldn't the number of edges be at most 2 * undirected_edges, i.e. 5398?

marshka commented 1 year ago

The get_connectivity() method is inherited from the tsl.datasets.prototypes.Dataset class and is not overridden in any of the datasets in tsl. We chose to do so in order to have the flexibility of using different adjacency matrices than those in the literature.

Under the hood, this function calls get_similarity(), which returns an N-by-N matrix with pairwise similarity scores (the higher, the more similar). Once this affinity matrix is obtained, it is post-processed and returned as a (weighted) adjacency matrix. If no parameters are specified when calling get_connectivity(), basically no post-processing is done.
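To illustrate why an unprocessed similarity matrix yields so many edges, here is a minimal NumPy sketch of turning a dense score matrix into an edge index and weights. It is not tsl's implementation: `dense_to_edge_list` is a hypothetical helper and the 3-by-3 matrix is made-up toy data.

```python
import numpy as np

def dense_to_edge_list(sim):
    """Hypothetical helper: dense N-by-N scores -> (edge_index, edge_weight)."""
    rows, cols = np.nonzero(sim)          # every non-zero entry becomes a directed edge
    edge_index = np.stack([rows, cols])   # shape (2, E)
    edge_weight = sim[rows, cols]         # shape (E,)
    return edge_index, edge_weight

# Toy similarity matrix: ones on the diagonal, symmetric off-diagonal scores.
sim = np.array([
    [1.0,  0.8, 0.05],
    [0.8,  1.0, 0.0],
    [0.05, 0.0, 1.0],
])

edge_index, edge_weight = dense_to_edge_list(sim)
print(edge_index.shape, edge_weight.shape)
```

With no post-processing, self-loops and very weak links (such as the 0.05 entry) are all kept as edges, which is why the count can far exceed twice the number of edges reported in the literature.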

For AirQuality, the get_similarity() function returns the distance matrix after applying a Gaussian kernel: the diagonal is all ones, and the other values decrease as the distance between the two sensors grows. To get the same adjacency matrix as in previous works, you should:

threshold values under 0.1, with argument threshold=0.1
remove self-loops, with argument include_self=False
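The two steps above can be sketched in plain NumPy on toy data. This is only an illustration under stated assumptions, not the library's code: the kernel form `exp(-(d / sigma)^2)`, the bandwidth `sigma=1.5`, the sensor positions, and the helper names `gaussian_kernel` and `postprocess` are all chosen for the example.

```python
import numpy as np

def gaussian_kernel(dist, sigma):
    """Assumed kernel form: similarity is 1 at distance 0 and decays with distance."""
    return np.exp(-np.square(dist / sigma))

def postprocess(sim, threshold=None, include_self=True):
    """Hypothetical post-processing: zero out weak scores and, optionally, self-loops."""
    adj = sim.copy()
    if threshold is not None:
        adj[adj < threshold] = 0.0   # drop links weaker than the threshold
    if not include_self:
        np.fill_diagonal(adj, 0.0)   # remove self-loops
    return adj

# Four toy sensors on a line; the last one is far from the others.
pos = np.array([0.0, 1.0, 2.0, 5.0])
dist = np.abs(pos[:, None] - pos[None, :])

sim = gaussian_kernel(dist, sigma=1.5)
adj = postprocess(sim, threshold=0.1, include_self=False)

print(np.count_nonzero(sim), np.count_nonzero(adj))
```

In this toy case only the three nearby sensors stay connected to each other, and the distant sensor ends up with no edges at all.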

Here is the demo:

from tsl.datasets import AirQuality

x = AirQuality()
index, weight = x.get_connectivity(include_self=False, threshold=0.1)
print(index.shape, weight.shape)

>>> (2, 5398) (5398,)

This is exactly 2699 * 2: the adjacency matrix is undirected, so each edge appears once in each direction.
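A quick way to check this kind of count yourself is to collapse each directed pair into an unordered one and count the unique pairs. The sketch below uses a small made-up edge index; `count_undirected_edges` is a hypothetical helper, and it assumes the edge index is a NumPy integer array of shape (2, E).

```python
import numpy as np

def count_undirected_edges(edge_index):
    """Hypothetical helper: count unique unordered node pairs in a (2, E) edge index."""
    src, dst = edge_index
    # Order the endpoints so that (i, j) and (j, i) map to the same pair.
    pairs = np.stack([np.minimum(src, dst), np.maximum(src, dst)], axis=1)
    return len(np.unique(pairs, axis=0))

# Toy edge index: a triangle on nodes 0, 1, 2, stored in both directions.
edge_index = np.array([
    [0, 1, 0, 2, 1, 2],
    [1, 0, 2, 0, 2, 1],
])

num_directed = edge_index.shape[1]
num_undirected = count_undirected_edges(edge_index)
print(num_directed, num_undirected)
```

If the directed count is exactly twice the undirected one and there are no self-loops, every edge is stored symmetrically.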

javiersgjavi commented 1 year ago

Thank you very much for the answer! This clears up all my doubts.

TorchSpatiotemporal / tsl

Is the definition of connectivity in the AirQuality dataset wrong? #29