javiersgjavi closed this issue 1 year ago.
The `get_connectivity()` method is inherited from the `tsl.datasets.prototypes.Dataset` class and is not overridden by any of the datasets in tsl. We chose this design so that you have the flexibility to use adjacency matrices other than those used in the literature.
Under the hood, this method calls `get_similarity()`, which returns an N-by-N matrix of pairwise similarity scores (the higher the score, the more similar the two nodes). This affinity matrix is then post-processed and returned as a (weighted) adjacency matrix. If no parameters are passed to `get_connectivity()`, essentially no post-processing is done.
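To make the post-processing step more concrete, here is a minimal NumPy sketch that turns a dense similarity matrix into an `edge_index`/`edge_weight` pair. The function `similarity_to_edges` is hypothetical and is not tsl's implementation; it only mimics the two parameters discussed in this thread (`threshold` and `include_self`), under the assumption that entries below the threshold, as well as exact zeros, are dropped.

```python
import numpy as np


def similarity_to_edges(sim, threshold=None, include_self=True):
    """Hypothetical sketch (not the tsl implementation): convert an
    N-by-N similarity matrix into (edge_index, edge_weight)."""
    adj = np.array(sim, dtype=float)  # copy so the input is not modified
    if threshold is not None:
        # Drop weak connections: entries below the threshold become 0.
        adj[adj < threshold] = 0.0
    if not include_self:
        # Remove self-loops by zeroing the diagonal.
        np.fill_diagonal(adj, 0.0)
    # Every remaining nonzero entry (i, j) becomes a directed edge i -> j.
    rows, cols = np.nonzero(adj)
    edge_index = np.stack([rows, cols])  # shape (2, E)
    edge_weight = adj[rows, cols]        # shape (E,)
    return edge_index, edge_weight


# Toy 3-node similarity matrix: symmetric, with ones on the diagonal.
sim = np.array([[1.0, 0.5, 0.05],
                [0.5, 1.0, 0.2],
                [0.05, 0.2, 1.0]])

# No post-processing: every nonzero entry, diagonal included, is an edge.
ei, ew = similarity_to_edges(sim)
print(ei.shape, ew.shape)

# Threshold at 0.1 and drop self-loops: only the 0.5 and 0.2 pairs survive.
ei, ew = similarity_to_edges(sim, threshold=0.1, include_self=False)
print(ei.shape, ew.shape)
```

In this toy case the unprocessed graph has 9 edges while the thresholded one has 4, which illustrates how calling the method without parameters can yield a much denser graph than the one reported in papers.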
For `AirQuality`, `get_similarity()` returns the pairwise distance matrix passed through a Gaussian kernel: the diagonal is all 1s, and the other values decrease as the distance between the two sensors grows. To get the same adjacency matrix used in previous works, you should set:
- `threshold=0.1`
- `include_self=False`
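For intuition about the values that `threshold=0.1` acts on, the sketch below applies a Gaussian kernel of the form w_ij = exp(-(d_ij / θ)^2) to a small distance matrix. The exact formula, the bandwidth θ and the distances are assumptions for illustration only; tsl may compute the `AirQuality` kernel differently.

```python
import numpy as np


def gaussian_similarity(dist, theta=None):
    """Illustrative Gaussian kernel: w_ij = exp(-(d_ij / theta)**2).
    Assumption: theta defaults to the standard deviation of the distances."""
    dist = np.asarray(dist, dtype=float)
    if theta is None:
        theta = dist.std()
    return np.exp(-np.square(dist / theta))


# Hypothetical distances between 3 sensors (zero on the diagonal).
dist = np.array([[0.0, 2.0, 10.0],
                 [2.0, 0.0, 8.0],
                 [10.0, 8.0, 0.0]])

sim = gaussian_similarity(dist, theta=5.0)
print(np.round(sim, 3))  # diagonal is exactly 1 because exp(0) = 1

# Pairs a threshold of 0.1 would keep (self-loops still included here).
mask = sim >= 0.1
print(mask.astype(int))
```

Only the close pair (distance 2) survives the threshold besides the diagonal, which `include_self=False` would then remove.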
Here is a demo:
```python
from tsl.datasets import AirQuality

dataset = AirQuality()
# Drop similarities below 0.1 and remove self-loops.
index, weight = dataset.get_connectivity(include_self=False, threshold=0.1)
print(index.shape, weight.shape)
# (2, 5398) (5398,)
```
That is exactly 2 * 2699: the adjacency matrix is undirected, so each edge is stored once in each direction.
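To check this reasoning on any returned edge index, you can verify that every edge (i, j) has its reverse (j, i) and then halve the count. The helper `count_undirected_edges` below is hypothetical (not part of tsl) and assumes the index has no self-loops, as is the case with `include_self=False`.

```python
import numpy as np


def count_undirected_edges(edge_index):
    """Hypothetical helper: number of undirected edges in a symmetric
    (2, E) edge index without self-loops."""
    src, dst = np.asarray(edge_index)
    if np.any(src == dst):
        raise ValueError("self-loops present; drop them first")
    forward = set(zip(src.tolist(), dst.tolist()))
    backward = set(zip(dst.tolist(), src.tolist()))
    if forward != backward:
        raise ValueError("edge index is not symmetric")
    # Each undirected edge {i, j} appears twice: as (i, j) and as (j, i).
    return len(forward) // 2


# Toy symmetric index: edges {0, 1} and {1, 2}, each stored in both directions.
ei = np.array([[0, 1, 1, 2],
               [1, 0, 2, 1]])
print(count_undirected_edges(ei))  # 2
```

Applied to `index` from the demo output above, one would expect this to report 2699 if the matrix is symmetric as stated.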
Thank you very much for the answer! This clears up all my doubts.
As far as I understand from Table 5 in Appendix B of the original GRIN paper, this dataset is an undirected graph with 2699 edges.
However, I ran the following code in a Jupyter Notebook:
And I got this output:
(2, 66661) (66661,)
I don't quite understand how the edge index can have this shape. At first I thought that maybe you had implemented this dataset as a directed graph, but even then, shouldn't the number of edges be at most `2 * undirected_edges`?