Scaling for large sparse graphs?

danielegrattarola / spektral

Graph Neural Networks with Keras and Tensorflow 2.

https://graphneural.network

MIT License


Scaling for large sparse graphs? #73

Closed vmullig closed 4 years ago

vmullig commented 4 years ago

I'm just starting to get into Spektral (which looks like a great package, by the way!). I think it will be excellent for the project I immediately want to apply it to, but I have some concerns about its applicability to later projects involving very large graphs with sparse connectivity (say, on the order of a million nodes, each connected to four or five other nodes). Given 10^6 nodes, a dense adjacency matrix has 10^12 entries: a terabit of memory even at one bit per entry, just to represent a few million edges! Are there any plans to also allow a list representation of sparse adjacency matrices (say, a num_edges-by-2 matrix listing node indices, which in this example would cost only tens of megabytes)?
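To put rough numbers on that comparison, here is a back-of-the-envelope sketch. The specifics are assumptions for illustration and not from the thread: five directed edges per node, float32 entries for the dense matrix, and int32 node indices for the edge list.

```python
# Rough memory cost of a dense adjacency matrix vs. an edge list.
# Assumed for illustration: 5 directed edges per node, 4-byte entries.
n_nodes = 10**6
n_edges = 5 * n_nodes

dense_bytes = n_nodes**2 * 4       # one float32 per matrix entry
edge_list_bytes = n_edges * 2 * 4  # two int32 node indices per edge

print(f"dense adjacency: {dense_bytes / 1e12:.0f} TB")
print(f"edge list:       {edge_list_bytes / 1e6:.0f} MB")
```

Under these assumptions the dense matrix needs about 4 TB and the edge list about 40 MB, a gap of five orders of magnitude.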

danielegrattarola commented 4 years ago

Hi,

Most layers in Spektral support TensorFlow's SparseTensor to represent the adjacency matrix. Internally, these are exactly what you need, basically a list of edges (see tf.SparseTensor.indices).

If you start with an edge list edges, you can create a SparseTensor as:

import numpy as np
import tensorflow as tf

# Create a 3 x 3 adjacency matrix from an edge list
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
A = tf.SparseTensor(indices=edges, values=np.ones(len(edges)), dense_shape=(3, 3))

Could this work for you? Cheers

vmullig commented 4 years ago

I think so, yes. Thanks!

gastoneb commented 3 years ago

Hi @danielegrattarola, I am new to Spektral as well and I agree that it looks like a great package! I immediately ran into the same issue regarding the size of the adjacency matrix. In my case, though, I was using GraphConv, and unfortunately GraphConv.preprocess(A) did not accept a SparseTensor. I'm guessing more of the spektral.utils.convolution functions are going to fail here too. Following your example, I converted A to a SciPy COO matrix instead and it seems happy.

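import numpy as np
from scipy.sparse import coo_matrix

# edges: list of (source, target) node-index pairs, as in the example above
data = np.ones(len(edges), dtype="int")
row = np.array([edge[0] for edge in edges])
col = np.array([edge[1] for edge in edges])
# The matrix is n_nodes x n_nodes, not n_edges x n_edges. Here n_nodes is
# inferred from the largest index; pass the true count if some nodes have no edges.
n_nodes = int(max(row.max(), col.max())) + 1
A = coo_matrix((data, (row, col)), shape=(n_nodes, n_nodes))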

danielegrattarola commented 3 years ago

Hi,

Yes, that seems correct. The utils module is designed for pre-processing, i.e., outside of TensorFlow. SparseTensors tend to be obnoxious when doing even the most basic operations, so it's better to work in SciPy for as long as possible and only use TF for the actual neural network.

Cheers
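A minimal sketch of that split, for anyone landing here later: do the pre-processing in SciPy and extract only the pieces that tf.SparseTensor needs at the very end. It assumes the pre-processing is the usual GCN normalization D^-1/2 (A + I) D^-1/2. The helper names gcn_normalize and to_sparse_tensor_parts are hypothetical and not part of Spektral.

```python
import numpy as np
import scipy.sparse as sp


def gcn_normalize(A):
    """Return D^-1/2 (A + I) D^-1/2 as a SciPy CSR matrix, never densified."""
    A_hat = sp.csr_matrix(A, dtype=np.float64) + sp.identity(A.shape[0], format="csr")
    degree = np.asarray(A_hat.sum(axis=1)).ravel()
    # With non-negative weights the self-loops keep every degree >= 1,
    # so the inverse square root is always defined.
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(degree))
    return (d_inv_sqrt @ A_hat @ d_inv_sqrt).tocsr()


def to_sparse_tensor_parts(A):
    """Return (indices, values, dense_shape) with indices in row-major order."""
    coo = sp.coo_matrix(A)
    order = np.lexsort((coo.col, coo.row))  # sort by row, then by column
    indices = np.column_stack((coo.row[order], coo.col[order])).astype(np.int64)
    return indices, coo.data[order], coo.shape


# Same 3-node graph as the example earlier in the thread.
edges = [(0, 1), (1, 0), (1, 2), (2, 1)]
row, col = np.array(edges).T
A = sp.coo_matrix((np.ones(len(edges)), (row, col)), shape=(3, 3))

A_norm = gcn_normalize(A)
indices, values, dense_shape = to_sparse_tensor_parts(A_norm)
# With TensorFlow installed, the last step would be:
#   tf.SparseTensor(indices=indices, values=values, dense_shape=dense_shape)
```

Sorting the indices row-major up front means the resulting SparseTensor should not need a separate tf.sparse.reorder call before it is used in sparse ops.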