Open FaranakAk opened 4 years ago
Hi! I've run your example on classification of MNIST data. Now I'm randomly changing the adjacency matrix by swapping a large number of random rows and columns, and even replacing the adjacency matrix with a zero matrix of the same size, but the accuracy of classification on the test data is not affected. I've also tested this on my own dataset and the result is the same. I couldn't figure out what the problem is. Does it mean that the structure of the data graph has no effect on classification? So what is the point of using the adjacency matrix as an input? I'm not sure if I can ask you, but do you have any idea? Thanks.
@danielegrattarola
Hey @FaranakAk ,
Do you have some minimal code we can look at?
Hi, sorry for the late reply but I've been traveling and I've only just gotten back to work. As @cgarciae said, it would be great to see some code.
I've tried running the following:
fltr = normalized_laplacian(adj)
fltr *= 0  # zero out the filter completely
and I get 10% accuracy as expected.
I need to understand how defining the structure of the data as a graph affects the results, so I'm changing adj itself before computing the normalized Laplacian:
import numpy as np
from scipy import sparse

adj = adj.toarray()
adj_copy = adj.copy()  # keep the original for comparison
for ii in range(784):
    # pick two distinct indices and swap the corresponding rows and columns
    inds = np.random.choice(784, 2, replace=False)
    adj[[inds[0], inds[1]], :] = adj[[inds[1], inds[0]], :]
    adj[:, [inds[0], inds[1]]] = adj[:, [inds[1], inds[0]]]
adj = sparse.csr_matrix(adj)
fltr = normalized_laplacian(adj)
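(A quick, hypothetical sanity check using the adj_copy saved above, to see how much the swaps actually change the matrix:)
# Hypothetical check: fraction of entries changed by the swaps
n_changed = np.count_nonzero(adj.toarray() != adj_copy)
print(n_changed / adj_copy.size)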
So, if I'm reading this correctly, you're swapping out 784 edges out of a possible 784 * 784 = 614656, which is equivalent to about 0.12% of edges.
Additionally, you're swapping edges randomly, but in A there are only 6396 nonzero entries (see A.nnz) for a total density of about 1%.
The probability that you're actually swapping out edges with your code is basically zero.
You can try increasing the range in your for loop, or rewriting the logic to only swap ones with zeros.
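For example, a rough sketch of that second option (a hypothetical rewire_edges helper, assuming a binary, symmetric adjacency matrix with no self-loops):
import numpy as np
from scipy import sparse

def rewire_edges(adj, n_swaps, seed=0):
    """Move n_swaps existing edges to random empty positions, keeping symmetry."""
    rng = np.random.default_rng(seed)
    A = adj.toarray()
    for _ in range(n_swaps):
        # recompute the candidate sets each iteration (slow but simple for a sketch)
        ones = np.argwhere(np.triu(A, k=1) == 1)   # existing edges, upper triangle
        zeros = np.argwhere(np.triu(A == 0, k=1))  # absent edges, upper triangle
        i, j = ones[rng.integers(len(ones))]
        p, q = zeros[rng.integers(len(zeros))]
        A[i, j] = A[j, i] = 0  # delete a real edge
        A[p, q] = A[q, p] = 1  # create an edge where there was none
    return sparse.csr_matrix(A)

Calling, say, rewire_edges(adj, 784) would then guarantee that 784 actual edges are moved.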
I hope this makes sense. Cheers, Daniele
Hi! Thanks a lot for your response. Swapping two rows (and their associated columns, to preserve the symmetry of the adjacency matrix) is equivalent to randomly swapping two pixel positions in the image. When we do this 784 times (equal to the number of pixels), it is highly probable that the whole image gets deformed. So it will not affect only 0.12% of the pixels, but most likely over 90% of the pixels, which will turn an "8" into a random 28x28 image. I have attached four images for better visualization:
im1: image of an "8"
adj1: adjacency matrix of that "8" image
adj2: adjacency matrix created by randomly swapping rows and columns of adj1
im2: corresponding image of adj2
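(To put a rough number on how much of the image gets shuffled, one can compose 784 random pair swaps and count the displaced positions; a quick sketch:)
import numpy as np

# Compose 784 random pair swaps and count how many positions end up displaced
rng = np.random.default_rng(0)
perm = np.arange(784)
for _ in range(784):
    i, j = rng.choice(784, 2, replace=False)
    perm[[i, j]] = perm[[j, i]]
print(np.mean(perm != np.arange(784)))  # around 0.85: most pixels move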
@danielegrattarola
Sorry, I forgot about this issue. Did you ever figure out what the problem was?
No, not yet.
So I've been doing some experiments to see whether this is a bug or something else. As far as I can tell, it is not a bug in the code. I think that the main issue here is that the network is able to more or less extract all the necessary information for classification even if the graph is random.
At first I thought that the issue was that you were using normalized_laplacian() instead of localpooling_filter(), which is the correct pre-processing that you should do for GCN. That turned out to have some effect, but it didn't solve the main problem. Here are the scores that I got after one epoch:
| | localpooling_filter() | normalized_laplacian() |
|---|---|---|
| Default | 91.1 | 86.3 |
| Randomized | 89.3 | 91.2 |
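For reference, this is roughly what the two preprocessing functions compute (a sketch of the standard dense definitions, assuming no isolated nodes; the actual spektral implementations may differ):
import numpy as np

def normalized_laplacian_dense(A):
    """Symmetrically normalized Laplacian: I - D^(-1/2) A D^(-1/2)."""
    D_inv_sqrt = np.diag(A.sum(axis=1) ** -0.5)
    return np.eye(A.shape[0]) - D_inv_sqrt @ A @ D_inv_sqrt

def gcn_filter_dense(A):
    """Renormalization trick of Kipf & Welling: D~^(-1/2) (A + I) D~^(-1/2)."""
    A_tilde = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(A_tilde.sum(axis=1) ** -0.5)
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt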
So, from what I can tell, none of these changes seems to have any significant impact on the performance.
I also tried an adjacency matrix of all zeros, all ones, and completely random (np.random.uniform(0, 1, (N, N))), with the same results. The only way to get truly random performance is to set fltr to np.zeros, but that's trivial because the output of each layer will be exactly the bias vector.
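To see why, here is a toy numpy sketch (standing in for a single pre-activation GCN layer) showing that an all-zero filter reduces every node's output to the bias:
import numpy as np

N, F, channels = 5, 3, 4
fltr = np.zeros((N, N))                    # all-zero filter
X = np.random.uniform(size=(N, F))         # node features
W = np.random.uniform(size=(F, channels))  # layer weights
b = np.random.uniform(size=(channels,))    # layer bias

out = fltr @ X @ W + b                     # pre-activation GCN layer
print(np.allclose(out, b))                 # True: the output is exactly the bias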
I think the main point here is that whatever you have in the adjacency matrix, the network is still able to use each node's own features, and therefore it can learn.
The main takeaway is that MNIST is not that good a benchmark, but that is a known fact, I guess. What's weird is that you say the same thing happens with a different dataset. It could be that graph signal classification is simply not a hard task; I'm not sure.
Sorry that I couldn't be of more help, this is kinda weird and unexpected for me as well.
Thanks for your detailed response. Actually, I'd done those experiments before, and I was wondering what the point of representing an image as a graph is, when the structure of the graph has no effect on the performance. Anyway, thanks again for your time.
The idea of using MNIST + a KNN graph is taken originally from the paper Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering, where they wanted to show that their GNN was able to generalize traditional CNNs for image classification. I guess MNIST is not a good dataset to show this.
I have one question concerning this example. Is the graph made of connections between the images of digits? And to classify, does the algorithm take into account the time series of the image and the connections in the graph between all the known images?
Hi,
no, each graph represents one digit. Nodes are pixels and connections are given by a KNN graph. You can have a look at the above-mentioned paper for more details.
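If it helps, here is a rough sketch of how such a grid KNN graph can be built (using scikit-learn's kneighbors_graph; the exact construction in the example may differ):
import numpy as np
from sklearn.neighbors import kneighbors_graph

# One node per pixel of the 28x28 grid, positioned at its (row, col) coordinates
coords = np.array([(i, j) for i in range(28) for j in range(28)])

# Connect each pixel to its 8 nearest neighbors, then symmetrize
adj = kneighbors_graph(coords, n_neighbors=8, mode='connectivity')
adj = ((adj + adj.T) > 0).astype(float)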
Cheers