Issue with graph size change when applying to a new dataset for predictions // and other (simple) questions

raphaelmourad commented 3 years ago

Dear Daniele,

First, thank you for your awesome package! I am starting to use it for a new research project that combines a CNN with a GNN on top. Since my aim is to build a new model for publication in a bioinformatics journal, I want to be sure I am not making any mistakes.

I have several questions/issues regarding node classification:

1) I noticed that if I train a model with a GCNConv layer on a graph A of size [100,100], for instance, and then try to predict for novel nodes from a new graph of size [200,200], I get a shape error and the model can't be used. How can I fix that?

2) When using masks (in the examples given in Spektral), you do:

    def mask_to_weights(mask):
        return mask.astype(np.float32) / np.count_nonzero(mask)

Thus the binary mask is divided by the number of ones. I guess the aim is that the weights sum to one for each of the three masks: weights_tr, weights_va, weights_te. Why? If I don't do that, what is the consequence? I guess this is done so that the same loss can be compared between the training set and the validation set when they don't have the same size. But for accuracy (which I use for monitoring training) this should have no impact, since it is a normalized measure between 0 and 1. I am using the code:

    model.compile(optimizer=RMSprop(lr=learning_rate), loss=binary_crossentropy, weighted_metrics=['acc','AUC'])

where my masks are binary. But in your code I see:

    model.compile(
        optimizer=Adam(learning_rate),
        loss=CategoricalCrossentropy(reduction="sum"),
        weighted_metrics=["acc"],
    )

Should I use loss=CategoricalCrossentropy(reduction="sum")?

3) Another simple question I could not find an answer to yet: how is the mask applied? On which tensor: X (feature matrix), A (graph) or y (output)?

4) Suppose I have a graph with nodes A, B, C and D, a feature table (one column per feature and one row per node) and a binary output vector (one value per node). If I mask D during training, I know that the model will not use D for predicting D (but will do so for A, B and C). However, the model may still use D for predicting A (for instance if A and D share an edge in the graph), which I of course don't want! So my question is: when masking D, will the model use D for predicting A (for instance if A and D share an edge in the graph)? I hope this is not the case, since it would lead to data leakage during the training process.

Thanks! Raf

danielegrattarola commented 3 years ago

Hi,

1) This should not happen, can you report the full stack trace that you get when you have the error?
2) The reason for that conversion is that the original implementation of GCN by Kipf & Welling used the average loss instead of the sum (which would be the default in TF/Keras). If you remove that, you'll get slightly different results when trying to reproduce the GCN paper but it shouldn't substantially impact your model.
3) It is applied to the loss, so in that example it is used to mask unwanted nodes in training/validation/test. To answer your question, you see the effect of the mask on X, not A (since A does not get transformed by the GCN layer), although this is not a mask on X. It's better to interpret it as a mask of the target, if you prefer.
4) No, sample weights are not masks for the inputs. They are only used to mask the loss so that the backpropagation affects different nodes differently. If you don't want to use a node for the forward pass of the model, then the best way is to implement a mask (a real mask, not using sample weights) or to remove it altogether. However, what you described is known as transductive (or semi-supervised) prediction: the model has access to all features at all times, but you mask out some labels and use the remaining ones to train the model. If you're looking at the GCN paper as an example, this is what they do as well. You only have data leaking if you leak the labels, not the node features.
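
To make the point about sample weights concrete, here is a minimal NumPy sketch (not the Spektral or Keras code itself) of what the normalization in question 2 does. It assumes that the framework multiplies each node's loss by its sample weight and then sums, which is how a loss with reduction="sum" is expected to behave. The predictions, labels and the training mask below are made up for illustration.

```python
import numpy as np

def mask_to_weights(mask):
    # Same helper as quoted in question 2: every selected node gets
    # weight 1 / (number of selected nodes), every other node gets 0.
    return mask.astype(np.float32) / np.count_nonzero(mask)

rng = np.random.default_rng(0)
n_nodes, n_classes = 6, 3

# Hypothetical softmax predictions and one-hot targets for 6 nodes.
logits = rng.normal(size=(n_nodes, n_classes))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
labels = rng.integers(0, n_classes, size=n_nodes)
y_true = np.eye(n_classes)[labels]

# Per-node categorical cross-entropy.
per_node_loss = -np.sum(y_true * np.log(probs), axis=1)

# Hypothetical training mask: only the first three nodes are labelled.
mask_tr = np.array([1, 1, 1, 0, 0, 0], dtype=bool)
weights_tr = mask_to_weights(mask_tr)

# Weighted sum over all nodes (assumed behaviour of reduction="sum").
loss_weighted_sum = np.sum(weights_tr * per_node_loss)
# Plain average over the training nodes only.
loss_masked_mean = per_node_loss[mask_tr].mean()

print(weights_tr.sum(), loss_weighted_sum, loss_masked_mean)
```

Under that assumption the two loss values agree: the normalized weights turn a summed loss into a mean over the unmasked nodes, and the masked nodes contribute nothing to the loss. Nothing here touches X or A, which is why the weights act on the targets rather than on the inputs.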

Cheers
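
On question 1, a small NumPy sketch may help show why a shape error is unexpected. It is not Spektral's GCNConv, only the propagation rule from the GCN paper written by hand, and the random graphs and feature sizes are made up. The trainable weight matrix has shape [n_features, n_channels], so it does not depend on the number of nodes.

```python
import numpy as np

def normalized_adjacency(a):
    # Symmetric normalization with self-loops: D^{-1/2} (A + I) D^{-1/2}.
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_layer(x, a, w):
    # One GCN propagation step with ReLU: relu(A_norm @ X @ W).
    return np.maximum(normalized_adjacency(a) @ x @ w, 0.0)

def random_graph(n, rng):
    # Hypothetical random undirected graph without self-loops.
    upper = np.triu(rng.random((n, n)) < 0.05, k=1).astype(float)
    return upper + upper.T

rng = np.random.default_rng(0)
n_features, n_channels = 8, 4

# The only "trained" parameter; its shape is independent of the graph size.
w = rng.normal(size=(n_features, n_channels))

out_small = gcn_layer(rng.normal(size=(100, n_features)), random_graph(100, rng), w)
out_large = gcn_layer(rng.normal(size=(200, n_features)), random_graph(200, rng), w)

print(out_small.shape, out_large.shape)  # (100, 4) (200, 4)
```

If the same weights work on both graphs here but the real model fails, one plausible cause (a guess, not a diagnosis) is a fixed node dimension somewhere else in the model, for example a hard-coded N in an input shape or a dense layer applied over a flattened node axis. The full stack trace would show where.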

raphaelmourad commented 3 years ago

Hi Daniele,

OK, thanks for the answers, I figured it out. Just to be sure: for my transductive prediction, I did the following to mask the 5th node (so that it's not used for training):

X_train[:,4,:]=mask_value # 1st dimension is batch, 2nd is node and 3rd is feature.

and then in the model when I used X as an input:

CNN=Masking(mask_value=mask_value)(X_in)

Does that sound correct? model.fit() works but I just want to be sure.

Then when I predict:

y_pred = model.predict([X_traint, A_train], batch_size=1)  # here X_traint is like X_train without masking.
print(roc_auc_score(y_train[:, 4, :], y_pred[:, 4, :]))  # data used for testing

Thanks a lot! Raf
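
For readers following the masking discussion, here is a minimal NumPy sketch of what reaches a node in the forward pass. It uses a hand-written GCN propagation step (not Spektral's layer) on a made-up four-node graph A, B, C, D with edges A-B, B-C and A-D. It only illustrates the earlier answer to question 4: a loss mask leaves the forward pass untouched, so D's features still reach A through their edge unless D is removed from the graph.

```python
import numpy as np

def gcn_propagate(x, a, w):
    # Linear GCN step: D^{-1/2} (A + I) D^{-1/2} @ X @ W.
    a_hat = a + np.eye(a.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    return a_norm @ x @ w

rng = np.random.default_rng(0)

# Hypothetical graph, node order A, B, C, D; edges A-B, B-C, A-D.
a = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 0],
              [1, 0, 0, 0]], dtype=float)
x = rng.normal(size=(4, 3))  # made-up node features
w = rng.normal(size=(3, 2))  # made-up layer weights

# Change only D's features and compare A's output before and after.
x_changed = x.copy()
x_changed[3] += 1.0
out = gcn_propagate(x, a, w)
out_changed = gcn_propagate(x_changed, a, w)
a_output_moves = not np.allclose(out[0], out_changed[0])

# Remove D's edges from the graph, then repeat the comparison.
a_cut = a.copy()
a_cut[3, :] = 0.0
a_cut[:, 3] = 0.0
out_cut = gcn_propagate(x, a_cut, w)
out_cut_changed = gcn_propagate(x_changed, a_cut, w)
a_output_fixed = np.allclose(out_cut[0], out_cut_changed[0])

print(a_output_moves, a_output_fixed)
```

In this sketch A's output changes with D's features while the edge A-D exists, and stops changing once D's row and column are zeroed in the adjacency matrix. Whether that influence counts as leakage depends on the setting: in the transductive case described above, using D's features (but not D's label) is the intended behaviour.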

danielegrattarola / spektral

Issue with graph size change when applying to a new dataset for predictions // and other (simple) questions #228