danielegrattarola / spektral

Graph Neural Networks with Keras and Tensorflow 2.
https://graphneural.network
MIT License

Input with batch dimension for GCN #119

Closed amjass12 closed 3 years ago

amjass12 commented 3 years ago

Hi Daniele, thank you for this really useful package!

I have a question about the input to my GCN. I am attempting to merge a GCN with a CNN. Construction of the model works fine, but when I add a batch dimension (which the merged model requires) I am confused about the Input layer for the GCN: it throws an error at the concatenation layer. The model is as follows:

import tensorflow as tf
from spektral.layers import GraphConv

def graph_cnn(state_adjacency, cnn_input_shape):
    '''create merged NN with GCN representing the environment state
       and CNN representing the agent position'''

    #CNN branch
    cnn_branch_input = tf.keras.layers.Input(shape=(4, 4, 1))
    cnn_branch_two = tf.keras.layers.Conv2D(32, (2, 2), activation='relu', padding='same')(cnn_branch_input)
    cnn_branch_three = tf.keras.layers.MaxPooling2D(1, 1)(cnn_branch_two)
    cnn_branch_four = tf.keras.layers.Conv2D(32, (2, 2), activation='relu', padding='same')(cnn_branch_three)
    cnn_branch_five = tf.keras.layers.Flatten()(cnn_branch_four)
    cnn_branch_six = tf.keras.layers.Dense(32, activation='relu')(cnn_branch_five)

    #GCN branch: Spektral library
    #node_features =
    #preprocess adjacency matrix -- self loops

    node_feat_input = tf.keras.layers.Input(shape=(4,), name='node_feature_inp_layer')
    graph_input_adj = tf.keras.layers.Input(shape=(len(state_adjacency),), sparse=True, name='graph_adj_layer')
    gnn_branch = GraphConv(16, 'relu')([node_feat_input, graph_input_adj])
    gnn_branch = tf.keras.layers.Dropout(0.5)(gnn_branch)
    gnn_branch_two = GraphConv(1, 'linear')([gnn_branch, graph_input_adj])
    gnn_branch_two = tf.keras.layers.Dense(32, activation='relu')(gnn_branch_two)

    #merged layer
    merged = tf.keras.layers.Concatenate(axis=1)([gnn_branch_two, cnn_branch_six])

    #output layer: action prediction
    output_layer = tf.keras.layers.Dense(7, activation='linear')(merged)
    #put model together
    merged_model = tf.keras.models.Model(inputs=[cnn_branch_input, node_feat_input, graph_input_adj],
                                         outputs=[output_layer])
    #compile model
    merged_model.compile(optimizer='adam',
                         weighted_metrics=['acc'],
                         loss='mse')

    return merged_model

The inputs (made smaller just to establish the pipeline) are: input = np.array(adj), the (4, 4) adjacency matrix, and node_features = input (I just reuse the adjacency matrix as node features for the purpose of running the pipeline).

Now the reshaping (and where the error occurs). For the CNN, the input will be the same size as the adjacency matrix but with different 0s and 1s than the graph (again, just to establish the pipeline, I have reused the adjacency matrix):

cnn_input = np.expand_dims(input, 2)
cnn_input = np.expand_dims(cnn_input, axis=0)
# cnn_input.shape == (1, 4, 4, 1)

gcn_input = input  # the adjacency matrix (with a batch dimension added)
# gcn_input.shape == (1, 4, 4)
# node_feature.shape == (1, 4, 4)
# y.shape == (1, 4, 7)

My confusion is the Input layer for the graph network: the current shapes are shape=(4,) for the node features and shape=(len(state_adjacency),) for the adjacency matrix, as you can see. When I run this model just to see if I can get it to start training, I receive the following error.

model.fit([x, gcn_input, node_feature], y,
    #batch_size=4,
    shuffle=False)

ValueError: A Concatenate layer requires inputs with matching shapes except for the concat axis. Got inputs shapes: [(None, 4, 32), (None, 32)]

The shapes of my inputs are as above (they need 1 as the initial batch dimension, but in real training they will come in batches of 16 or 32).

I'm not sure how to fix the (None, 4, 32) dimension to be the required (None, 32) for the concatenation layer! Any help is much appreciated!

Thanks, and sorry for the long post; I hope the code is informative in diagnosing the problem.

danielegrattarola commented 3 years ago

Hi,

if I understood this correctly, the problem is with your input layers. The node features have shape [1, 4, 4], but you're creating an input layer with implicit shape [None, 4]. This is a valid input for the GCN, so you don't see errors, but it creates a problem with the concatenation. Same thing for the adjacency matrix: it is of shape [1, 4, 4], but the Input layer has an implicit shape of [None, 4].

Try to adjust your inputs so that they match the actual data that you pass to the network, it should work then.
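
For instance, something along these lines should match the (1, 4, 4) arrays you describe (a minimal sketch of just the two Input layers, assuming dense adjacency matrices):

node_feat_input = tf.keras.layers.Input(shape=(4, 4), name='node_feature_inp_layer')
graph_input_adj = tf.keras.layers.Input(shape=(4, 4), name='graph_adj_layer')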

Cheers

amjass12 commented 3 years ago

Hi @danielegrattarola

Yes, indeed this is the problem! OK, so I changed the input to the following (<- marks the changed Input layers and <<-- marks the new Flatten layer before the Dense layer):

node_feat_input = tf.keras.layers.Input(shape=(4,4), name='node_feature_inp_layer') <-
graph_input_adj = tf.keras.layers.Input(shape=(4,4), name='graph_adj_layer') <-
gnn_branch = GraphConv(16, 'relu')([node_feat_input, graph_input_adj])
gnn_branch = tf.keras.layers.Dropout(0.5)(gnn_branch)
gnn_branch_two = GraphConv(1, 'linear')([gnn_branch, graph_input_adj])
gnn_branch_three = tf.keras.layers.Flatten()(gnn_branch_two)<<--
gnn_branch_four = tf.keras.layers.Dense(32, activation='relu')(gnn_branch_three)

So I have changed both input shapes to shape=(4,4), and I also added a Flatten layer after the second GraphConv layer (I was getting an incompatible concatenation shape without it), and this then worked.

I assume the addition of the Flatten layer is fine and doesn't interfere with the patterns in the GraphConv layer?

My last follow up questions are:

Can I omit the batch_size argument in model.fit?

model.fit([x, gcn_input, node_feature], y,
    #batch_size=4, can be ignored?
    shuffle=False)

Because this GCN model is concatenated to a CNN, can weighted_metrics be replaced with the normal metrics argument, or won't this make a difference?

merged_model.compile(optimizer='adam', 
                         weighted_metrics=['acc'], #can use 'metrics' instead?
                         loss='mse')

and finally, is there a standard way to prepare or generate node features?

thanks so much!

danielegrattarola commented 3 years ago

Huh! I had not noticed that you were flattening the CNN output. Luckily, my answer above is still valid :D

Yes, the Flatten layer is necessary to obtain a vector representing your graph. However, it is more customary in GNNs to use an approach other than flattening, usually called "global pooling". This consists of either summing, multiplying, or averaging the nodes together. So if a Flatten takes a matrix of shape [n_nodes, n_features] and returns a vector of shape [n_nodes * n_features, ], a global pooling layer would give you an output of simply [n_features, ]. The reason for this is that Flatten implicitly defines an ordering of your nodes, but the whole point of GNNs is that you usually don't want to define such an ordering.
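
As a rough illustration of the shape difference (a minimal sketch, not your model; the array x is made up just for illustration, assuming a batch of 2 graphs with 4 nodes and 16 features each):

import numpy as np
import tensorflow as tf
from spektral.layers import GlobalAvgPool

x = np.random.rand(2, 4, 16).astype('float32')  # [batch, n_nodes, n_features]
flat = tf.keras.layers.Flatten()(x)              # shape (2, 64): n_nodes * n_features, order-dependent
pooled = GlobalAvgPool()(x)                      # shape (2, 16): n_features only, order-invariant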

To make a long story short: check out this part of the docs.

As for your other questions:

Can I omit the batch_size argument in model.fit?

If you don't specify it, it is by default set to 32 by Keras.

Because this GCN model is concatenated to a CNN, can weighted_metrics be replaced with the normal metrics argument, or won't this make a difference?

Since you're not using sample weights, this won't make a difference. From your questions, I gather that you are mixing up the examples for node-level learning and the ones for graph-level learning. You're interested in the graph-level one, here.

is there a standard way to prepare or generate node features?

That depends on your data. Node features should be given, but if they aren't, you can substitute dummy features. You can, for instance, set them to 1 for all nodes, or you can compute the node degrees and use those as features.
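
For example (a minimal sketch; the 4-node path graph adj below is made up just for illustration):

import numpy as np

adj = np.eye(4, k=1) + np.eye(4, k=-1)             # toy 4-node path graph, for illustration only
ones_features = np.ones((adj.shape[0], 1))         # option 1: constant dummy feature for every node
degree_features = adj.sum(axis=1, keepdims=True)   # option 2: node degree as the only feature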

Cheers

amjass12 commented 3 years ago

thank you so much for your detailed reply @danielegrattarola :), all of your answers make sense!

So just to clarify, for the GCN branch: is replacing the Flatten layer with a global pooling layer (<---) before the Dense layer, like so, an acceptable replacement?:

node_feat_input = tf.keras.layers.Input(shape=(4,4), name='node_feature_inp_layer')
graph_input_adj = tf.keras.layers.Input(shape=(4,4), name='graph_adj_layer')
gnn_branch = GraphConv(16, 'relu')([node_feat_input, graph_input_adj])
gnn_branch = tf.keras.layers.Dropout(0.5)(gnn_branch)
gnn_branch_two = GraphConv(1, 'linear')([gnn_branch, graph_input_adj])
gnn_branch_three = spektral.layers.GlobalAvgPool()(gnn_branch_two)<---
gnn_branch_four = tf.keras.layers.Dense(32, activation='relu')(gnn_branch_three)

concatenate with CNN... etc

And then merging with the Dense layer from the CNN will represent the graph more faithfully, without an implicit ordering of the nodes (which I agree would not be useful). FYI, when replacing the Flatten layer with the global pooling layer, the model does still run, which is good!

thanks!

danielegrattarola commented 3 years ago

is replacing the Flatten layer with a global pooling layer (<---) before the Dense layer, like so, an acceptable replacement?

Yep, definitely acceptable.

Glad that it works! Good luck with your project.

amjass12 commented 3 years ago

thank you for confirming and thank you so much for taking the time to answer all of my questions in the last few days!

closing 👍

StefanBloemheuvel commented 3 years ago

Hi @amjass12, is there a reason for using a 'linear' activation function instead of 'relu' in the last GraphConv layer? Does it have an advantage over using 'relu' in all layers?

amjass12 commented 3 years ago

Hi @StefanBloemheuvel ,

I believe this is a result of node classification, and maybe @danielegrattarola can provide an explanation... The feature representation built upstream trickles down to the final linear layer, which then assigns each node a class, hence the linear activation function; in my mind it would not make sense to use a non-linear activation there. Again, maybe Daniele can provide a more detailed explanation!

I have found this paper to be helpful: https://arxiv.org/pdf/1902.07153.pdf

danielegrattarola commented 3 years ago

Hi,

I did not write the code above, so I don't know the constraints/requirements under which it was developed.

A general guideline for activations is to always use the same activation, except when you have a specific requirement for the activation's image.

So for instance, if you're doing classification you'll want the output layer to have an activation that constrains the output to [0, 1] (sigmoid or softmax). If you're doing regression, you usually want a linear output so that your network can produce all possible values, etc.
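
For instance (a minimal sketch of typical output layers, independent of the model discussed above; features is just a placeholder input):

import tensorflow as tf

features = tf.keras.layers.Input(shape=(32,))                               # placeholder hidden representation
out_multiclass = tf.keras.layers.Dense(7, activation='softmax')(features)   # classification: probabilities over 7 classes
out_binary = tf.keras.layers.Dense(1, activation='sigmoid')(features)       # binary classification: output in [0, 1]
out_regression = tf.keras.layers.Dense(1, activation='linear')(features)    # regression: unconstrained output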

My general rule of thumb is to use ReLU everywhere except in the output layer.

Cheers