Closed monk1337 closed 4 years ago
Looking at your GCN, it looks like the first layer has a hidden size of 1024 units, whereas for GAT you have set hid_units to [8]. This means that the first layer has 8 heads with a hidden size of 8 each, which is much smaller than the hidden size used in the GCN.
You can try changing hid_units to [128] or [256], which will increase the hidden size of each head in the first layer, increasing the capacity of the model.
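A quick back-of-the-envelope sketch of the point above (illustrative only, not part of the GAT code): a multi-head GAT layer concatenates its heads, so its effective output width is the number of heads times the per-head hidden size.

```python
# Effective layer width: heads * per-head units (heads are concatenated).
gcn_hidden = 1024            # hidden size of the GCN layer in question
gat_heads, gat_units = 8, 8  # n_heads[0]=8 with hid_units=[8]

gat_width = gat_heads * gat_units
print(gat_width, gcn_hidden)  # 64 vs 1024 — much narrower than the GCN

# Raising hid_units to [128] gives 8 * 128 = 1024, matching the GCN width.
assert gat_heads * 128 == gcn_hidden
```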
@gcucurull And is the code I am using correct? I had a doubt that maybe I am not using the network properly.
@monk1337 Yes, it looks good.
@gcucurull Thanks for the quick response. I have one more doubt: for the attention heads we are passing a list [8, 1].
I went through the code and got the idea that the second entry is for the output layer.
for i in range(n_heads[-1]):
    out.append(layers.attn_head(h_1, bias_mat=bias_mat,
                                out_sz=nb_classes, activation=lambda x: x,
                                in_drop=ffd_drop, coef_drop=attn_drop,
                                residual=False))
logits = tf.add_n(out) / n_heads[-1]
But what should the ratio between input heads and output heads be? How does it affect the output?
The output layer is the one computing the logits; if you use multiple heads, the final logits will be the average over the logits produced by each output head.
However, in our experiments we always used only 1 output head, that's why it is set to [8,1].
There isn't really a ratio between input heads and output heads, since the number of output heads should be 1. The number of input heads basically controls the number of parameters of the model and its expressive power, so you might want to increase or decrease it depending on your task.
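To make the output-head averaging concrete, here is a minimal NumPy sketch of what the `tf.add_n(out) / n_heads[-1]` line above computes (the shapes and head count here are hypothetical):

```python
import numpy as np

# Hypothetical setup: 3 output heads, each producing logits of
# shape (nb_nodes, nb_classes).
nb_nodes, nb_classes, n_out_heads = 4, 3, 3
rng = np.random.default_rng(0)
head_logits = [rng.standard_normal((nb_nodes, nb_classes))
               for _ in range(n_out_heads)]

# Same as tf.add_n(out) / n_heads[-1]: element-wise sum, then divide.
logits = np.add.reduce(head_logits) / n_out_heads

# Averaging heads is exactly the mean over the head axis.
assert np.allclose(logits, np.mean(head_logits, axis=0))
```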
@gcucurull Thanks for the response. Is the number of input heads the same as the number of classes? And what default number of heads should I use if I have a big graph?
The number of input heads and the number of classes are not related.
8 input heads worked well in our case, so I suggest starting with that value and tweaking it empirically. Increasing it will increase the capacity of the model; decreasing it will reduce the capacity, but also speed things up and lower memory consumption.
@gcucurull I tried experimenting with the number of heads and hidden units in the range 2 to 1024, but couldn't get accuracy near the GCN I showed above. GCN produces 90% accuracy, while GAT does not cross 85% after many combinations of hidden units and number of heads. I also tried to stack two GAT layers; let me know if this is correct:
logits_graph = GAT.inference(inputs=realtion_batch,
                             nb_classes=800,
                             nb_nodes=22,
                             training=True,
                             attn_drop=0.0,
                             ffd_drop=0.0,
                             bias_mat=adj_batch,
                             hid_units=[8],
                             n_heads=[8, 1],
                             residual=False,
                             activation=tf.nn.elu)
logits_graph_s = GAT.inference(inputs=logits_graph,
                               nb_classes=256,
                               nb_nodes=22,
                               training=True,
                               attn_drop=0.0,
                               ffd_drop=0.0,
                               bias_mat=adj_batch,
                               hid_units=[8],
                               n_heads=[8, 1],
                               residual=False,
                               activation=tf.nn.elu)
But when I tried these two layers, the accuracy was 0.0 for 100 epochs.
Why is GAT not performing better than GCN?
The code is not quite right.
First of all, if you want to have multiple GAT layers, you don't call GAT.inference twice; you increase the number of elements in the hid_units list. Also, why do you set nb_classes to 800? Do you really have 800 classes? You also seem to be working with very small graphs, with nb_nodes set to 22.
The correct way to have a GAT model with 2 layers, with 8 heads per layer and 128 units per head is the following:
logits_graph = GAT.inference(inputs=realtion_batch,
                             nb_classes=NUMBER_OF_OUTPUT_CLASSES,
                             nb_nodes=NUMBER_OF_NODES,
                             training=True,
                             attn_drop=0.0,
                             ffd_drop=0.0,
                             bias_mat=adj_batch,
                             hid_units=[128, 128],
                             n_heads=[8, 8, 1],
                             residual=False,
                             activation=tf.nn.elu)
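A small sketch of how those two lists map onto layers (illustrative only; the class count here is hypothetical): each entry of hid_units is a hidden layer, the matching entry of n_heads is its head count, and the extra final entry of n_heads is the output layer.

```python
# hid_units has one entry per hidden layer; n_heads has one extra
# entry at the end for the output layer.
hid_units = [128, 128]   # per-head hidden size of each hidden layer
n_heads = [8, 8, 1]      # heads per layer; last entry = output heads
nb_classes = 7           # hypothetical number of classes

# Hidden layers concatenate their heads, so each outputs heads * units.
hidden_widths = [h * u for h, u in zip(n_heads, hid_units)]
print(hidden_widths)     # [1024, 1024]

# The output layer averages its head(s) instead of concatenating,
# so the final output width is just nb_classes.
print(nb_classes)
```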
@gcucurull Should n_heads be [8, 8, 1] or [128, 128, 1]?
Sorry, you are right, it is [8, 8, 1], I edited the message to correct it.
Did this work?
Yup
I am trying to extract features from a graph attention network. I was using GCN as a feature extractor and I want to replace it with GAT.
Where the GraphConvolution layer is defined as:
Now, to replace the GCN layer with GAT, I tried this:
Now I want to get just the logits from GAT as features, and it should learn the features too, so I set training = True.
With GCN features I was getting around 90% accuracy, but with GAT features I am not able to get more than 80%; instead, it should increase the accuracy compared to GCN.
Is there anything I am missing in the network, or are my hyperparameters not set correctly compared to the ones I was using for GCN?
@PetarV- @gcucurull Can you suggest how I can extract features from GAT, and, if I am doing it the correct way, why I am not getting good accuracy?
Thank you