PetarV- / GAT

Graph Attention Networks (https://arxiv.org/abs/1710.10903)
https://petar-v.com/GAT/
MIT License

Question about pubmed dataset results #12

Closed hotchilipowder closed 5 years ago

hotchilipowder commented 5 years ago

Hello, I am working with GAT. It's an excellent idea to introduce attention to graphs.

But I have some questions about the results on the Pubmed dataset, for which you report "79.0 ± 0.3%".

The paper says:

we found that Pubmed’s training set size (60 examples) required slight changes to the GAT architecture: we have applied K = 8 output attention heads (instead of one), and strengthened the L2 regularization to λ = 0.001. Otherwise, the architecture matches the one used for Cora and Citeseer.
...

Both models are initialized using Glorot initialization (Glorot & Bengio, 2010) and trained to minimize cross-entropy on the training nodes using the Adam SGD optimizer (Kingma & Ba, 2014) with an initial learning rate of 0.01 for Pubmed, and 0.005 for all other datasets.
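For concreteness, that setup corresponds to something like the following TF 1.x sketch (the dimensions and the dummy loss are illustrative placeholders, not repo code):

import tensorflow as tf  # TF 1.x, as used by this repo

in_dim, out_dim = 500, 8  # illustrative shapes
W = tf.get_variable('W', shape=[in_dim, out_dim],
                    initializer=tf.glorot_uniform_initializer())  # Glorot init
loss = tf.reduce_sum(tf.square(W))  # stand-in for the cross-entropy loss
train_op = tf.train.AdamOptimizer(learning_rate=0.01).minimize(loss)  # lr = 0.01 for Pubmed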

Since you said "we use 500 additional nodes for validation purposes (the same ones as used by Kipf & Welling (2017))", I copied the data from the GCN repository.

I then changed the hyperparameters to:

dataset = 'pubmed'
lr = 0.01  # learning rate
l2_coef = 0.001  # weight decay
hid_units = [8] # numbers of hidden units per each attention head in each layer
n_heads = [8, 8] # additional entry for the output layer
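Here n_heads = [8, 8] means eight attention heads in the hidden layer plus eight output heads; per the paper, output heads are averaged rather than concatenated. A minimal NumPy sketch of that averaging (illustrative shapes, not repo code):

import numpy as np

n_out_heads, n_nodes, n_classes = 8, 4, 3  # illustrative shapes
head_logits = [np.random.randn(n_nodes, n_classes) for _ in range(n_out_heads)]
logits = np.mean(head_logits, axis=0)  # average over the K = 8 output heads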

I ran the experiment 100 times (here is the code difference) and got "77.7 ± 0.8%", which does not pass a t-test against the reported mean:

(79.0 - 77.7) / (0.8 / sqrt(100)) = 16.25 > 1.984 (two-sided critical t value at significance 0.05)
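A minimal Python sketch of that computation, using the numbers above:

import math

n, mean_obs, std_obs, mean_ref = 100, 77.7, 0.8, 79.0
t_stat = abs(mean_obs - mean_ref) / (std_obs / math.sqrt(n))  # one-sample t-statistic
print(t_stat)  # 16.25, far above the 1.984 critical value (alpha = 0.05, df = 99)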

Is there anything wrong with the experiment parameters? Could you help me reproduce the Pubmed results?

Thank you

PetarV- commented 5 years ago

Hello,

Thanks for your issue and the kind words about our work!

Regarding the PubMed setup, we found that a slight change to the early stopping strategy was necessary: for this dataset only, we checkpoint based on the validation loss alone.

Here is a relevant code segment:

# Reset the patience counter whenever validation accuracy or loss improves,
# but save a checkpoint only on a new minimum validation loss
# (this is the PubMed-specific change).
if val_acc_avg/vl_step >= vacc_mx or val_loss_avg/vl_step <= vlss_mn:
    if val_loss_avg/vl_step <= vlss_mn:
        vacc_early_model = val_acc_avg/vl_step
        vlss_early_model = val_loss_avg/vl_step
        saver.save(sess, checkpt_file)
    vacc_mx = np.max((val_acc_avg/vl_step, vacc_mx))
    vlss_mn = np.min((val_loss_avg/vl_step, vlss_mn))
    curr_step = 0  # reset early-stopping patience
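For context, a rough sketch of how this check sits inside the validation loop; the else branch and the patience variable follow the pattern of the repo's training script, but the exact names here are assumed:

else:
    curr_step += 1  # one more epoch without improvement
    if curr_step == patience:  # give up after `patience` stagnant epochs
        print('Early stop! Min loss:', vlss_mn,
              ', accuracy at min loss:', vacc_early_model)
        break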

Hope that helps!

Thanks, Petar

PetarV- commented 5 years ago

It should also be noted that, for PubMed specifically, a different choice of attention mechanism may yield even better results. Consider the AGNN paper, reporting 79.9% with a simpler attentional construction:

https://arxiv.org/abs/1803.03735
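A rough NumPy sketch of that attention mechanism as described in the AGNN paper (a learned scalar beta scales cosine similarities, softmax-normalized over each neighbourhood; a dense adjacency with self-loops is assumed, and this is not code from either repo):

import numpy as np

def agnn_attention(H, adj, beta=1.0):
    # H: node features (n x d); adj: {0, 1} adjacency with self-loops.
    Hn = H / np.linalg.norm(H, axis=1, keepdims=True)        # row-normalize features
    scores = np.where(adj > 0, beta * (Hn @ Hn.T), -np.inf)  # cosine scores on edges only
    e = np.exp(scores - scores.max(axis=1, keepdims=True))   # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)                  # P; propagate with P @ H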

hotchilipowder commented 5 years ago

Thank you for your reply. After using your early stopping strategy, I get "78.8 ± 0.3%".

Although a gap to the paper's result remains, this is a large improvement over my previous result of "77.7 ± 0.8%".