PetarV- / GAT

Graph Attention Networks (https://arxiv.org/abs/1710.10903)
https://petar-v.com/GAT/
MIT License

Is the attention used in the code the same as the one in the paper? #33

Closed. hapoyige closed this issue 4 years ago

hapoyige commented 4 years ago

I find the following in the function `attn_head()` (in `utils/layers.py`):

```python
# simplest self-attention possible
f_1 = tf.layers.conv1d(seq_fts, 1, 1)
f_2 = tf.layers.conv1d(seq_fts, 1, 1)
logits = f_1 + tf.transpose(f_2, [0, 2, 1])
coefs = tf.nn.softmax(tf.nn.leaky_relu(logits) + bias_mat)
```

In my understanding, this code amounts to $$f_1 W_1 + f_2 W_2,$$ but the attention mechanism chosen in the paper uses concatenation, with $$W_1 = W_2 = W.$$ Did I get something wrong?
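For reference, the attention mechanism chosen in the paper computes the coefficients

$$\alpha_{ij} = \mathrm{softmax}_j\!\left(\mathrm{LeakyReLU}\!\left(\vec{a}^{\top}\left[\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_j\right]\right)\right),$$

where $\mathbf{W}$ is the shared linear transformation applied to every node and $\vec{a}$ is the learnable attention vector applied to the concatenation.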

KL-ice commented 4 years ago

Hello, I have encountered the same problem as you. Also, in `./utils/layers.py`, I didn't understand how the code computes the correlation between `f_1` and `f_2`. I have looked at the PyTorch version of the code, and I think the two implementations differ at this point. The PyTorch version is sensitive to the choice of random seed; switching to a different seed can make the results very different.

PetarV- commented 4 years ago

Hello,

Thank you for your issue and interest in GAT!

The way in which attention heads are implemented here is exactly equivalent to the one in the paper, and it uses TensorFlow broadcasting semantics heavily.

For more details, see my response in this issue: https://github.com/PetarV-/GAT/issues/15

Thanks, Petar
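To unpack the equivalence Petar points to: the paper's attention logit $\vec{a}^{\top}[\mathbf{W}\vec{h}_i \,\|\, \mathbf{W}\vec{h}_j]$ splits into $\vec{a}_1^{\top}\mathbf{W}\vec{h}_i + \vec{a}_2^{\top}\mathbf{W}\vec{h}_j$ with $\vec{a} = [\vec{a}_1 \,\|\, \vec{a}_2]$, and the two `conv1d` calls compute exactly these two halves before the broadcasted sum builds the full logit matrix. A minimal NumPy sketch of that decomposition (the array names here are illustrative, not taken from the repository):

```python
import numpy as np

N, F_out = 4, 8                      # number of nodes, transformed feature size
Wh = np.random.randn(N, F_out)       # plays the role of seq_fts, i.e. W h_i for every node i
a1 = np.random.randn(F_out)          # first half of the attention vector a
a2 = np.random.randn(F_out)          # second half of the attention vector a

# Paper's formulation: e_ij = a^T [W h_i || W h_j]
a = np.concatenate([a1, a2])
e_paper = np.array([[a @ np.concatenate([Wh[i], Wh[j]]) for j in range(N)]
                    for i in range(N)])

# Code's formulation: f_1[i] = a1^T W h_i, f_2[j] = a2^T W h_j (the two conv1d calls),
# then broadcast f_1 as a column against f_2 as a row.
f_1 = Wh @ a1
f_2 = Wh @ a2
e_code = f_1[:, None] + f_2[None, :]  # broadcasting fills the full N x N logit matrix

assert np.allclose(e_paper, e_code)   # identical before LeakyReLU / softmax
```

Applying LeakyReLU, adding `bias_mat` to mask non-edges, and taking the softmax then yields the coefficients from the paper.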

KL-ice commented 4 years ago

Hi, Petar. I have read the issue. Thank you very much for your reply; it has helped me a great deal. Thanks, Ice

hapoyige commented 4 years ago

Thanks a lot, Petar, I understand now!