Closed — hapoyige closed this issue 4 years ago
Hello, I have encountered the same problem as you. Also, in ./utils/layers.py, I didn't understand how the code computes the correlation between f_1 and f_2. I have looked at the PyTorch version of the code, and I think the two implementations differ at this point. The PyTorch version is sensitive to the choice of random seed; switching to a different seed can produce very different results.
Hello,
Thank you for your issue and interest in GAT!
The way in which attention heads are implemented here is exactly equivalent to the one in the paper, and it uses TensorFlow broadcasting semantics heavily.
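To make the broadcasting point concrete, here is a small NumPy sketch (not taken from the repo) of how adding a `(B, N, 1)` tensor to its `(B, 1, N)` transpose produces the full `(B, N, N)` matrix of pairwise attention logits, one scalar per node pair:

```python
import numpy as np

# Sketch of the broadcasting trick used in attn_head().
# f_1 and f_2 stand in for the two 1-channel conv1d outputs:
# per-node scores computed from the transformed features.
B, N = 2, 4
rng = np.random.default_rng(0)
f_1 = rng.normal(size=(B, N, 1))  # role of a_1^T W h_i, one scalar per node i
f_2 = rng.normal(size=(B, N, 1))  # role of a_2^T W h_j, one scalar per node j

# (B, N, 1) + (B, 1, N) broadcasts to (B, N, N)
logits = f_1 + np.transpose(f_2, (0, 2, 1))

# Entry (i, j) is exactly f_1[i] + f_2[j]: the unnormalized
# attention logit for the edge i -> j, for all pairs at once.
assert logits.shape == (B, N, N)
assert np.allclose(logits[0, 1, 3], f_1[0, 1, 0] + f_2[0, 3, 0])
```

So no explicit concatenation is ever materialized; broadcasting fills in every `(i, j)` pair in one addition.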
For more details, see my response in this issue: https://github.com/PetarV-/GAT/issues/15
Thanks, Petar
Hi, Petar. I have read the issue. Thank you very much for your reply; it has helped me a great deal. Thanks, Ice
Thanks a lot, Petar, I understood!
I find in function `attn_head()` (in utils/layers.py):

```python
# simplest self-attention possible
f_1 = tf.layers.conv1d(seq_fts, 1, 1)
f_2 = tf.layers.conv1d(seq_fts, 1, 1)
logits = f_1 + tf.transpose(f_2, [0, 2, 1])
coefs = tf.nn.softmax(tf.nn.leaky_relu(logits) + bias_mat)
```

In my understanding, this code equals $f_1 W_1 + f_2 W_2$, but in the paper the chosen attention mechanism uses concatenation, with $W_1 = W_2 = W$. Did I get something wrong?