PetarV- / GAT

Graph Attention Networks (https://arxiv.org/abs/1710.10903)
https://petar-v.com/GAT/
MIT License

Softmax eliminates node i's effect in the attention mechanism #4

Closed: FeiGSSS closed this issue 6 years ago

FeiGSSS commented 6 years ago

Hi, I read your paper and was surprised by the impressive results of the attention mechanism. I wondered why it works so well, so I did some math myself and found something confusing: if we split the vector a into (a_1, a_2), where a_1 and a_2 have the same dimension, then e_ij = a_1*W*h_i + a_2*W*h_j. If we apply the softmax in the next step, we can easily derive that alpha_ij = exp(a_2*W*h_j) / sum_k exp(a_2*W*h_k), because the exp(a_1*W*h_i) factor appears in both the numerator and the denominator and cancels. This means the attention coefficients have nothing to do with node i.
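A minimal NumPy sketch of this cancellation (hypothetical node count and feature size, not code from this repository): without a nonlinearity between the linear score and the softmax, the a_1*W*h_i term is a constant shift over j and drops out, so every node gets the same attention row.

```python
import numpy as np

rng = np.random.default_rng(0)
Wh = rng.normal(size=(4, 8))            # projected features W*h for 4 nodes (assumed sizes)
a1 = rng.normal(size=8)                 # first half of the attention vector a
a2 = rng.normal(size=8)                 # second half of the attention vector a

def softmax(x):
    e = np.exp(x - x.max())             # subtracting the max is itself a harmless shift
    return e / e.sum()

def alpha(i):
    # e_ij = a1.(W h_i) + a2.(W h_j) for every j, with NO nonlinearity in between
    e = Wh[i] @ a1 + Wh @ a2
    return softmax(e)

# The a1.(W h_i) term shifts every e_ij by the same constant, and softmax is
# shift-invariant, so the coefficients come out identical for every node i.
print(np.allclose(alpha(0), alpha(1)))  # True
```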

PetarV- commented 6 years ago

Hello,

Thank you for your interest. I'm afraid that you probably read an outdated version of the paper (please check the most recent one on arXiv or OpenReview). We apply a LeakyReLU nonlinearity to a_1*W*h_i + a_2*W*h_j before applying the softmax, which fixes the problem you mentioned.
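A small follow-up sketch (same assumed shapes as above, not the repository's implementation): putting the LeakyReLU between the linear score and the softmax makes the score nonlinear in a_1*W*h_i, so that term no longer cancels and the coefficients depend on node i.

```python
import numpy as np

rng = np.random.default_rng(0)
Wh = rng.normal(size=(4, 8))            # projected features W*h (assumed sizes)
a1, a2 = rng.normal(size=8), rng.normal(size=8)

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def leaky_relu(x, negative_slope=0.2):  # 0.2 is the negative slope used in the paper
    return np.where(x > 0, x, negative_slope * x)

def alpha(i):
    # e_ij = LeakyReLU(a1.(W h_i) + a2.(W h_j)): the a1.(W h_i) term now sits
    # inside the nonlinearity, so it is no longer a uniform shift over j
    e = leaky_relu(Wh[i] @ a1 + Wh @ a2)
    return softmax(e)

print(alpha(0))
print(alpha(1))                         # in general no longer equal to alpha(0)
```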

Thanks, Petar

FeiGSSS commented 6 years ago

Thanks for your reply. Yeah, I had indeed read the outdated version :( Thanks very much!