Closed · wjj5881005 closed this issue 3 years ago
I found that the attention weights alpha are multiplied with the original features x instead of with W * x, which differs from a graph convolutional network:

h = self.sigmoid(torch.matmul(attention, x))

Could you show me the reason? Thanks very much!

We follow the MTAD-GAT paper, in which the node features are linearly transformed (multiplied by W) when calculating the attention weights, but the weighted sum that produces the new node representations uses the original node features. However, I see your point, and one could adapt the code to use Wx instead of x in the weighted sum.
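To make the difference concrete, here is a minimal single-head GAT-style layer (a hypothetical sketch, not the repository's actual code) with a `use_transformed` flag: `True` aggregates the transformed features Wx as in standard GAT/GCN, while `False` aggregates the raw features x as in the MTAD-GAT variant discussed above. The class name, the flag, and the fully connected adjacency are all assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SimpleGATLayer(nn.Module):
    """Hypothetical single-head graph attention layer for illustration.

    Attention scores are always computed from the transformed features Wx;
    `use_transformed` only toggles what the attention-weighted sum aggregates:
    Wx (standard GAT) or the raw features x (MTAD-GAT variant).
    """

    def __init__(self, in_dim, out_dim, use_transformed=True):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        # scoring vector a, applied to concatenated pairs [Wx_i || Wx_j]
        self.a = nn.Linear(2 * out_dim, 1, bias=False)
        self.use_transformed = use_transformed

    def forward(self, x):
        # x: (num_nodes, in_dim); a fully connected graph for simplicity
        wx = self.W(x)                                   # (N, out_dim)
        n = wx.size(0)
        # build all pairwise concatenations [Wx_i || Wx_j]
        pairs = torch.cat(
            [wx.unsqueeze(1).expand(n, n, -1),
             wx.unsqueeze(0).expand(n, n, -1)], dim=-1)  # (N, N, 2*out_dim)
        e = F.leaky_relu(self.a(pairs).squeeze(-1))      # (N, N) raw scores
        attention = torch.softmax(e, dim=-1)             # rows sum to 1
        # the line in question: aggregate Wx (standard) or x (MTAD-GAT style)
        values = wx if self.use_transformed else x
        return torch.sigmoid(torch.matmul(attention, values))
```

Note that the two choices give outputs of different dimensionality: aggregating Wx yields `(N, out_dim)`, while aggregating x keeps the input dimension `(N, in_dim)`, which matters if later layers expect a particular feature size.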