Open Niuyuhang03 opened 5 years ago
Actually, if you read the experimental setup (transductive learning) section of the GAT paper, the second layer (the output layer) is used directly for classification, with out_features equal to the number of classes.
The second layer is used for classification: a single attention head that computes C features (where C is the number of classes), followed by a softmax activation.
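To illustrate the point above, here is a minimal sketch of such an output layer: a single attention head that maps hidden features to C = n_classes features, followed by (log-)softmax for classification. The class name, shapes, and dense attention computation are hypothetical simplifications, not pyGAT's actual implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class OutputLayer(nn.Module):
    """Sketch of a GAT output layer: one attention head producing
    C = n_classes features per node, then log-softmax over classes.
    (Hypothetical names; not copied from pyGAT.)"""

    def __init__(self, in_features, n_classes):
        super().__init__()
        self.W = nn.Linear(in_features, n_classes, bias=False)
        self.a = nn.Linear(2 * n_classes, 1, bias=False)

    def forward(self, h, adj):
        Wh = self.W(h)                                   # (N, C)
        N = Wh.size(0)
        # attention logits e_ij = LeakyReLU(a^T [Wh_i || Wh_j])
        pairs = torch.cat([Wh.repeat_interleave(N, dim=0),
                           Wh.repeat(N, 1)], dim=1)      # (N*N, 2C)
        e = F.leaky_relu(self.a(pairs).view(N, N), 0.2)
        e = e.masked_fill(adj == 0, float('-inf'))       # keep edges only
        alpha = F.softmax(e, dim=1)                      # attention weights
        return F.log_softmax(alpha @ Wh, dim=1)          # (N, C) log-probs
```

Each row of the output is a log-probability distribution over the C classes, which is exactly what an NLL loss consumes during training.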
Hi, Diego! Thank you for sharing the code.
I have read and run the code in the similar_impl_tensorflow branch, and I have a few questions I'm confused about.
The GAT paper says the authors replace concatenation with averaging of the attention heads in the second (prediction) layer of the multi-head attention. But I only find

```python
x = self.out_att(x, adj)
return F.log_softmax(x, dim=1)
```

in models.py, in both the similar_impl_tensorflow branch and the master branch. Could you tell me whether pyGAT contains this averaging step? Thanks.

Besides, the paper's output is h', which represents the new features of the entities, but the output of pyGAT's second layer is the classification result. I believe the outputs of both the first layer and the second layer are new features h', and the biggest difference between them is that the dimension of the latter equals nClass. Am I right?
Thanks again,
Jason
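Since pyGAT's output layer uses a single attention head (as the reply above notes), averaging over heads is a no-op there, which may be why no explicit averaging appears in the code. The concat-vs-average distinction the question asks about can be sketched with dummy tensors; all names and shapes here are hypothetical:

```python
import torch

# Hypothetical setup: K attention heads over N nodes, F' features per head.
K, N, Fp = 8, 5, 16
heads = [torch.randn(N, Fp) for _ in range(K)]

# Hidden layers: concatenate the per-head outputs -> shape (N, K * F')
hidden = torch.cat(heads, dim=1)

# Output layer: average the per-head outputs instead -> shape (N, F'),
# with F' set to the number of classes when classifying. With K = 1
# this average is just the single head's output unchanged.
out = torch.stack(heads, dim=0).mean(dim=0)
```

Concatenation grows the feature dimension with the number of heads, so it only makes sense in hidden layers; the output layer must end at exactly n_classes features, hence the averaging.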