acbull / pyHGT

Code for "Heterogeneous Graph Transformer" (WWW'20), which is based on pytorch_geometric

Problems with attention scores #41

[Open] PegasusAM opened this issue 2 years ago

PegasusAM commented 2 years ago

Hi there, I'm new to HGT and your work has inspired me a lot! I just have three quick questions:

  1. Considering that n layers are used, how can I obtain a single attention score from a source node to a target node? For now, I can only output the attention score per head in each layer.
  2. Is there a way to determine the number of layers to use, and should the attention score of the same s-t node pair show any tendency between the l-th layer and the (l+1)-th layer? E.g., does the attention score tend to get smaller as the layer index increases?
  3. How should I set the learning rate? Will it affect the results a lot?

Thanks in advance!

acbull commented 2 years ago

Hi:

  1. For the first question, `q_mat * k_mat * self.relation_pri[relation_type] / self.sqrt_dk` is the expression that computes the attention score per edge, and you could print it out instead of doing the "sum", for visualization (see the first sketch after this list).
  2. This is a hyper-parameter tuning problem, and I cannot give you a single answer; it depends on the dataset and the task (a generic sweep is sketched below).
  3. That is also a hyper-parameter tuning question. For Adam, 1e-3 is normally a good default choice; tuning it might influence the result a little (see the optimizer snippet below).
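
A minimal sketch of one way to extract those per-edge scores, assuming the released pyHGT code where `HGTConv` caches its softmax-normalized per-edge attention in `self.att` (shape `[num_edges, n_heads]`) and the `GNN` wrapper keeps its conv layers in `self.gcs` (check your local copy, since these attribute names may differ); `node_feature`, `node_type`, etc. are the usual batch tensors produced by the repo's sampling code:

```python
import torch

# Run one forward pass so each HGTConv layer caches its attention.
model.eval()
with torch.no_grad():
    _ = model(node_feature, node_type, edge_time, edge_index, edge_type)

# model.gcs is assumed to be the ModuleList of HGTConv layers.
for l, conv in enumerate(model.gcs):
    per_head = conv.att               # [num_edges, n_heads], softmax-normalized
    per_edge = per_head.mean(dim=-1)  # collapse heads -> one score per edge
    print(f'layer {l}: per-edge scores, shape {tuple(per_edge.shape)}')
```

Each row of `conv.att` should line up with the corresponding column of `edge_index`, so `per_edge[i]` would be the score for the source-target pair in `edge_index[:, i]`.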
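To make the tuning advice in point 2 concrete, here is a generic sketch of a depth sweep; `build_model` and `train_and_validate` are hypothetical placeholders for your own construction and training code, not functions from this repo:

```python
# Hypothetical sweep over the number of HGT layers.
best_layers, best_score = None, float('-inf')
for n_layers in (2, 3, 4):
    model = build_model(n_layers=n_layers)  # your model construction
    score = train_and_validate(model)       # your validation metric (higher = better)
    if score > best_score:
        best_layers, best_score = n_layers, score
print('best depth:', best_layers)
```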
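And for point 3, a standard PyTorch setup with that default; the cosine schedule is just one common option for reducing sensitivity to the exact initial value, not necessarily what this repo's scripts use:

```python
import torch

# Adam with the common 1e-3 default.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Optional: decay the learning rate over T_max steps along a cosine curve.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=500, eta_min=1e-6)
```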