acbull / pyHGT

Code for "Heterogeneous Graph Transformer" (WWW'20), which is based on pytorch_geometric

Problems with attention scores #41

[Open] PegasusAM opened this issue 2 years ago

PegasusAM commented 2 years ago

Hi there, I'm new to HGT and your work has inspired me a lot! I just have three quick questions:

  1. Considering that n layers are used, how can I obtain a single attention score from a source node to a target node? For now, I can only output the attention score per head in each layer.
  2. Is there a way to determine the number of layers to use, and should the attention score of the same s-t node pair show any tendency between the l-th layer and the (l+1)-th layer? E.g., does the attention score tend to get smaller as the layer index increases?
  3. How should I set the learning rate? Will it affect the results a lot?

Thanks in advance!

acbull commented 2 years ago

Hi:

  1. For the first question, `q_mat * k_mat * self.relation_pri[relation_type] / self.sqrt_dk` is the expression that computes the attention score per edge, and you could print it out instead of doing the "sum", for visualization (see the first sketch after this list).
  2. This is a hyper-parameter tuning problem, and I cannot give you a single answer; it depends on the dataset and the task (a generic sweep is sketched below).
  3. That is also a hyper-parameter tuning question. For Adam, 1e-3 is normally a good default choice; tuning it might influence the result a little (see the optimizer snippet below).
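
A minimal sketch of one way to extract those per-edge scores, assuming the released pyHGT code where `HGTConv` caches its softmax-normalized per-edge attention in `self.att` (shape `[num_edges, n_heads]`) and the `GNN` wrapper keeps its conv layers in `self.gcs` (check your local copy, since these attribute names may differ); `node_feature`, `node_type`, etc. are the usual batch tensors produced by the repo's sampling code:

```python
import torch

# Run one forward pass so each HGTConv layer caches its attention.
model.eval()
with torch.no_grad():
    _ = model(node_feature, node_type, edge_time, edge_index, edge_type)

# model.gcs is assumed to be the ModuleList of HGTConv layers.
for l, conv in enumerate(model.gcs):
    per_head = conv.att               # [num_edges, n_heads], softmax-normalized
    per_edge = per_head.mean(dim=-1)  # collapse heads -> one score per edge
    print(f'layer {l}: per-edge scores, shape {tuple(per_edge.shape)}')
```

Each row of `conv.att` should line up with the corresponding column of `edge_index`, so `per_edge[i]` would be the score for the source-target pair in `edge_index[:, i]`.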
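To make the tuning advice in point 2 concrete, here is a generic sketch of a depth sweep; `build_model` and `train_and_validate` are hypothetical placeholders for your own construction and training code, not functions from this repo:

```python
# Hypothetical sweep over the number of HGT layers.
best_layers, best_score = None, float('-inf')
for n_layers in (2, 3, 4):
    model = build_model(n_layers=n_layers)  # your model construction
    score = train_and_validate(model)       # your validation metric (higher = better)
    if score > best_score:
        best_layers, best_score = n_layers, score
print('best depth:', best_layers)
```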
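And for point 3, a standard PyTorch setup with that default; the cosine schedule is just one common option for reducing sensitivity to the exact initial value, not necessarily what this repo's scripts use:

```python
import torch

# Adam with the common 1e-3 default.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Optional: decay the learning rate over T_max steps along a cosine curve.
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=500, eta_min=1e-6)
```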