DeepGraphLearning / KnowledgeGraphEmbedding


Three questions #55

Open zulihit opened 2 years ago

zulihit commented 2 years ago

Thank you for your work. I have three questions:

1. Why do you calculate the initialization range in the following way? I didn't see it introduced in the paper. What is the purpose of this method?

    self.embedding_range = nn.Parameter(
        torch.Tensor([(self.gamma.item() + self.epsilon) / hidden_dim]),
        requires_grad=False
    )

    self.entity_embedding = nn.Parameter(torch.zeros(nentity, self.entity_dim))
    nn.init.uniform_(
        tensor=self.entity_embedding,
        a=-self.embedding_range.item(),
        b=self.embedding_range.item()
    )
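
For reference, with typical hyperparameters this bound works out to a very small number. A minimal standalone sketch of what the initialization produces (assuming, for illustration, gamma = 12.0 and hidden_dim = 1000; epsilon = 2.0 is hard-coded in the model, and the other values here are just examples):

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters (gamma, hidden_dim and the entity count are
# command-line arguments in the repo; these values are assumptions for the sketch).
gamma, epsilon, hidden_dim, nentity = 12.0, 2.0, 1000, 1000

embedding_range = (gamma + epsilon) / hidden_dim           # 0.014 for these values
entity_embedding = nn.Parameter(torch.zeros(nentity, hidden_dim))
nn.init.uniform_(entity_embedding, a=-embedding_range, b=embedding_range)

print(embedding_range)                        # 0.014
print(entity_embedding.abs().max().item())    # <= 0.014
```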

2. This range is also used when converting relations to their complex (phase) form. Why can this be done?

    phase_relation = relation / (self.embedding_range.item() / pi)
    re_relation = torch.cos(phase_relation)
    im_relation = torch.sin(phase_relation)
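
Numerically, this division just rescales the uniformly initialized values from $[-\text{embedding\_range}, \text{embedding\_range}]$ to phases in $[-\pi, \pi]$, which then become unit-modulus complex numbers. A small standalone check (with an assumed embedding_range of 0.014):

```python
import math
import torch

embedding_range = 0.014   # assumed value of (gamma + epsilon) / hidden_dim
relation = torch.empty(10000).uniform_(-embedding_range, embedding_range)

# Rescale the raw values to phases in [-pi, pi].
phase_relation = relation / (embedding_range / math.pi)
print(phase_relation.min().item(), phase_relation.max().item())  # ~ -3.14 .. 3.14

# Each entry becomes a point on the unit circle: |r| = 1, i.e. a pure rotation.
re_relation = torch.cos(phase_relation)
im_relation = torch.sin(phase_relation)
print(torch.allclose(re_relation**2 + im_relation**2, torch.ones(10000)))  # True
```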

3. In the RotatE model, the head-batch and tail-batch score calculations differ in sign, but I can't find the head-batch formulation in the paper, so I don't understand this part.

    if mode == 'head-batch':
        re_score = re_relation * re_tail + im_relation * im_tail
        im_score = re_relation * im_tail - im_relation * re_tail
        re_score = re_score - re_head
        im_score = im_score - im_head
    else:
        re_score = re_head * re_relation - im_head * im_relation
        im_score = re_head * im_relation + im_head * re_relation
        re_score = re_score - re_tail
        im_score = im_score - im_tail
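
For what it's worth, the two branches compute the same distance: since $|r| = 1$, $\|h \circ r - t\| = \|\bar r \circ t - h\|$, and the head-batch form is simply the version that broadcasts over candidate heads instead of candidate tails. A small standalone sketch (not the repo's code) checking this for a single triple:

```python
import math
import torch

torch.manual_seed(0)
dim = 8

# Random complex head and tail, and a unit-modulus relation (a rotation).
re_head, im_head = torch.randn(dim), torch.randn(dim)
re_tail, im_tail = torch.randn(dim), torch.randn(dim)
phase = torch.empty(dim).uniform_(-math.pi, math.pi)
re_rel, im_rel = torch.cos(phase), torch.sin(phase)

# tail-batch: h * r - t
re_tb = re_head * re_rel - im_head * im_rel - re_tail
im_tb = re_head * im_rel + im_head * re_rel - im_tail

# head-batch: conj(r) * t - h
re_hb = re_rel * re_tail + im_rel * im_tail - re_head
im_hb = re_rel * im_tail - im_rel * re_tail - im_head

# Per-dimension magnitudes (and hence the summed distance) are identical.
dist_tb = torch.sqrt(re_tb**2 + im_tb**2).sum()
dist_hb = torch.sqrt(re_hb**2 + im_hb**2).sum()
print(torch.allclose(dist_tb, dist_hb))   # True
```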

albernar commented 7 months ago

I hope this helps anybody who, like me, struggled to understand point 2 (and, as a consequence, point 1): the reason the embedding values are projected into $[-\pi, \pi]$ is that, if we initialized the weights uniformly as with Xavier initialization, for example, the values assigned to the relation embeddings would be very close to zero. According to some experiments I ran, the model in this case tends to learn rotations with angles very close to zero, which makes triples like (head, relation, head) extremely plausible: the rotation is almost null, so $h \circ r \approx h$. This basically forces MRR and H@1 to collapse to zero, while H@3, H@10 and MR still look good.

Instead, if we project the relation embedding values into the range $[-\pi, \pi]$ (via `phase_relation = relation/(self.embedding_range.item()/pi)`), the rotations are not all nearly null; there is more variability, so we get better representations and hence better results.
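
To make the difference concrete, here is a rough sketch of the effect described above (with assumed hyperparameters, not a reproduction of the original experiments): with the raw, tiny initial values the rotation barely moves $h$, so the distance of a degenerate triple $(h, r, h)$ is almost zero, while the rescaled phases produce a genuinely large rotation.

```python
import math
import torch

torch.manual_seed(0)
hidden_dim = 1000
embedding_range = (12.0 + 2.0) / hidden_dim   # assumed gamma = 12, epsilon = 2

re_head, im_head = torch.randn(hidden_dim), torch.randn(hidden_dim)
relation = torch.empty(hidden_dim).uniform_(-embedding_range, embedding_range)

def self_loop_distance(phase):
    """Distance of the degenerate triple (h, r, h): sum_d |h_d * exp(i*phase_d) - h_d|."""
    re_r, im_r = torch.cos(phase), torch.sin(phase)
    re_s = re_head * re_r - im_head * im_r - re_head
    im_s = re_head * im_r + im_head * re_r - im_head
    return torch.sqrt(re_s**2 + im_s**2).sum()

# Raw phases of ~0.014 rad: the rotation is almost the identity, so (h, r, h) looks plausible.
print(self_loop_distance(relation).item())
# Rescaled phases in [-pi, pi]: the rotation actually moves h away from itself.
print(self_loop_distance(relation / (embedding_range / math.pi)).item())
```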

In light of this, I believe the initialization of the relations as in point 1 of the question above is just a convenient way to get a uniform initialization (as with Xavier), but with more straightforward extremes.