Open zulihit opened 2 years ago
I hope this helps anyone who struggled, as I did, to understand point 2 (and, as a consequence, point 1, I guess). The relation embeddings are projected onto $[-\pi, \pi]$ because a uniform initialization, as done with Xavier initialization for example, assigns them values very close to zero. In some experiments I ran, a model that uses those raw values as angles tends to learn rotations with angles very close to zero, which makes triples like (head, relation, head) extremely plausible: the rotation is almost the identity, so $h \circ r \approx h$. This basically forces MRR and H@1 to collapse to zero, while H@3, H@10 and MR stay good.
Instead, if we project the values of the relation embeddings onto the range $[-\pi, \pi]$ (by using `phase_relation = relation/(self.embedding_range.item()/pi)`), the rotations are no longer all close to zero. The angles vary more, so the model can learn better representations and hence get better results.
In light of this, I believe the initialization of the relations in point 1 of the question is just a convenient way to get a uniform initialization (as with Xavier), but with more straightforward bounds.
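The effect described in this comment can be checked numerically. The following is a minimal, stdlib-only sketch rather than the repository's PyTorch code: the values of `gamma`, `epsilon` and `hidden_dim` are assumptions chosen for illustration, and `rotation_shift` is a hypothetical helper that measures how far a rotation moves a unit-modulus entity value.

```python
import math
import random

# Assumed hyperparameters, for illustration only (not stated in this thread).
gamma, epsilon, hidden_dim = 24.0, 2.0, 1000
embedding_range = (gamma + epsilon) / hidden_dim


def rotation_shift(phase):
    """Return |h * e^{i*phase} - h| for a unit-modulus h (hypothetical helper)."""
    return abs(complex(math.cos(phase), math.sin(phase)) - 1.0)


random.seed(0)
# Raw relation values drawn the same way as the uniform initialization.
raw = [random.uniform(-embedding_range, embedding_range) for _ in range(10_000)]

# Case 1: raw values used directly as angles, so every rotation is nearly null.
unscaled_shift = max(rotation_shift(v) for v in raw)

# Case 2: the rescaling from the thread, stretching the init range onto [-pi, pi].
scaled = [v / (embedding_range / math.pi) for v in raw]
scaled_shift = sum(rotation_shift(p) for p in scaled) / len(scaled)

print(f"largest shift without rescaling: {unscaled_shift:.4f}")
print(f"mean shift with rescaling:       {scaled_shift:.4f}")
```

Without rescaling, the shift can never exceed the initialization bound itself, so $h \circ r$ stays practically equal to $h$. With rescaling, the angles cover the whole circle from the first training step.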
Thank you for your work and I have three questions:

    self.embedding_range = nn.Parameter(
        torch.Tensor([(self.gamma.item() + self.epsilon) / hidden_dim]),
        requires_grad=False
    )

    self.entity_embedding = nn.Parameter(torch.zeros(nentity, self.entity_dim))
    nn.init.uniform_(
        tensor=self.entity_embedding,
        a=-self.embedding_range.item(),
        b=self.embedding_range.item()
    )

    phase_relation = relation/(self.embedding_range.item()/pi)
    re_relation = torch.cos(phase_relation)
    im_relation = torch.sin(phase_relation)

    if mode == 'head-batch':
        # conj(r) * t - h
        re_score = re_relation * re_tail + im_relation * im_tail
        im_score = re_relation * im_tail - im_relation * re_tail
        re_score = re_score - re_head
        im_score = im_score - im_head
    else:
        # h * r - t
        re_score = re_head * re_relation - im_head * im_relation
        im_score = re_head * im_relation + im_head * re_relation
        re_score = re_score - re_tail
        im_score = im_score - im_tail
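To make the two branches easier to read, here is a small stdlib-only sketch that applies the same real and imaginary formulas to a single embedding dimension and compares them with Python's built-in complex arithmetic. The function names and the numeric values are hypothetical, and the sketch stops at the residual; it does not reproduce the final score of the model.

```python
import cmath
import math


def tail_batch_residual(re_head, im_head, re_tail, im_tail, phase):
    """Real and imaginary parts of h * r - t, following the else branch."""
    re_relation, im_relation = math.cos(phase), math.sin(phase)
    re_score = re_head * re_relation - im_head * im_relation - re_tail
    im_score = re_head * im_relation + im_head * re_relation - im_tail
    return re_score, im_score


def head_batch_residual(re_head, im_head, re_tail, im_tail, phase):
    """Real and imaginary parts of conj(r) * t - h, following the head-batch branch."""
    re_relation, im_relation = math.cos(phase), math.sin(phase)
    re_score = re_relation * re_tail + im_relation * im_tail - re_head
    im_score = re_relation * im_tail - im_relation * re_tail - im_head
    return re_score, im_score


# Made-up values for one embedding dimension.
h = complex(0.3, -0.5)
t = complex(0.1, 0.2)
phase = 1.1
r = cmath.exp(1j * phase)  # unit-modulus rotation e^{i * phase}

tail_res = complex(*tail_batch_residual(h.real, h.imag, t.real, t.imag, phase))
head_res = complex(*head_batch_residual(h.real, h.imag, t.real, t.imag, phase))

# A tail that is exactly the rotated head leaves no residual in either mode.
t_exact = h * r
tail_exact = complex(*tail_batch_residual(h.real, h.imag, t_exact.real, t_exact.imag, phase))
head_exact = complex(*head_batch_residual(h.real, h.imag, t_exact.real, t_exact.imag, phase))

print("tail-batch matches h * r - t:      ", abs(tail_res - (h * r - t)) < 1e-12)
print("head-batch matches conj(r) * t - h:", abs(head_res - (r.conjugate() * t - h)) < 1e-12)
```

Because $|r| = 1$, rotating the tail back by the conjugate of $r$ and comparing with the head measures the same distance as rotating the head forward and comparing with the tail, so both modes agree on how well a triple fits.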