Sujit-O / pykg2vec

Python library for knowledge graph embedding and representation learning.
MIT License
602 stars 109 forks

Probable bug on the infer_tails() and infer_heads() methods #196

Open Rodrigo-A-Pereira opened 3 years ago

Rodrigo-A-Pereira commented 3 years ago

Hi, when using the inference capabilities of the library, I came across some weird behaviour. More concretely, I was training on the UMLS dataset using the TransE model and obtained very high results for hits@10 (0.5651) and filtered hits@10 (0.9713). However, when using the infer_tails() method to infer some triples taken from the test set, I noticed that the correct tails were nowhere near the top 10; on the contrary, they were always near the bottom 10 values.

As such, I decided to look a bit more into it. When I looked at the metric calculator, more specifically the get_tail_rank() and get_head_rank() methods, I noticed that the list of tail and head candidates was being traversed from last to first:

trank = 0
ftrank = 0
for j in range(len(tail_candidate)):
    val = tail_candidate[-j - 1]
    if val != t:
        trank += 1
        ftrank += 1
        if val in self.hr_t[(h, r)]:
            ftrank -= 1
    else:
        break

return trank, ftrank

This made sense, since tail_candidate is obtained by calling test_tail_rank() with topk=total_entities: self.test_tail_rank(h_tensor, r_tensor, self.config.tot_entity)

a function that returns: _, rank = torch.topk(preds, k=topk)

The rank is a list of entity indices ordered from the highest "pred" value to the lowest, and since this "pred" value is the value of the scoring function (h + r - t, in the case of TransE), the lower values are the ones more likely to be the correct link. Hence I understood why the list was being traversed from last to first.
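To make the ordering concrete, here is a minimal dependency-free sketch (the scores and entity ids are made up; sorted() stands in for torch.topk, which also sorts by descending value). With a distance-style score where lower means more plausible, the descending sort puts the true tail at the end of the candidate list, which is exactly why the rank loop walks it from last to first:

```python
# Toy distance-style scores for 5 candidate tails (lower = better, as in TransE).
preds = [4.2, 0.3, 3.1, 2.5, 1.7]  # entity 1 is the true tail (smallest score)

# torch.topk(preds, k=len(preds)) returns indices sorted by DESCENDING score,
# so the most plausible tail ends up LAST. We mimic that with sorted().
tail_candidate = sorted(range(len(preds)), key=lambda i: preds[i], reverse=True)
print(tail_candidate)  # [0, 2, 3, 4, 1] -- true tail (1) sits at the end

# Walking the list from the back (tail_candidate[-j - 1]) therefore yields
# the correct rank for the true tail:
t = 1
rank = next(j for j in range(len(tail_candidate)) if tail_candidate[-j - 1] == t)
print(rank)  # 0 -> the true tail is ranked first
```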

However, when it comes to the infer_tails() and infer_heads() methods, they call test_tail_rank() and test_head_rank() but do not reverse the list, which returns to the user the top X least likely predicted tails/heads instead of the top X most likely predictions.
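If that diagnosis is right, the fix would be to reverse the ranking before taking the top k. This is only a hypothetical sketch (infer_tails_fixed and the input list are mine, not the library's code), showing the post-processing that infer_tails()/infer_heads() would need on the indices coming back from test_tail_rank():

```python
def infer_tails_fixed(ranked_indices, topk):
    """Given entity indices sorted by DESCENDING score (lower score = more
    plausible tail), return the topk MOST plausible tails: the last k
    entries, read back to front."""
    return list(reversed(ranked_indices))[:topk]

# Indices as test_tail_rank() would order them (least plausible first):
ranked = [0, 2, 3, 4, 1]
print(infer_tails_fixed(ranked, 2))  # [1, 4] -- the two most plausible tails
```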

This leads me to think that this is a bug, or alternatively that I am missing some factor in how this inference capability is meant to be used.

Sorry for the long post,

Best regards,

Rodrigo Pereira

mscsedu commented 3 years ago

@baxtree @louisccc could one of you validate this?

baxtree commented 3 years ago

Is there any further evidence that can be shared here? For example, the top X predicted tails/heads on UMLS, as well as the true most likely tails/heads that are being treated as least likely.