"Min" Policy for ties in scoring

AndRossi commented 5 years ago

Hi, thank you for your work on this model. I really appreciate it. I am writing because, while studying the code in the tester.py module, I found the get_rank method:

    def get_rank(self, sim_scores):#assuming the test fact is the first one
        return (sim_scores > sim_scores[0]).sum() + 1.0

In this method, you compute the rank of the target entity as one plus the number of entities whose score is strictly higher than that of the target entity itself.

This means that if other entities obtain exactly the same score as the target entity, they are not counted in the ranking. In other words, in case of ties, you always return the minimum rank. This is a "min" policy; is this the expected behavior?

I am asking this because I believe that the "min" policy is not the best one for link prediction models. In theory, a model evaluated with the "min" policy could give the same score to all entities in all predictions: every target would then be ranked first, and the model would obtain MRR = 1.0.
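To make the degenerate case concrete, here is a minimal NumPy sketch. It is not code from the repository: the name `get_rank_min` and the toy scores are made up for illustration, and it assumes `sim_scores` is a 1-D array with the target entity at index 0, as in the snippet above.

```python
import numpy as np

def get_rank_min(sim_scores):
    # "min" policy: only strictly higher scores push the target down.
    # Assumes the target entity is at index 0.
    return (sim_scores > sim_scores[0]).sum() + 1.0

# Hypothetical model that assigns the same score to every entity
constant_scores = np.zeros(1000)

rank = get_rank_min(constant_scores)   # no score is strictly higher
reciprocal_rank = 1.0 / rank           # contribution of this query to MRR
```

With constant scores nothing is strictly higher than the target, so every query gets rank 1 and the mean reciprocal rank is 1.0, even though the model carries no information.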

Of course, its effects depend on how prone your model is to giving the same, identical score to multiple answers: if there are no ties at all, the policy will never be applied. In your experience, does SimplE generate ties?
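One way to check this empirically would be to count, for each test query, how many other entities share the target's exact score. The helper below is only a sketch under the same assumptions as before (1-D NumPy array, target at index 0); `count_ties` is a hypothetical name, not something that exists in tester.py.

```python
import numpy as np

def count_ties(sim_scores):
    # Number of non-target entities whose score is exactly equal
    # to the target's score (target assumed to be at index 0).
    return int((sim_scores[1:] == sim_scores[0]).sum())

# Toy example: two other entities tie with the target at 0.5
example_scores = np.array([0.5, 0.5, 0.7, 0.5, 0.1])
ties = count_ties(example_scores)
```

If this count is zero for (almost) every query, the choice of tie policy should not affect the reported metrics.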

joyfenix commented 4 years ago

If you change the ranking function to

    def get_rank(self, sim_scores):#assuming the test fact is the first one
        return (sim_scores >= sim_scores[0]).sum()

you will find the difference.
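For comparison, here is a small sketch of how the strict (`>`) and non-strict (`>=`) versions behave on the same scores. The function names and the toy array are hypothetical, and the "mean" variant is just one common compromise, not something proposed in this thread's code. As above, the target is assumed to be at index 0 of a 1-D NumPy array.

```python
import numpy as np

def rank_min(sim_scores):
    # Strict comparison: tied entities are ignored (optimistic rank)
    return float((sim_scores > sim_scores[0]).sum() + 1)

def rank_max(sim_scores):
    # Non-strict comparison: tied entities count against the target.
    # The target itself satisfies >=, so no "+ 1" is needed.
    return float((sim_scores >= sim_scores[0]).sum())

def rank_mean(sim_scores):
    # Midpoint between the optimistic and pessimistic ranks
    return (rank_min(sim_scores) + rank_max(sim_scores)) / 2.0

# Target score 0.5, one entity strictly higher, two tied, one lower
scores = np.array([0.5, 0.9, 0.5, 0.5, 0.1])
ranks = (rank_min(scores), rank_max(scores), rank_mean(scores))
```

On this toy input the three policies give ranks 2, 4, and 3 respectively. When no entity ties with the target, all three return the same value.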

baharefatemi commented 4 years ago

I'm sorry to get back to you so late. What @joyfenix mentioned is completely correct. I'll update the code to reflect that.
