beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use: evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

Custom metrics handle document score ties differently from pytrec_eval #105

Open obelhumeur opened 2 years ago

obelhumeur commented 2 years ago

I ran an experiment and noticed some inconsistencies in the calculated metrics. The dataset I used has exactly one relevant document per query, in which case MAP should be identical to MRR. To my surprise, the two differed. Digging into the results, I found that the discrepancy appears whenever some document scores are tied. After looking into the pytrec_eval code and reading *The Impact of Score Ties on Repeatability in Document Ranking*, I learned that pytrec_eval breaks ties by sorting the docids in descending order. I fixed the tie-breaking in my experiment code so the two metric calculations now match.
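To illustrate the tie-breaking behaviour described above, here is a minimal sketch (the run data and function name are hypothetical, not from BEIR or pytrec_eval): sorting by score and then by docid, both descending, reproduces the ordering pytrec_eval assigns to tied documents.

```python
# Hypothetical run: "doc_a" and "doc_b" are tied at score 1.0.
run = {"q1": {"doc_a": 1.0, "doc_b": 1.0, "doc_c": 0.5}}

def ranked_docids(doc_scores):
    """Order docids by score descending, breaking ties by docid
    descending, mirroring pytrec_eval's tie-breaking rule."""
    return sorted(doc_scores, key=lambda d: (doc_scores[d], d), reverse=True)

print(ranked_docids(run["q1"]))  # → ['doc_b', 'doc_a', 'doc_c']
```

A custom metric that instead keeps insertion order (or sorts tied docids ascending) would rank `doc_a` first, which is enough to shift MRR and MAP when the single relevant document is involved in a tie.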

I can create a pull request if needed.

Thanks,

Olivier