beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use: evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

Custom metrics handle document score ties differently from pytrec_eval #105

Open obelhumeur opened 2 years ago

obelhumeur commented 2 years ago

I ran an experiment and noticed some inconsistencies in the calculated metrics. The dataset I used has exactly one relevant document per query, in which case MAP should be identical to MRR. To my surprise, the two differed. Digging into the results, I found that the discrepancy appears whenever some document scores are tied. After looking into the pytrec_eval code and reading *The Impact of Score Ties on Repeatability in Document Ranking*, I learned that pytrec_eval breaks ties by sorting the docids in descending order. I fixed the tie-breaking in my experiment code so the two metric calculations now match.
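To illustrate the tie-breaking behaviour described above, here is a minimal sketch (the run data and function name are hypothetical, not from BEIR or pytrec_eval): sorting by score and then by docid, both descending, reproduces the ordering pytrec_eval assigns to tied documents.

```python
# Hypothetical run: "doc_a" and "doc_b" are tied at score 1.0.
run = {"q1": {"doc_a": 1.0, "doc_b": 1.0, "doc_c": 0.5}}

def ranked_docids(doc_scores):
    """Order docids by score descending, breaking ties by docid
    descending, mirroring pytrec_eval's tie-breaking rule."""
    return sorted(doc_scores, key=lambda d: (doc_scores[d], d), reverse=True)

print(ranked_docids(run["q1"]))  # → ['doc_b', 'doc_a', 'doc_c']
```

A custom metric that instead keeps insertion order (or sorts tied docids ascending) would rank `doc_a` first, which is enough to shift MRR and MAP when the single relevant document is involved in a tie.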

I can create a pull request if needed.

Thanks,

Olivier