AmenRa / ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
https://amenra.github.io/ranx
MIT License

[BUG] `MRR@1` is not equal to `Recall@1` #30

celsofranssa closed 1 year ago

celsofranssa commented 1 year ago

Describe the bug

MRR@1 should be equal to Recall@1. However, the two metrics diverge in the case below.

To Reproduce

%%capture
!pip install ranx

from ranx import Qrels, Run, evaluate
import pickle

# download files from https://drive.google.com/drive/folders/1ZLyB6mKKiQsypw36nhdZ4dGqmFw27K-3?usp=sharing
with open("qrels.pkl", "rb") as f:
    qrels = pickle.load(f)
with open("run.pkl", "rb") as f:
    run = pickle.load(f)

evaluate(
    Qrels(qrels),
    Run(run),
    ['mrr@1', 'mrr@5', 'mrr@10', 'recall@1', 'recall@5', 'recall@10'])

# {'mrr@1': 0.8133879123525163,
#  'mrr@5': 0.820242395055783,
#  'mrr@10': 0.8206332007078454,
#  'recall@1': 0.04814167526511499,
#  'recall@5': 0.05089848464127321,
#  'recall@10': 0.05171913427724859}

Alternatively, use Google Colab.

Expected behavior

mrr@1 = recall@1

Am I missing something?

cadurosar commented 1 year ago

MRR@1 is equal to Recall@1 only if you have exactly one positive per query (for k > 1, the two can differ even then, because MRR depends on the rank at which the positive is found). Unfortunately, "Recall" has several meanings, but trec_eval (which ranx follows) always computes positives_found/total_positives, even when k is smaller than the number of positives you have.

As for the other definitions, I have seen one called R_cap (positives_found/min(k, total_positives)) and another called Success (the indicator "found any positive @ k").
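To make the difference between these definitions concrete, here is a minimal pure-Python sketch for a single query. The function names (`recall_at_k`, `recall_cap_at_k`, `success_at_k`, `rr_at_k`) and the toy document IDs are hypothetical and chosen for illustration; this is not ranx's or trec_eval's implementation, and it assumes binary relevance and a non-empty set of relevant documents.

```python
# Illustrative single-query metrics. All names and data here are hypothetical,
# not part of ranx. Assumes binary relevance and a non-empty `relevant` set.

def hits_at_k(relevant, ranking, k):
    """Number of relevant documents among the top-k results."""
    return sum(1 for doc in ranking[:k] if doc in relevant)

def recall_at_k(relevant, ranking, k):
    """Recall as described for trec_eval: positives_found / total_positives."""
    return hits_at_k(relevant, ranking, k) / len(relevant)

def recall_cap_at_k(relevant, ranking, k):
    """R_cap: positives_found / min(k, total_positives)."""
    return hits_at_k(relevant, ranking, k) / min(k, len(relevant))

def success_at_k(relevant, ranking, k):
    """Success: 1.0 if any relevant document appears in the top k, else 0.0."""
    return 1.0 if hits_at_k(relevant, ranking, k) > 0 else 0.0

def rr_at_k(relevant, ranking, k):
    """Reciprocal rank of the first relevant document within the top k."""
    for rank, doc in enumerate(ranking[:k], start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0

# One query with four relevant documents; the top-ranked result is relevant.
many_relevant = {"d1", "d2", "d3", "d4"}
ranking = ["d1", "x1", "d2", "x2", "x3"]

print("recall@1 :", recall_at_k(many_relevant, ranking, 1))
print("R_cap@1  :", recall_cap_at_k(many_relevant, ranking, 1))
print("success@1:", success_at_k(many_relevant, ranking, 1))
print("rr@1     :", rr_at_k(many_relevant, ranking, 1))

# With exactly one relevant document, recall@1 and rr@1 coincide.
single_relevant = {"d1"}
print("recall@1 (one positive):", recall_at_k(single_relevant, ranking, 1))
print("rr@1     (one positive):", rr_at_k(single_relevant, ranking, 1))
```

With four relevant documents, only the trec_eval-style recall is pulled below 1 at k = 1, since it divides by the total number of positives; R_cap, Success, and the reciprocal rank all reach 1 as soon as the top result is relevant.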

AmenRa commented 1 year ago

Recall@1 is equal to MRR@1 only if you have exactly 1 relevant document per query. Otherwise they differ, because Recall takes into consideration the total number of relevant documents (retrieved + non-retrieved).

You can find the definition of both metrics in ranx's documentation (Recall - MRR) and Wikipedia (Recall - MRR).
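The relationship can also be written out per query. The notation below is assumed for this sketch and is not taken from ranx's documentation: $R_q$ is the set of relevant documents for query $q$, $T_q^k$ is the set of the top-$k$ retrieved documents, and $r_q$ is the rank of the first relevant document.

```latex
\mathrm{Recall@}k(q) = \frac{|R_q \cap T_q^{k}|}{|R_q|},
\qquad
\mathrm{RR@}k(q) =
\begin{cases}
  1 / r_q & \text{if } r_q \le k, \\
  0       & \text{otherwise.}
\end{cases}

% At k = 1 both numerators reduce to the same indicator
% (is the top-ranked document relevant?), so
\mathrm{Recall@}1(q) = \frac{\mathrm{RR@}1(q)}{|R_q|}.
```

Under these definitions the two per-query values agree when $|R_q| = 1$ (or when the top-ranked document is not relevant and both are 0). MRR@1 and Recall@1 are the means of these quantities over all queries, so queries with many relevant documents pull Recall@1 well below MRR@1.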

celsofranssa commented 1 year ago

I see. Thank you for your answers.