AmenRa / ranx

⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
https://amenra.github.io/ranx
MIT License

[Question] How to compute precision for a retriever operating at passage-level #63

Closed ronanki closed 1 month ago

ronanki commented 5 months ago

My retriever operates at the passage/chunk level, so the retrieved results can contain duplicate document IDs, as shown below. Top-10 retrieval list for each query:

q_1: ['d_1', 'd_1', 'd_1', 'd_1', 'd_1', 'd_1', 'd_5', 'd_5', 'd_5', 'd_5']
q_2: ['d_4', 'd_4', 'd_4', 'd_4', 'd_2', 'd_2', 'd_6', 'd_6', 'd_6', 'd_6']

Encoding them into a dictionary results in:

run_dict = { "q_1": { "d_1": 0.9, "d_5": 0.8 }, 
             "q_2": { "d_4": 0.9, "d_2": 0.8, "d_6": 0.7 } }

where qrels could be:

qrels_dict = { "q_1": { "d_1": 5, "d_5": 3 },
               "q_2": { "d_4": 6, "d_6": 1 } }

In this scenario, precision@10 yields a very low score, even though most of the retrieved passages come from relevant documents. How can we fix this?
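To make the low score concrete, here is a minimal pure-Python sketch of the arithmetic. It assumes precision@k counts the relevant IDs in the top k and always divides by k; `precision_at_k` is a hypothetical helper written for illustration, not a ranx function.

```python
def precision_at_k(relevant, ranked_ids, k):
    """Relevant IDs found in the top-k slots, divided by k (assumed definition)."""
    hits = sum(1 for doc_id in ranked_ids[:k] if relevant.get(doc_id, 0) > 0)
    return hits / k


# The deduplicated dictionaries from the question.
run_dict = {
    "q_1": {"d_1": 0.9, "d_5": 0.8},
    "q_2": {"d_4": 0.9, "d_2": 0.8, "d_6": 0.7},
}
qrels_dict = {
    "q_1": {"d_1": 5, "d_5": 3},
    "q_2": {"d_4": 6, "d_6": 1},
}

scores = {}
for q_id, doc_scores in run_dict.items():
    # Rank document IDs by descending score.
    ranked = sorted(doc_scores, key=doc_scores.get, reverse=True)
    scores[q_id] = precision_at_k(qrels_dict[q_id], ranked, k=10)

print(scores)
```

Each query keeps only two or three distinct document IDs after deduplication, so at most two of the ten slots can count as hits.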

AmenRa commented 1 month ago

Hi and sorry for the delay.

I suggest you either use passage-level IDs or pool the passage scores for each document, e.g., by taking the maximum score.
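The two options could be sketched as follows, in pure Python. Several things here are assumptions for illustration only: the `doc_id#index` passage ID format, the reciprocal-rank stand-in scores (the thread gives no passage scores), the rule that every passage inherits its document's relevance label, and the helper names `to_passage_level` and `max_pool`.

```python
from collections import defaultdict

# Top-10 passage rankings from the question, one document ID per retrieved passage.
ranked_docs = {
    "q_1": ["d_1"] * 6 + ["d_5"] * 4,
    "q_2": ["d_4"] * 4 + ["d_2"] * 2 + ["d_6"] * 4,
}
qrels_dict = {
    "q_1": {"d_1": 5, "d_5": 3},
    "q_2": {"d_4": 6, "d_6": 1},
}


def to_passage_level(ranked, doc_qrels):
    """Option 1: give each passage its own ID (hypothetical 'doc#n' format)."""
    run, qrels = {}, {}
    seen = defaultdict(int)
    for rank, doc_id in enumerate(ranked):
        p_id = f"{doc_id}#{seen[doc_id]}"
        seen[doc_id] += 1
        run[p_id] = 1.0 / (rank + 1)  # stand-in score, not from the thread
        if doc_id in doc_qrels:
            # Assumption: a passage inherits its document's relevance label.
            qrels[p_id] = doc_qrels[doc_id]
    return run, qrels


def max_pool(passage_run):
    """Option 2: keep one score per document, the maximum over its passages."""
    doc_run = {}
    for p_id, score in passage_run.items():
        doc_id = p_id.split("#")[0]
        doc_run[doc_id] = max(score, doc_run.get(doc_id, float("-inf")))
    return doc_run


passage_precision = {}
pooled_run = {}
for q_id, ranked in ranked_docs.items():
    p_run, p_qrels = to_passage_level(ranked, qrels_dict[q_id])
    top10 = sorted(p_run, key=p_run.get, reverse=True)[:10]
    passage_precision[q_id] = sum(p in p_qrels for p in top10) / 10
    pooled_run[q_id] = max_pool(p_run)

print(passage_precision)
print(pooled_run)
```

Both results keep the nested `{query: {id: value}}` shape of `run_dict` and `qrels_dict`. With pooling, each query still has only a few distinct documents, so a cutoff smaller than 10 is probably more meaningful for document-level precision.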

Hope it helps.