[Question] How to compute precision for a retriever operating at passage-level

My retriever operates at the passage / chunk level, so the retrieved results can contain duplicate document IDs. For example, here are the top-10 retrieval lists for two queries:

q_1: ['d_1', 'd_1', 'd_1', 'd_1', 'd_1', 'd_1', 'd_5', 'd_5', 'd_5', 'd_5']
q_2: ['d_4', 'd_4', 'd_4', 'd_4', 'd_2', 'd_2', 'd_6', 'd_6', 'd_6', 'd_6']

Encoding them into a dictionary collapses the duplicates, because each document ID can appear only once per query:

run_dict = { "q_1": { "d_1": 0.9, "d_5": 0.8 }, 
             "q_2": { "d_4": 0.9, "d_2": 0.8, "d_6": 0.7 } }

where qrels could be:

qrels_dict = { "q_1": { "d_1": 5, "d_5": 3 },
               "q_2": { "d_4": 6, "d_6": 1 } }

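In this scenario, precision@10 yields a very low score: each query keeps only two or three unique documents after the duplicates are collapsed, while the cutoff is still 10. The score does not reflect the actual retrieval quality, since most of the retrieved passages come from relevant documents. How can we fix this issue?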
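To make the gap concrete, here is a minimal sketch in plain Python (it does not use ranx). It compares precision@10 computed on the de-duplicated document list with a passage-level variant that counts every retrieved passage whose parent document is relevant. The function names `passage_precision_at_k` and `dedup_precision_at_k` are hypothetical helpers written for this illustration, and treating any qrels score above 0 as relevant is an assumption.

```python
# Top-10 passage-level retrieval lists from the example above
# (one parent document ID per retrieved passage).
ranked_lists = {
    "q_1": ["d_1"] * 6 + ["d_5"] * 4,
    "q_2": ["d_4"] * 4 + ["d_2"] * 2 + ["d_6"] * 4,
}

qrels_dict = {
    "q_1": {"d_1": 5, "d_5": 3},
    "q_2": {"d_4": 6, "d_6": 1},
}


def passage_precision_at_k(ranked, qrels, k):
    """Fraction of the top-k passages whose parent document is relevant.

    Assumption: a document is relevant if its qrels score is > 0.
    """
    hits = sum(1 for doc_id in ranked[:k] if qrels.get(doc_id, 0) > 0)
    return hits / k


def dedup_precision_at_k(ranked, qrels, k):
    """Precision@k after collapsing duplicate IDs, as a dict-based run does."""
    unique = list(dict.fromkeys(ranked))  # keeps first-occurrence order
    hits = sum(1 for doc_id in unique[:k] if qrels.get(doc_id, 0) > 0)
    return hits / k


passage_scores = {
    q: passage_precision_at_k(r, qrels_dict[q], 10) for q, r in ranked_lists.items()
}
dedup_scores = {
    q: dedup_precision_at_k(r, qrels_dict[q], 10) for q, r in ranked_lists.items()
}

print(passage_scores)  # {'q_1': 1.0, 'q_2': 0.8}
print(dedup_scores)    # {'q_1': 0.2, 'q_2': 0.2}
```

Whether passage-level counting is the right definition depends on what you want to measure; this is only one possible reading of "precision" for chunked retrieval.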

AmenRa / ranx, issue #63