Closed: luomancs closed this issue 2 years ago
Hi @luomancs, (1) Yes, you should retrieve documents from the entire corpus. (2) When you get predictions from the model, a higher score denotes a higher rank. So my advice would be to use the BM25 scores as they are.
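To illustrate point (2), here is a minimal sketch of how raw BM25 scores can be turned into a run: documents are ordered by descending score and the scores are kept unchanged, with no mapping onto the 0/1/2 qrel labels. The function name `top_k_run` and the input layout (query id to a dict of document id and score) are hypothetical and not part of BEIR.

```python
def top_k_run(bm25_scores, k=10):
    """Keep the k highest-scoring documents per query, raw scores unchanged.

    `bm25_scores` is assumed to look like {query_id: {doc_id: score}}.
    """
    run = {}
    for qid, doc_scores in bm25_scores.items():
        # A higher score means a better rank, so sort in descending order.
        ranked = sorted(doc_scores.items(), key=lambda kv: kv[1], reverse=True)
        run[qid] = {doc_id: float(score) for doc_id, score in ranked[:k]}
    return run


# Toy scores, made up for illustration only.
example = top_k_run({'q1': {'d1': 3.2, 'd2': 7.9, 'd3': 0.4}}, k=2)
```

The evaluator only needs the ordering implied by the scores, so the absolute BM25 values do not have to be rescaled.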
Hope it helps!
Kind Regards, Nandan Thakur
Hi Nandan,
Thanks for the response. As you suggested, I retrieved the relevant documents from the entire corpus and assigned the BM25 score to each retrieved document. I then used pytrec_eval to evaluate the BM25 results, but I got a much lower nDCG@10 score than the one you report in the paper. My script is shown below. Should I do any processing of the "qrel", which is the test.tsv that is provided?
import json

import pytrec_eval

# From qrels/test.tsv, where each q_i has more than 10 documents in test.tsv
qrel = {
    'q1': {'d1': 0, 'd2': 1, 'd3': 2, 'd4': 0},
    'q2': {'d2': 1, 'd3': 1},
}

# Predictions from BM25, where each q_i has 10 relevant documents and the score is assigned by BM25
run = {
    'q1': {'d4': 1.0, 'd10': 0.0, 'd9': 1.5},
    'q2': {'d1': 1.5, 'd6': 0.2},
}

evaluator = pytrec_eval.RelevanceEvaluator(qrel, {'ndcg'})

ndcg = 0
for i, item in evaluator.evaluate(run).items():
    ndcg += item['ndcg']
print(round(ndcg / len(qrel), 3))
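One thing worth checking in a script like the one above: the `ndcg` measure in trec_eval is computed over the whole ranked list rather than cut at rank 10, whereas the paper figure being compared against is nDCG@10 (in pytrec_eval the cut-off variant is, to my knowledge, requested through the `ndcg_cut` measure family). The sketch below is a plain-Python nDCG@k, assuming linear gain and a log2 discount. It is meant as a sanity check rather than a replacement for pytrec_eval, and `ndcg_at_k` is a hypothetical helper.

```python
import math


def ndcg_at_k(qrel, run, k=10):
    """Mean nDCG@k over the queries in `run` (linear gain, log2 discount)."""
    per_query = []
    for qid, doc_scores in run.items():
        judged = qrel.get(qid, {})
        # Rank retrieved documents by descending score and keep the top k.
        ranked = sorted(doc_scores, key=doc_scores.get, reverse=True)[:k]
        dcg = sum(judged.get(doc, 0) / math.log2(rank + 2)
                  for rank, doc in enumerate(ranked))
        # Ideal ranking: the k highest relevance labels for this query.
        ideal = sorted(judged.values(), reverse=True)[:k]
        idcg = sum(rel / math.log2(rank + 2) for rank, rel in enumerate(ideal))
        per_query.append(dcg / idcg if idcg > 0 else 0.0)
    return sum(per_query) / len(per_query) if per_query else 0.0


# Toy data copied from the script in the post.
qrel = {
    'q1': {'d1': 0, 'd2': 1, 'd3': 2, 'd4': 0},
    'q2': {'d2': 1, 'd3': 1},
}
run = {
    'q1': {'d4': 1.0, 'd10': 0.0, 'd9': 1.5},
    'q2': {'d1': 1.5, 'd6': 0.2},
}
print(round(ndcg_at_k(qrel, run, k=10), 3))
```

With the toy run from the post, none of the retrieved documents is judged relevant, so the value is 0.0. Documents that are retrieved but missing from the qrels are treated as non-relevant here.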
Hi @luomancs,
Use this simple example for BM25 scores; it should work out of the box for you: https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/lexical/evaluate_bm25.py
Just replace line 33 in the example with dataset = 'trec-covid'
Hi Nandan, got it! Thank you!
Hi, I have questions regarding the TREC-COVID dataset. (1) I should retrieve documents from the entire corpus (171K), right? (2) The qrel test.tsv has three labels (0, 1, 2). When I get the predictions from the BM25 baseline, how should I assign values to them? All 1? Thank you.
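For reference, the graded labels in test.tsv can be read as integers and passed to the evaluator unchanged. The sketch below assumes a tab-separated file with a header row followed by query id, document id, and label columns; the column names in the sample, the ids, and the helper `load_qrels` are illustrative assumptions, not taken from the thread.

```python
import csv
import io


def load_qrels(tsv_file):
    """Read a qrels TSV into {query_id: {doc_id: int_label}}."""
    reader = csv.reader(tsv_file, delimiter="\t")
    next(reader, None)  # skip the header row (assumed to be present)
    qrels = {}
    for row in reader:
        if len(row) < 3:
            continue  # ignore blank or malformed lines
        query_id, doc_id, label = row[0], row[1], int(row[2])
        qrels.setdefault(query_id, {})[doc_id] = label
    return qrels


# In-memory sample standing in for qrels/test.tsv; the ids are made up.
sample = io.StringIO(
    "query-id\tcorpus-id\tscore\n"
    "1\tdocA\t2\n"
    "1\tdocB\t0\n"
    "2\tdocC\t1\n"
)
qrels = load_qrels(sample)
```

For a real file, the same function could be called with `open(path, newline="", encoding="utf-8")` in place of the `StringIO` sample.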