beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

How to evaluate on Trec-Covid #43

Closed luomancs closed 2 years ago

luomancs commented 2 years ago

Hi, I have two questions about the Trec-Covid dataset. (1) I should retrieve documents from the entire corpus (171K), right? (2) The qrels file test.tsv has three labels: 0, 1, and 2. When I get predictions from the BM25 baseline, how should I assign values to them? Should they all be 1? Thank you.

thakur-nandan commented 2 years ago

Hi @luomancs, (1) Yes, you should retrieve documents from the entire corpus. (2) When you get predictions from the model, a higher score denotes a higher rank, so my advice would be to use the BM25 scores as they are.
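To illustrate the point about scores, here is a minimal pure-Python sketch of nDCG@k. It assumes linear gain and a log2(rank + 1) discount, which is a common formulation but not necessarily identical to what any particular evaluation tool computes (tie-breaking in particular may differ). The function names `dcg` and `ndcg_at_k` and the toy document IDs and scores are made up for illustration. The sketch shows that the run scores only decide the ordering, while the graded labels 0, 1, 2 come from the qrels and act as gains.

```python
import math


def dcg(gains):
    """Discounted cumulative gain: linear gain, log2(rank + 1) discount (assumed form)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg_at_k(qrel_for_query, run_for_query, k=10):
    """nDCG@k for one query; both arguments map doc_id to a number."""
    # The retrieval scores are used only to order the retrieved documents.
    ranked = sorted(run_for_query, key=run_for_query.get, reverse=True)[:k]
    # The graded qrels labels are the gains; unjudged documents count as 0.
    gains = [qrel_for_query.get(doc_id, 0) for doc_id in ranked]
    # The ideal ranking places the highest labels first.
    ideal = sorted(qrel_for_query.values(), reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    return dcg(gains) / ideal_dcg if ideal_dcg > 0 else 0.0


# Hypothetical toy data: labels from the qrels, raw BM25-style scores in the run.
qrel_q1 = {"d1": 0, "d2": 1, "d3": 2, "d4": 0}
bm25_run = {"d3": 12.7, "d2": 9.4, "d4": 3.1}
# Rescaling the scores keeps the order, so the metric does not change.
rescaled_run = {doc_id: score / 100 for doc_id, score in bm25_run.items()}
# A run that ranks the non-relevant document first scores lower.
worse_run = {"d4": 12.7, "d2": 9.4, "d3": 3.1}

print(ndcg_at_k(qrel_q1, bm25_run))
print(ndcg_at_k(qrel_q1, rescaled_run))
print(ndcg_at_k(qrel_q1, worse_run))
```

Under these assumptions there is no need to map BM25 scores onto the 0/1/2 label scale: only their relative order matters.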

Hope it helps!

Kind Regards, Nandan Thakur

luomancs commented 2 years ago

Hi Nandan,

Thanks for the response. As you suggested, I retrieve relevant documents from the entire corpus and assign the BM25 score to each retrieved document. I then used pytrec_eval to evaluate the BM25 results, but I got a much lower nDCG@10 score than the one you report in the paper. My script looks like the following. I wonder whether I should do any processing of the "qrel", which is the given test.tsv.

import pytrec_eval
import json

# From qrels/test.tsv, where each q_i has more than 10 documents in test.tsv
qrel = {
    'q1': {'d1': 0, 'd2': 1, 'd3': 2, 'd4': 0},
    'q2': {'d2': 1, 'd3': 1},
}

# Predictions from BM25, where each q_i has 10 relevant documents and the score is assigned by BM25
run = {
    'q1': {'d4': 1.0, 'd10': 0.0, 'd9': 1.5},
    'q2': {'d1': 1.5, 'd6': 0.2},
}

evaluator = pytrec_eval.RelevanceEvaluator(qrel, {'ndcg'})

# Average the per-query nDCG over the queries in the qrels
ndcg = 0
for i, item in evaluator.evaluate(run).items():
    ndcg += item['ndcg']
print(round(ndcg / len(qrel), 3))
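For reference, a nested qrel dictionary like the one in the script above could be built from qrels/test.tsv along these lines. This is only a sketch: it assumes a tab-separated file with a header row followed by query ID, document ID, and integer label columns, and the helper name `load_qrels` and the sample rows are hypothetical. IDs are kept as strings and labels are converted to integers.

```python
import csv
import io


def load_qrels(tsv_file):
    """Read (query ID, document ID, label) rows into {query_id: {doc_id: label}}."""
    reader = csv.reader(tsv_file, delimiter="\t")
    next(reader)  # skip the header row (assumed to be present)
    qrels = {}
    for row in reader:
        if not row:  # tolerate blank lines
            continue
        query_id, doc_id, label = row[0], row[1], int(row[2])
        qrels.setdefault(query_id, {})[doc_id] = label
    return qrels


# Hypothetical sample standing in for the real file contents.
sample = io.StringIO(
    "query-id\tcorpus-id\tscore\n"
    "1\tdocA\t2\n"
    "1\tdocB\t0\n"
    "2\tdocA\t1\n"
)
qrel = load_qrels(sample)
print(qrel)
```

With a real file, the same helper could be called as `load_qrels(open(path, newline="", encoding="utf-8"))`, where `path` points at the downloaded test.tsv.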

thakur-nandan commented 2 years ago

Hi @luomancs,

Use this easy example for BM25 scores, it should work automatically for you! https://github.com/UKPLab/beir/blob/main/examples/retrieval/evaluation/lexical/evaluate_bm25.py

Just replace line 33 in the example with dataset = 'trec-covid'.

luomancs commented 2 years ago

Hi Nandan, got it! Thank you!