Closed TheTahaaa closed 1 year ago
`qrels` are the relevance assessments. Usually these come from human assessors, but in the case of WikIR they are inferred automatically (the WikIR paper describes how).

`scoreddocs` is a sample ranking of documents from a search engine. It is often used in "re-ranking" subtasks, to ensure that different re-ranking systems start from the same initial set of documents. In the case of WikIR, I believe the authors released their BM25 run for this purpose. The scores in `scoreddocs` are the scores from the retrieval engine (if available).

If you're building a system that performs full retrieval, it's safe to ignore `scoreddocs`; it is only useful when doing controlled re-ranking experiments.

Does this help?
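To make the distinction concrete, here is a minimal sketch with made-up data (the query/doc IDs, scores, and the trivial "re-ranker" are all hypothetical, not from WikIR): `qrels` hold judged (query, document) pairs used only for evaluation, while `scoreddocs` is an initial ranking whose pairs need not match the judged ones — which is exactly why the two files look so different.

```python
# Toy illustration of qrels vs. scoreddocs (all IDs and scores are invented).

# qrels: relevance judgments -> used only to *evaluate* a ranking.
qrels = {
    ("q1", "docA"): 1,
    ("q1", "docC"): 2,
}

# scoreddocs: a sample first-stage (e.g. BM25-style) run -> the candidate
# pool a re-ranker starts from. Note it contains pairs absent from qrels.
scoreddocs = [
    ("q1", "docB", 12.3),
    ("q1", "docA", 11.7),
    ("q1", "docD", 9.2),
]

def rerank(candidates):
    """Stand-in re-ranker: just reverses the initial order for illustration."""
    return [(q, d) for q, d, _ in reversed(candidates)]

ranking = rerank(scoreddocs)

# Evaluate the re-ranked list against the qrels with precision@2:
# the fraction of the top-2 documents that are judged relevant.
top2 = ranking[:2]
p_at_2 = sum(1 for q, d in top2 if qrels.get((q, d), 0) > 0) / 2
print(p_at_2)  # -> 0.5 (only docA in the top 2 is judged relevant)
```

The key point: the re-ranker only reorders what is already in `scoreddocs`, and the `qrels` never feed into the ranking itself, so there is no reason for the two files to list the same (query, document) pairs.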
Certainly! I appreciate your assistance.
No problem!
Dataset(s) WikIR dataset
Describe the proposed change Hey there, I was working with the WikIR dataset and found it a bit confusing. I don't understand why the (query, document) pairs differ between the qrels and scoreddocs files. At first I thought the scoreddocs file contained the relevance score of each sample in qrels, but that doesn't seem to be the case: the (query, document) pairs in the two files are almost completely different! A bit more clarification on these two files would help people like me understand them and avoid confusion.
Thanks a bunch!