allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0
306 stars 40 forks

A question about WikiIR dataset #227

Closed TheTahaaa closed 1 year ago

TheTahaaa commented 1 year ago

Dataset(s) WikiIR dataset

Describe the proposed change Hey there, I was working with the WikIR dataset and found it a bit confusing. I don't understand why the pairs of (query, document) are different in the qrels and scoreddocs files. At first I thought the scoreddocs file contained the relevance score of each sample in qrels, but it seems that's not the case. The pairs of (query, document) are almost completely different in these two files! A bit more clarification on these two files would help people like me understand them and avoid confusion.

Thanks a bunch!

seanmacavaney commented 1 year ago

qrels are the relevance assessments. Usually these come from human assessors, but in the case of WikIR they are inferred as follows:

(screenshot: how WikIR qrels are inferred)

scoreddocs is a sample ranking of documents from a search engine. This is often used in "re-ranking" subtasks, to ensure that different re-ranking systems start with the same initial set of documents. In the case of WikIR, I believe the authors released their BM25 run for this purpose. The scores in scoreddocs are the scores from the retrieval engine (if available).

If you're building a system that performs retrieval, it's safe to ignore scoreddocs; they are only useful when doing controlled re-ranking experiments.
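To make the distinction concrete, here is a minimal sketch with made-up query/document IDs and scores (not the real WikIR data; with ir_datasets the real records come from `dataset.qrels_iter()` and `dataset.scoreddocs_iter()`):

```python
# qrels: (query_id, doc_id) -> relevance label, used for EVALUATION.
qrels = {
    ("q1", "d3"): 2,
    ("q1", "d7"): 1,
    ("q2", "d5"): 1,
}

# scoreddocs: a first-stage run (e.g. BM25), used as the shared candidate
# pool for controlled RE-RANKING. The (query, doc) pairs need not match
# the qrels pairs: a retrieved doc may be unjudged, and a judged doc may
# not have been retrieved.
scoreddocs = [
    ("q1", "d3", 14.2),
    ("q1", "d9", 11.0),  # retrieved but never judged -> absent from qrels
    ("q2", "d1", 9.5),
]

qrel_pairs = set(qrels)
run_pairs = {(q, d) for q, d, _ in scoreddocs}
print(sorted(qrel_pairs & run_pairs))  # -> [('q1', 'd3')]
print(sorted(run_pairs - qrel_pairs))  # -> [('q1', 'd9'), ('q2', 'd1')]
```

So a partial (or even small) overlap between the two files is expected: scoreddocs reflects what a retrieval engine returned, while qrels reflects which pairs have relevance labels.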

Does this help?

TheTahaaa commented 1 year ago


Certainly! I appreciate your assistance.

seanmacavaney commented 1 year ago

No problem!