allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0

msmarco-passage/dev scoreddocs #141

Closed: seanmacavaney closed this issue 2 years ago

seanmacavaney commented 2 years ago

Describe the bug

The scoreddocs of msmarco-passage/dev appear to contain records only for queries found in msmarco-passage/dev/small. Is there a more complete version of the scoreddocs covering the full dev set, or do the scoreddocs apply only to the small version?
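
A rough way to check this is to compare the query IDs that occur in the scoreddocs against the query IDs of the dataset. The sketch below is only illustrative: `scoreddocs_query_coverage` and `check_dataset` are hypothetical helper names, not part of ir_datasets, and `check_dataset` assumes the dataset exposes both `queries_iter()` and `scoreddocs_iter()` (calling it downloads the data).

```python
from typing import Iterable, Set, Tuple


def scoreddocs_query_coverage(
    scoreddoc_query_ids: Iterable[str],
    query_ids: Iterable[str],
) -> Tuple[Set[str], Set[str]]:
    """Split the dataset's queries into those with and without scoreddocs.

    Hypothetical helper: returns (covered, missing) sets of query IDs.
    """
    all_queries = set(query_ids)
    scored = set(scoreddoc_query_ids)
    covered = all_queries & scored
    missing = all_queries - scored
    return covered, missing


def check_dataset(dataset_id: str) -> Tuple[Set[str], Set[str]]:
    """Run the coverage check against an ir_datasets dataset ID.

    Assumes the dataset provides queries and scoreddocs; this triggers
    a download of the underlying files on first use.
    """
    import ir_datasets  # third-party; imported lazily so the helper above runs without it

    dataset = ir_datasets.load(dataset_id)
    return scoreddocs_query_coverage(
        (sd.query_id for sd in dataset.scoreddocs_iter()),
        (q.query_id for q in dataset.queries_iter()),
    )
```

Calling `check_dataset("msmarco-passage/dev")` and `check_dataset("msmarco-passage/dev/small")` and comparing the sizes of the `missing` sets would show whether the scoreddocs cover only the small subset.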

Affected dataset(s)

msmarco-passage/dev

Expected behavior

If the scoreddocs apply only to msmarco-passage/dev/small, they should be exposed only for that subset. If a larger version covering the full dev set is available, it should be provided instead via msmarco-passage/dev (and msmarco-passage/dev/judged).

seanmacavaney commented 2 years ago

Confirmed that the original scoreddocs are for the small set: https://github.com/microsoft/MSMARCO-Passage-Ranking#top1000

The same applies to msmarco-passage/eval: its scoreddocs should be moved to msmarco-passage/eval/small.