seanmacavaney closed this issue 1 year ago
Hi Sean, we are integrating the splade repository with ir_datasets, and one of the tests we want to add is TREC-DL22 (and potentially the queries from NEUCLIR23 and DL23). Is there anything I can do to help add this data (given that the issue already exists)?
Hey @cadurosar! Thanks for bumping this. It should be easy to add, and I'll try to within the next couple of days.
@cadurosar I've finished the updates for TREC DL 2022, including the /judged subsets, and pushed to pypi (version 0.5.5).
I'll try to get to NeuCLIR later today, but I have a busy schedule today so it may not get done until early next week.
An important note: the publicly-released qrels propagate the relevance labels to the duplicates, so that's what I use here. As a result, the scores you get when using them won't align with the numbers in the notebook papers.
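To illustrate why propagated labels shift scores, here is a minimal sketch. The thread does not describe the exact propagation procedure, so the duplicate map, the toy qrels, and the helper names (`propagate_to_duplicates`, `precision_at_k`) are all hypothetical; the only point is that a duplicate which was unjudged before propagation counts as judged afterwards.

```python
def propagate_to_duplicates(qrels, duplicates):
    """Copy each judged document's label to its duplicates.

    qrels:      {query_id: {doc_id: relevance}}
    duplicates: {doc_id: [ids of duplicate documents]}  (hypothetical format)
    Existing judgments are never overwritten.
    """
    expanded = {}
    for query_id, judged in qrels.items():
        out = dict(judged)
        for doc_id, rel in judged.items():
            for dup_id in duplicates.get(doc_id, []):
                out.setdefault(dup_id, rel)
        expanded[query_id] = out
    return expanded


def precision_at_k(ranking, judged, k, min_rel=1):
    """Fraction of the top-k documents whose label is at least min_rel.

    Unjudged documents count as non-relevant.
    """
    hits = sum(1 for doc_id in ranking[:k] if judged.get(doc_id, 0) >= min_rel)
    return hits / k


# Toy data (made up for illustration, not taken from TREC DL 2022).
qrels = {"q1": {"d1": 2, "d2": 0}}
duplicates = {"d1": ["d1_copy"]}
ranking = ["d1_copy", "d2", "d1"]

propagated = propagate_to_duplicates(qrels, duplicates)
before = precision_at_k(ranking, qrels["q1"], k=2)
after = precision_at_k(ranking, propagated["q1"], k=2)
print(before, after)
```

In this toy case the same ranking scores differently under the two sets of qrels, which is the kind of mismatch to expect when comparing against numbers computed without propagation.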
Thanks a lot, Sean! I completely understand, and if you find that you are too busy I can try adding them next week. Just let me know.
Dataset Information:
"The Deep Learning track focuses on IR tasks where a large training set is available, allowing us to compare a variety of retrieval approaches including deep neural networks and strong non-neural approaches, to see what works best in a large-data regime."
Links to Resources:
Dataset ID(s) & supported entities:
msmarco-passage-v2 and msmarco-document-v2
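As a rough sketch of how the resulting subsets might be addressed from Python: the full dataset IDs are not spelled out in this thread, so the `trec-dl-2022` suffix below is an assumption modeled on how earlier Deep Learning track years are named in ir_datasets, and the helper names are hypothetical. `ir_datasets.load` and `queries_iter` are the library's standard entry points. Check the IDs against the ir_datasets catalog before relying on them.

```python
# The two top-level collections named in this issue.
COLLECTIONS = ["msmarco-passage-v2", "msmarco-document-v2"]


def trec_dl_2022_ids(include_judged=True):
    """Build the assumed TREC DL 2022 dataset IDs for both collections.

    ASSUMPTION: the "trec-dl-2022" suffix is not stated in the thread;
    only the existence of "/judged" subsets is.
    """
    ids = []
    for collection in COLLECTIONS:
        base = f"{collection}/trec-dl-2022"
        ids.append(base)
        if include_judged:
            ids.append(f"{base}/judged")
    return ids


def count_queries(dataset_id):
    """Load one dataset and count its queries.

    Needs ir_datasets installed (0.5.5 or later, per the thread) and
    downloads the query file on first use.
    """
    import ir_datasets

    dataset = ir_datasets.load(dataset_id)
    return sum(1 for _ in dataset.queries_iter())


for dataset_id in trec_dl_2022_ids():
    print(dataset_id)
```

Calling `count_queries(dataset_id)` on each ID is one quick way to confirm the subsets resolve; it is left uncalled here because it requires network access.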
Checklist
Mark each task once completed. All should be checked prior to merging a new dataset.
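- Dataset definition (in ir_datasets/datasets/[topid].py)
- Tests (in tests/integration/[topid].py)
- Metadata generated (using the ir_datasets generate_metadata command, should appear in ir_datasets/etc/metadata.json)
- Documentation (in ir_datasets/etc/[topid].yaml)
- Downloadable content (in ir_datasets/etc/downloads.json)
- Download verification action (in .github/workflows/verify_downloads.yml). Only one needed per topid.
- Mirrored status added to downloads.json.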
Additional comments/concerns/ideas/etc.
Little information is available yet.