allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0

TREC Deep Learning 2022 #168

Closed seanmacavaney closed 1 year ago

seanmacavaney commented 2 years ago

Dataset Information:

"The Deep Learning track focuses on IR tasks where a large training set is available, allowing us to compare a variety of retrieval approaches including deep neural networks and strong non-neural approaches, to see what works best in a large-data regime."

Links to Resources:

Dataset ID(s) & supported entities:

Checklist

Mark each task once completed. All should be checked prior to merging a new dataset.

Additional comments/concerns/ideas/etc.

Little information is available yet.

cadurosar commented 1 year ago

Hi Sean, we are integrating the splade repository with ir_datasets, and one of the tests we want to add is TREC-DL22 (and potentially the queries from NeuCLIR23 and DL23). Is there anything I can do to help add this data (since the issue already exists)?

seanmacavaney commented 1 year ago

Hey @cadurosar! Thanks for bumping this. It should be easy to add, and I'll try to within the next couple of days.

seanmacavaney commented 1 year ago

@cadurosar I've finished the updates for TREC DL 2022, including the /judged subsets, and pushed them to PyPI (version 0.5.5).

I'll try to get to NeuCLIR later today, but I have a busy schedule today so it may not get done until early next week.
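For readers who want to try the TREC DL 2022 subsets mentioned above, here is a minimal sketch of how they might be loaded. The dataset IDs are assumptions based on ir_datasets naming conventions (the thread does not list them), and `judged_id` and `count_queries` are hypothetical helpers written for this illustration. Only `ir_datasets.load` and `queries_iter` are library calls.

```python
# NOTE: these dataset IDs are assumptions; the thread does not state them.
ASSUMED_BASE_IDS = (
    "msmarco-passage-v2/trec-dl-2022",
    "msmarco-document-v2/trec-dl-2022",
)


def judged_id(base_id: str) -> str:
    """Return the ID of the '/judged' subset for a base dataset ID."""
    return base_id.rstrip("/") + "/judged"


def count_queries(dataset_id: str) -> int:
    """Count the queries in a dataset.

    Needs the ir_datasets package (0.5.5 or later, per the comment above).
    It is imported lazily so the ID helper works without it.
    """
    import ir_datasets

    dataset = ir_datasets.load(dataset_id)
    return sum(1 for _ in dataset.queries_iter())


# Example call (may trigger a download on first use):
# print(count_queries(judged_id(ASSUMED_BASE_IDS[0])))
```

If the assumed IDs are wrong for your installed version, the dataset listing on https://ir-datasets.com/ is the place to check.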

seanmacavaney commented 1 year ago

An important note: the publicly released qrels propagate the relevance labels to the duplicates, so that's what I use here. As a result, the numbers you get when using them won't align with the numbers in the notebook papers.
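To illustrate what propagating labels to duplicates can mean, here is a small sketch in plain Python. It is not the procedure used to build the released qrels. The function name, the data layout, and the choice to take the maximum label when several members of a group are judged are all assumptions made for this example.

```python
def propagate_to_duplicates(qrels, duplicate_groups):
    """Copy each judged label to every document in the same duplicate group.

    qrels: dict mapping (query_id, doc_id) -> integer relevance label
    duplicate_groups: iterable of collections of doc_ids considered duplicates
    Returns a new dict; the input is left unchanged.
    """
    # Map each doc_id to the full set of its duplicates (itself included).
    group_of = {}
    for group in duplicate_groups:
        members = frozenset(group)
        for doc_id in members:
            group_of[doc_id] = members

    expanded = dict(qrels)
    for (query_id, doc_id), label in qrels.items():
        for dup_id in group_of.get(doc_id, ()):
            key = (query_id, dup_id)
            # Assumption: if duplicates disagree, keep the highest label.
            expanded[key] = max(label, expanded.get(key, label))
    return expanded


qrels = {("q1", "d1"): 2, ("q1", "d3"): 0}
expanded = propagate_to_duplicates(qrels, [{"d1", "d2"}])
```

Here `d2` inherits the label of `d1` for query `q1`, so the expanded qrels contain more judged documents than the original judgments. Metrics computed on the two sets can therefore differ.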

cadurosar commented 1 year ago

Thanks a lot, Sean! I completely understand, and if you find that you are too busy, I can try adding them next week. Just let me know.