Georgetown-IR-Lab / cedr

Code for CEDR: Contextualized Embeddings for Document Ranking, accepted at SIGIR 2019.
MIT License

How to get WebTrack 2012-2014 datasets. #35

Open haiahaiah opened 3 years ago

haiahaiah commented 3 years ago

Hi, I'm confused about how to get WebTrack 2012-2014 datasets. I would appreciate it if you could provide me with the specific process. Thanks a lot.

seanmacavaney commented 3 years ago

Information on obtaining the two ClueWeb collections is found here:

They are purchased from CMU and shipped on hard drives. As far as I understand, they cannot be distributed by any other means.

The WebTrack queries and qrels are from TREC, and can be found here: https://trec.nist.gov/data/webmain.html
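
Once the qrels files are downloaded, reading them could look roughly like the sketch below. It assumes the usual four-column TREC qrels layout (query ID, iteration, document ID, relevance label) separated by whitespace. The function name `parse_qrels` and the sample lines, including the document IDs, are made up for illustration and are not taken from the CEDR code or from the real files.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse TREC-style qrels lines into {query_id: {doc_id: relevance}}.

    Assumes four whitespace-separated columns per line:
    query_id, iteration, doc_id, relevance.
    """
    qrels = defaultdict(dict)
    for line in lines:
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        query_id, _iteration, doc_id, relevance = parts
        qrels[query_id][doc_id] = int(relevance)
    return dict(qrels)


# Illustrative sample only; these document IDs are hypothetical.
sample = [
    "151 0 clueweb09-en0000-00-00000 1",
    "151 0 clueweb09-en0000-00-00001 0",
    "152 0 clueweb09-en0000-00-00002 -2",
]
qrels = parse_qrels(sample)
print(qrels["151"])
```

For a real file, the same function should work on an open file handle, e.g. `parse_qrels(open(path))`, where `path` is wherever you saved the download.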

Does this help?