There are some common train/dev/test splits, often made temporally by week. Could be aol/[train|dev|test]. Probably worth having some name for the different splits used, so like aol/[split-id]/[train|dev|test]
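For concreteness, a minimal sketch of one such temporal split, assuming the log's `YYYY-MM-DD HH:MM:SS` QueryTime format. The cutoff dates below are illustrative placeholders, not an established split:

```python
from datetime import datetime

# Illustrative cutoffs only -- not an established split.
DEV_START = datetime(2006, 5, 1)
TEST_START = datetime(2006, 5, 15)

def assign_split(query_time: str) -> str:
    """Map a QueryTime string to 'train', 'dev', or 'test' by date."""
    t = datetime.strptime(query_time, "%Y-%m-%d %H:%M:%S")
    if t >= TEST_START:
        return "test"
    if t >= DEV_START:
        return "dev"
    return "train"
```

Whatever cutoffs a split-id ends up naming, the assignment itself stays this simple, which is a point in favor of registering a few named splits rather than one canonical one.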
Supported Entities
- [ ] docs -- dataset only has URLs (and mostly top-level domains)
- [x] queries
- [x] qrels (clicks)
- [ ] scoreddocs
- [ ] docpairs
Additional comments/concerns/ideas/etc.
How to deal with documents? Some folks only use the document titles (?!?) and filter out ones that do not match in the top BM25 results. What seems to be common is to fetch all clicked documents and use those as a corpus, but that clearly biases the corpus toward clicked results. Another dataset (ClueWeb? C4?) could be used as a source of the documents, though I have not seen anybody do it this way before.
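One reading of the title-filtering idea, sketched with a toy BM25 over titles (a stand-in implementation, not any particular paper's setup): keep a query only if its clicked document's title ranks in the top k.

```python
import math
from collections import Counter

def bm25_rank(query, titles, k1=1.2, b=0.75):
    """Return title indices ranked by plain BM25 (Robertson idf) for a query."""
    docs = [t.lower().split() for t in titles]
    avgdl = sum(len(d) for d in docs) / len(docs)
    n = len(docs)
    df = Counter()
    for d in docs:
        df.update(set(d))  # document frequency of each term
    scores = []
    for i, d in enumerate(docs):
        tf = Counter(d)
        s = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = k1 * (1 - b + b * len(d) / avgdl)
            s += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append((s, i))
    return [i for s, i in sorted(scores, key=lambda x: (-x[0], x[1]))]

def keep_query(query, clicked_idx, titles, k=10):
    """Filter rule: keep the query only if its clicked title is in the top k."""
    return clicked_idx in bm25_rank(query, titles)[:k]
```

The filtering step is where the "(?!?)" concern bites: queries whose clicked pages have uninformative titles get silently dropped, which is itself a selection bias.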
Of course, this dataset could always just consist of queries and qrels, and leave it as an exercise for the user to decide how to construct the documents.
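That queries-plus-qrels-only view can be built straight from the raw log. This sketch assumes the tab-separated AnonID/Query/QueryTime/ItemRank/ClickURL layout with the click fields absent when nothing was clicked (a guess at the raw format, not a definitive parser), and uses clicked URLs as the doc-ids:

```python
from collections import defaultdict

def queries_and_click_qrels(log_path):
    """Collect unique queries and click-based qrels from an AOL-style log."""
    queries = set()
    qrels = defaultdict(set)   # query -> clicked URLs (serving as doc-ids)
    with open(log_path, encoding="utf-8") as f:
        next(f, None)          # skip the header line
        for line in f:
            row = line.rstrip("\n").split("\t")
            if len(row) < 3:
                continue       # skip malformed lines
            queries.add(row[1])
            if len(row) >= 5 and row[4]:
                qrels[row[1]].add(row[4])
    return queries, dict(qrels)
```

Everything else (the corpus question above) is then deliberately left to the user.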
Dataset Information:
A lightning rod that may not be worth touching: the log was pulled within days of its 2006 release over privacy concerns, after users were re-identified from their queries.
OTOH, it's still sometimes used by researchers.
Links to Resources:
Dataset ID(s):
aol
aol/[train|dev|test]