TREC 2023 Tip-of-the-Tongue

mam10eks commented 1 year ago

Dataset Information:

The training and dev data of the TREC 2023 Tip-of-the-Tongue track are now available: https://trec-tot.github.io/guidelines

Description from the website:

Tip of the tongue: The phenomenon of failing to retrieve something from memory, combined with partial recall and the feeling that retrieval is imminent

In terms of input and output, the movie identification task is relatively straightforward—given an input TOT request, output a ranked list of movies. Each movie must be identified by its Wikipedia page id and the correct movie should be ranked as high as possible. For each query, runs should return a ranked list of 1000 Wikipedia page ids. Runs will be evaluated using IR metrics that are appropriate for IR tasks with one relevant document, such as discounted cumulative gain, reciprocal rank, and success@k.
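As a rough illustration of the single-relevant-document metrics named above, the following minimal sketch scores one ranked list against one correct page id. It is not the track's official evaluation code: the function names and the page ids are hypothetical, and DCG is assumed to use a binary gain with a log2 rank discount.

```python
import math
from typing import Sequence


def reciprocal_rank(ranking: Sequence[str], relevant_id: str) -> float:
    """Return 1/rank of the correct item, or 0.0 if it is not in the ranking."""
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0


def success_at_k(ranking: Sequence[str], relevant_id: str, k: int) -> float:
    """Return 1.0 if the correct item appears in the top k results, else 0.0."""
    return 1.0 if relevant_id in ranking[:k] else 0.0


def dcg(ranking: Sequence[str], relevant_id: str) -> float:
    """Discounted cumulative gain, assuming binary gain and a log2 discount.

    With exactly one relevant document the ideal DCG is 1.0, so under this
    assumption the value is already normalized.
    """
    for rank, doc_id in enumerate(ranking, start=1):
        if doc_id == relevant_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0


# Hypothetical Wikipedia page ids, for illustration only.
ranking = ["1001", "2002", "3003", "4004"]
relevant = "3003"

print(reciprocal_rank(ranking, relevant))
print(success_at_k(ranking, relevant, k=1))
print(dcg(ranking, relevant))
```

In this toy ranking the correct page sits at rank 3, so it counts as a miss for success@1 but as a hit for any cutoff of 3 or more.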

Dataset ID(s) & supported entities:

tip-of-the-tongue/train
tip-of-the-tongue/dev
tip-of-the-tongue/test (not yet released)

Checklist

Mark each task once completed. All should be checked prior to merging a new dataset.

- [x] Dataset definition (in ir_datasets/datasets/[topid].py)
- [x] Tests (in tests/integration/[topid].py)
- [x] Metadata generated (using ir_datasets generate_metadata command, should appear in ir_datasets/etc/metadata.json)
- [x] Documentation (in ir_datasets/etc/[topid].yaml)
  - [ ] Documentation generated in https://github.com/seanmacavaney/ir-datasets.com/
- [x] Downloadable content (in ir_datasets/etc/downloads.json)
  - [ ] Download verification action (in .github/workflows/verify_downloads.yml). Only one needed per topid.
  - [x] Any small public files from NIST (or other potentially troublesome files) mirrored in https://github.com/seanmacavaney/irds-mirror/. Mirrored status properly reflected in downloads.json.

Additional comments/concerns/ideas/etc.

mam10eks commented 1 year ago

I would like to implement this ticket.

mam10eks commented 1 year ago

cc @samarthbhargav

mam10eks commented 1 year ago

Dear all, I have now had time to implement this in the following branch: https://github.com/mam10eks/ir_datasets/tree/trec-tip-of-the-tongue

Almost everything is done, but I have forgotten how to complete these two steps:

"Documentation generated in https://github.com/seanmacavaney/ir-datasets.com/", and
"Download verification action (in .github/workflows/verify_downloads.yml). Only one needed per topid"

Otherwise, everything seems to be ready.

@seanmacavaney, is there any documentation on how to do these two steps?

allenai / ir_datasets, issue #235