allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0

INEX #114

Open seanmacavaney opened 3 years ago

seanmacavaney commented 3 years ago

Various datasets from INEX shared tasks: https://inex.mmci.uni-saarland.de/data/documentcollection.html

mam10eks commented 6 months ago

Hi @seanmacavaney,

I was talking with @daria87 and we would like to integrate the INEX 2004 and 2005 datasets on academic search into ir_datasets.

What would be the best way to start? Are you aware of a similar (i.e., XML-oriented) collection already integrated into ir_datasets that we could use as inspiration (e.g., for which XML parser to use)? Also, the INEX 2004/2005 datasets have hierarchical relevance judgments, i.e., at the article, section, and paragraph level, and sometimes even at the bibliography or formula level.

Do you have an idea how one could realize this well? E.g., I could imagine realizing it as different "sub corpora", e.g., article retrieval vs. section retrieval vs. paragraph/formula retrieval, but I am unsure if this is a good way to go.
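
To make the "sub corpora" idea more concrete, here is a minimal sketch using only Python's standard `xml.etree.ElementTree`. It is not based on any existing ir_datasets code: the `ElementDoc` type, the `iter_elements` and `split_by_level` helpers, the `doc_id` format, and the tag names (`article`, `sec`, `p`) are all assumptions for illustration, and the real INEX markup may well use different tags.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, List, NamedTuple


class ElementDoc(NamedTuple):
    """Hypothetical record for one retrievable unit (not an ir_datasets type)."""
    doc_id: str  # "<article id>#<element path>", an assumed ID scheme
    level: str   # "article", "section", or "paragraph"
    text: str


# Assumed mapping from XML tags to retrieval levels; the INEX DTD may differ.
LEVEL_BY_TAG = {"article": "article", "sec": "section", "p": "paragraph"}


def iter_elements(xml_text: str, article_id: str) -> Iterator[ElementDoc]:
    """Yield one ElementDoc per element whose tag maps to a retrieval level."""
    root = ET.fromstring(xml_text)

    def walk(elem: ET.Element, path: str) -> Iterator[ElementDoc]:
        level = LEVEL_BY_TAG.get(elem.tag)
        if level is not None:
            # Collect all nested text and normalize whitespace.
            text = " ".join(" ".join(elem.itertext()).split())
            yield ElementDoc(f"{article_id}#{path}", level, text)
        # Number siblings per tag so each path identifies exactly one element.
        counts: Dict[str, int] = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            yield from walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    yield from walk(root, f"/{root.tag}[1]")


def split_by_level(docs: Iterator[ElementDoc]) -> Dict[str, List[ElementDoc]]:
    """Group units by level, giving one 'sub corpus' per granularity."""
    by_level: Dict[str, List[ElementDoc]] = {}
    for doc in docs:
        by_level.setdefault(doc.level, []).append(doc)
    return by_level


# Made-up sample article, only to exercise the helpers above.
SAMPLE = (
    "<article><sec><p>First paragraph.</p><p>Second one.</p></sec>"
    "<sec><p>Third.</p></sec></article>"
)
corpora = split_by_level(iter_elements(SAMPLE, "a1"))
for level, docs in corpora.items():
    print(level, [d.doc_id for d in docs])
```

With path-based IDs like these, an element-level judgment could point at the same `doc_id` in whichever sub corpus matches its granularity. Whether separate sub corpora are preferable to a single corpus with a level field remains an open design question.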