seanmacavaney opened 3 years ago
Hi @seanmacavaney,
I was talking with @daria87 and we would like to integrate the INEX 2004 and 2005 datasets on academic search into ir_datasets.
What would be the best way to start? Are you aware of a similar (i.e., XML-oriented) collection already integrated into ir_datasets from which we could draw some inspiration (e.g., which XML parser to use)? Also, the INEX 2004/2005 datasets have hierarchical relevance judgments: at the article, section, and paragraph level, and sometimes even at the level of bibliographic entries or formulas.
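One way to make element-level judgments addressable is to give every XML element its own ID built from the article ID plus an XPath-style path (e.g. `/article[1]/bdy[1]/sec[2]/p[1]`, with 1-based positions among same-tag siblings). Below is a minimal sketch using only the standard library's `xml.etree.ElementTree`. It assumes the qrels reference elements by such paths. The names `ElementDoc` and `iter_elements`, the article ID `x0001`, and the tag names in the sample (loosely modeled on the INEX IEEE markup) are all hypothetical and would need to be checked against the actual collection.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Iterator, NamedTuple


class ElementDoc(NamedTuple):
    doc_id: str  # hypothetical ID scheme: "<article id>:<element path>"
    tag: str
    text: str


def iter_elements(article_id: str, xml_text: str) -> Iterator[ElementDoc]:
    """Yield every element of one article (the article itself included),
    each with an XPath-style path that uses 1-based positions per tag."""
    root = ET.fromstring(xml_text)

    def walk(elem: ET.Element, path: str) -> Iterator[ElementDoc]:
        # Concatenate all text inside this element, dropping whitespace-only pieces.
        text = " ".join(t.strip() for t in elem.itertext() if t.strip())
        yield ElementDoc(f"{article_id}:{path}", elem.tag, text)
        counts: Dict[str, int] = {}
        for child in elem:
            counts[child.tag] = counts.get(child.tag, 0) + 1
            yield from walk(child, f"{path}/{child.tag}[{counts[child.tag]}]")

    yield from walk(root, f"/{root.tag}[1]")


# Hypothetical sample article; real INEX files use a larger tag set.
sample = """<article><fm><atl>Query Processing</atl></fm>
<bdy><sec><st>Intro</st><p>First paragraph.</p><p>Second paragraph.</p></sec></bdy></article>"""

for d in iter_elements("x0001", sample):
    print(d.doc_id, d.tag, repr(d.text))
```

Because elements nest, the article's text also contains the text of every section and paragraph inside it. A flat corpus of all elements would therefore contain heavily overlapping documents.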
Do you have an idea of how best to represent this? For example, I could imagine exposing it as different "sub-corpora" (article retrieval vs. section retrieval vs. paragraph/formula retrieval), but I am unsure whether this is a good way to go.
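The sub-corpora idea could be sketched by splitting element-level judgments according to the tag at the end of each element path, giving one qrels set per granularity. This is only an illustration under stated assumptions:

- Element IDs have the hypothetical form `<article id>:<xpath>`.
- The tag-to-granularity table `GRANULARITY` is made up and would need to be derived from the collection's DTD.
- Relevance is collapsed to a single integer for simplicity, although the actual INEX assessment scales may differ.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class ElementQrel(NamedTuple):
    query_id: str
    doc_id: str     # hypothetical: "<article id>:<element path>"
    relevance: int  # simplified; INEX judgments may not be a single grade


# Hypothetical tag -> granularity table; tags not listed fall into "other".
GRANULARITY = {
    "article": "article",
    "sec": "section",
    "ss1": "section",
    "p": "paragraph",
    "bib": "bibliography",
    "formula": "formula",
}

# Captures the tag of the last path step, e.g. "p" in ".../sec[2]/p[3]".
LAST_STEP = re.compile(r"/([^/\[]+)\[\d+\]$")


def granularity_of(doc_id: str) -> str:
    _, _, path = doc_id.partition(":")
    m = LAST_STEP.search(path)
    if m is None:
        raise ValueError(f"not an element path: {doc_id!r}")
    return GRANULARITY.get(m.group(1), "other")


def split_qrels(qrels: Iterable[ElementQrel]) -> Dict[str, List[ElementQrel]]:
    """Group element-level judgments into one list per granularity."""
    by_level: Dict[str, List[ElementQrel]] = defaultdict(list)
    for q in qrels:
        by_level[granularity_of(q.doc_id)].append(q)
    return dict(by_level)


qrels = [
    ElementQrel("q1", "x0001:/article[1]", 2),
    ElementQrel("q1", "x0001:/article[1]/bdy[1]/sec[2]", 1),
    ElementQrel("q1", "x0001:/article[1]/bdy[1]/sec[2]/p[3]", 1),
]
split = split_qrels(qrels)
```

The design choice here is that each granularity gets its own qrels (and, correspondingly, its own document set). Within one sub-corpus, documents do not nest inside each other, which keeps standard evaluation tools applicable. The alternative is a single mixed corpus, where overlapping elements would need special handling.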
Various datasets from INEX shared tasks: https://inex.mmci.uni-saarland.de/data/documentcollection.html