allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0

apply SourceDocIter elsewhere #102

Open seanmacavaney opened 3 years ago

seanmacavaney commented 3 years ago

In #101 (C4 + TREC Health Misinformation 2021), I abstracted many of the annoying bits of writing an iterator over document sources into base classes. This should make adding new large datasets considerably easier, with less boilerplate. I should go back and see which existing document collections could be simplified by using these base classes.
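To illustrate the general idea, here is a rough sketch of what a base class for iterating over document sources might look like. It is not the actual `SourceDocIter` code from #101: every name in it (`MultiSourceDocs`, `Source`, `Doc`, `read_source`, `iter_range`, `InMemoryDocs`) is hypothetical, and it assumes each source's document count is known up front. The base class owns the bookkeeping (global offsets, skipping sources that fall outside a requested range), so a dataset only has to say how to read a single source.

```python
from typing import Iterator, NamedTuple, Sequence


class Doc(NamedTuple):
    doc_id: str
    text: str


class Source(NamedTuple):
    name: str
    doc_count: int  # assumed known in advance, so whole sources can be skipped


class MultiSourceDocs:
    """Hypothetical base class: documents spread across many source files."""

    def __init__(self, sources: Sequence[Source]):
        self._sources = list(sources)

    def read_source(self, source: Source) -> Iterator[Doc]:
        """The only method a dataset implements: yield one source's docs in order."""
        raise NotImplementedError

    def __len__(self) -> int:
        return sum(s.doc_count for s in self._sources)

    def __iter__(self) -> Iterator[Doc]:
        return self.iter_range(0, len(self))

    def iter_range(self, start: int, stop: int) -> Iterator[Doc]:
        """Yield docs whose global index is in [start, stop)."""
        offset = 0
        for source in self._sources:
            src_start, src_stop = offset, offset + source.doc_count
            offset = src_stop
            if src_stop <= start:
                continue  # entirely before the range: never opened
            if src_start >= stop:
                break  # entirely after the range: stop early
            for i, doc in enumerate(self.read_source(source), start=src_start):
                if i >= stop:
                    break
                if i >= start:
                    yield doc


class InMemoryDocs(MultiSourceDocs):
    """Toy dataset: the 'files' are lists held in a dict (illustration only)."""

    def __init__(self, data):
        self._data = data
        self.opened = []  # records which sources were actually read
        super().__init__([Source(name, len(docs)) for name, docs in data.items()])

    def read_source(self, source):
        self.opened.append(source.name)
        for doc_id, text in self._data[source.name]:
            yield Doc(doc_id, text)


docs = InMemoryDocs({
    "part-0": [("d0", "a"), ("d1", "b")],
    "part-1": [("d2", "c"), ("d3", "d")],
    "part-2": [("d4", "e")],
})
window = [d.doc_id for d in docs.iter_range(2, 4)]
```

In this toy run, only `part-1` is opened to serve the range, which is the kind of saving that matters for very large collections. A real implementation would also need to handle compressed files, lazy downloads, and sources whose counts are not known in advance.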

I believe the datasets that could benefit from this would be: