allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0

improved HTML/XML parser, TREC 7 and 8 #173

Closed seanmacavaney closed 2 years ago

seanmacavaney commented 2 years ago

Deprecates trec-robust04 in favour of the new version under disks45/nocr, which uses a better parser and is properly organised under its corpus.

Also fixes #160
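
For anyone migrating, here is a minimal sketch of how a deprecated dataset ID could be redirected to its replacement with a warning. It is not the code in this PR: `DEPRECATED_IDS` and `resolve_dataset_id` are hypothetical names, and the full replacement ID `disks45/nocr/trec-robust04` is an assumption built from the `disks45/nocr` prefix mentioned above.

```python
import warnings

# Hypothetical mapping from a deprecated dataset ID to its replacement.
# The replacement ID is an assumption based on the "disks45/nocr" prefix
# named in the PR description; check the dataset catalogue for the exact ID.
DEPRECATED_IDS = {
    "trec-robust04": "disks45/nocr/trec-robust04",
}


def resolve_dataset_id(dataset_id: str) -> str:
    """Return the preferred ID, emitting a warning if the given one is deprecated."""
    replacement = DEPRECATED_IDS.get(dataset_id)
    if replacement is None:
        # Not deprecated: pass the ID through unchanged.
        return dataset_id
    warnings.warn(
        f"{dataset_id!r} is deprecated; use {replacement!r} instead",
        DeprecationWarning,
        stacklevel=2,  # point the warning at the caller, not this helper
    )
    return replacement
```

Existing scripts could pass the resolved ID on to `ir_datasets.load(...)`, so callers still using the old name keep working while being told about the new one.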