allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0
314 stars 42 forks

.z compression support for robust04 #139

Closed seanmacavaney closed 2 years ago

seanmacavaney commented 2 years ago

In the original version of robust04, the files were encoded using the UNIX compress command (giving .z files). More recent distributions of the dataset use gzip instead, but we can easily support both formats.
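As a rough illustration of how both formats could be handled (not necessarily how this PR does it), the two can be told apart by their leading magic bytes: `1f 8b` for gzip and `1f 9d` for UNIX compress. In the sketch below, `detect_format`, `decompress_any`, and the `lzw_decoder` parameter are hypothetical names; `lzw_decoder` stands in for whatever .z decompression library is used, which this thread does not name.

```python
import gzip
from typing import Callable, Optional

GZIP_MAGIC = b"\x1f\x8b"      # gzip header (RFC 1952)
COMPRESS_MAGIC = b"\x1f\x9d"  # UNIX compress (LZW) header


def detect_format(data: bytes) -> str:
    """Classify a buffer by its first two bytes."""
    if data[:2] == GZIP_MAGIC:
        return "gzip"
    if data[:2] == COMPRESS_MAGIC:
        return "compress"
    return "unknown"


def decompress_any(
    data: bytes,
    lzw_decoder: Optional[Callable[[bytes], bytes]] = None,
) -> bytes:
    """Decompress gzip or compress data.

    lzw_decoder is a caller-supplied function (an assumption of this
    sketch) that decodes UNIX compress data; the standard library has
    no decoder for that format.
    """
    fmt = detect_format(data)
    if fmt == "gzip":
        return gzip.decompress(data)
    if fmt == "compress":
        if lzw_decoder is None:
            raise ValueError("compress data found but no LZW decoder given")
        return lzw_decoder(data)
    raise ValueError("unrecognized compression format")
```

Checking the content rather than the file extension means a distribution that mixes old and new files, or renames them, would still be read correctly.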

fixes #55

seanmacavaney commented 2 years ago

The .z decompression library doesn't support Python 3.6. Given that Python 3.6 reaches end-of-life in just a few weeks and this will only affect a single dataset under a rather specific circumstance, I think I'm fine pushing this anyway.