allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0
306 stars 40 forks

detect google colab and use gsutil (for NQ) #261

Closed. cmacdonald closed this 2 months ago

cmacdonald commented 2 months ago

This makes using the NQ corpus faster on Google Colab. Most of the time is taken up by building the docstore, but downloading still takes significant time.

The difference in download speed is 5 MB/sec without gsutil vs 21 MB/sec with it.
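
For readers who have not seen the diff, the idea can be sketched roughly as follows. This is an illustrative sketch only, not the code from this PR: the helper names `running_on_colab`, `gsutil_command`, and `fetch` are hypothetical, the `downloader` callback stands in for whatever HTTP download path is normally used, and the bucket path must be supplied by the caller. It assumes Colab can be recognised by the presence of the `google.colab` package and that `gsutil` is on the `PATH`.

```python
import importlib.util
import shutil
import subprocess
import sys


def running_on_colab() -> bool:
    """Best-effort check: Colab runtimes ship the `google.colab` package."""
    if "google.colab" in sys.modules:
        return True
    try:
        return importlib.util.find_spec("google.colab") is not None
    except (ImportError, ValueError):
        # The parent `google` package is missing or has no usable spec.
        return False


def gsutil_command(gs_url: str, dest: str) -> list:
    """Build the `gsutil cp` argument list for copying one object to dest."""
    if not gs_url.startswith("gs://"):
        raise ValueError("expected a gs:// URL, got %r" % gs_url)
    return ["gsutil", "cp", gs_url, dest]


def fetch(gs_url: str, http_url: str, dest: str, downloader) -> str:
    """Use gsutil on Colab when it is available, else the fallback downloader.

    Returns the name of the method used: "gsutil" or "http".
    """
    if running_on_colab() and shutil.which("gsutil"):
        subprocess.run(gsutil_command(gs_url, dest), check=True)
        return "gsutil"
    downloader(http_url, dest)
    return "http"
```

Outside Colab, or when `gsutil` is not installed, `fetch` falls through to the regular downloader, so behaviour elsewhere is unchanged.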