allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0

documentation for integrations #100

Open seanmacavaney opened 3 years ago

seanmacavaney commented 3 years ago

Describe the proposed change

There's a growing number of integrations, most recently Datamaestro (see #99)! We should document them, give each one a little promotion, and provide instructions and/or code samples showing how to use ir_datasets in each tool. This would be similar to what's in the paper, but more detailed and accessible. Probably in both the README and a dedicated documentation page? (Or a dedicated documentation page for each integration?)

So far, these are the ones I'm aware of:

seanmacavaney commented 3 years ago

Would also be nice to provide a guide for using it on Google Colab -- especially in settings where you want the data to persist.

Should be as simple as doing:

from google.colab import drive
drive.mount('/content/drive')
import os
os.environ['IR_DATASETS_HOME'] = "/content/drive/MyDrive/ir_datasets"

This needs to run before importing ir_datasets. There may be other subtleties, though, such as needing to flush the changes to Drive?
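For a guide, the ordering requirement could be wrapped in a small guard so that setting the variable too late fails loudly instead of silently. This is only a sketch: `set_ir_datasets_home` is a hypothetical helper name, not part of ir_datasets, and it assumes (as noted above) that `IR_DATASETS_HOME` has to be set before the package is imported.

```python
import os
import sys


def set_ir_datasets_home(path):
    """Point IR_DATASETS_HOME at `path` (hypothetical helper, not an ir_datasets API).

    Raises if ir_datasets has already been imported in this process, since the
    variable is assumed to take effect only when set before the import.
    """
    if "ir_datasets" in sys.modules:
        raise RuntimeError(
            "ir_datasets is already imported; restart the runtime and "
            "set IR_DATASETS_HOME first"
        )
    # Create the target directory if it does not exist yet.
    os.makedirs(path, exist_ok=True)
    os.environ["IR_DATASETS_HOME"] = path
    return path
```

On Colab this would be called right after `drive.mount('/content/drive')`, for example `set_ir_datasets_home("/content/drive/MyDrive/ir_datasets")`, followed by `import ir_datasets`. Whether an explicit flush (for instance `drive.flush_and_unmount()` at the end of the session) is needed for the files to reach Drive is still the open question here.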

seanmacavaney commented 3 years ago

Maybe we could also provide a drive that contains the public datasets that people could mount themselves, to avoid downloading?