allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0

provide multiple use examples #73

Closed: cmacdonald closed this issue 3 years ago

cmacdonald commented 3 years ago

Describe the proposed change

image

Could this block have tabs, so that examples for other platforms could be shown too?

seanmacavaney commented 3 years ago

Great idea! I added PyTerrier examples for:

There are several caveats for these auto-generated examples:

  1. Some doc fields do not make much sense to index. I filtered down to only text fields, and I removed doc_id and a few other fields I spotted that could be particularly problematic. But more attention may be needed on these in the future.
  2. I do not include examples for non-English docs or non-English queries.
  3. Some datasets, like the ClueWebs and GOV2, currently need a wrapper to extract the HTML content. This will be improved in #72. It might be worth revisiting (1) at that time as well.
  4. I did not include examples for scoreddocs and docpairs for PyTerrier, as these are not directly supported at this time.
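
To make caveat (1) concrete, here is a minimal sketch of what "filter down to only text fields" could look like. It is not the code used to generate the examples. `ExampleDoc`, `indexable_fields`, and the `exclude` list are hypothetical names introduced for illustration, and the sketch assumes only that documents are namedtuple-style records with type annotations.

```python
from typing import NamedTuple, get_type_hints


class ExampleDoc(NamedTuple):
    """Hypothetical document type, standing in for a dataset's doc class."""
    doc_id: str
    title: str
    body: str
    url: str
    rank: int


def indexable_fields(doc_cls, exclude=("doc_id", "url")):
    """Return the names of str-typed fields, minus those listed in `exclude`.

    The default `exclude` tuple is an assumption for this sketch; which
    fields are problematic to index depends on the dataset.
    """
    hints = get_type_hints(doc_cls)
    return [
        name
        for name in doc_cls._fields
        if hints.get(name) is str and name not in exclude
    ]


# Pick the fields worth indexing, then build the text for one document.
fields = indexable_fields(ExampleDoc)
doc = ExampleDoc("d1", "A title", "Some body text", "http://example.com", 3)
text = " ".join(getattr(doc, f) for f in fields)
```

Non-string fields such as `rank` drop out through the type check, while string fields that are poor indexing candidates (identifiers, URLs) have to be listed by hand. That manual step is why the exclusion list may need more attention over time.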

The interface switches all example tabs together and remembers your selection across pages. I needed to do a bit of work to keep the page from jumping around too much when the user changes the example tabs.

I also took the opportunity to add examples for the CLI.