allenai / ir_datasets

Provides a common interface to many IR ranking datasets.
https://ir-datasets.com/
Apache License 2.0
318 stars 42 forks

generate bibtex file #35

Closed (seanmacavaney closed 3 years ago)

seanmacavaney commented 3 years ago

We can leverage the bibtex we already have in the documentation to build a master bibtex file. People could import it into their papers and then reference datasets by their ir_datasets IDs, e.g., \cite{cord19/trec-covid}.

We will need to standardize the naming of the records (right now the keys normally use the author, year, and a few words from the title). Can we fill these in automatically by leaving a placeholder?

In cases where there are multiple entries for a given dataset, we can use something like \cite{cord19/trec-covid/1,cord19/trec-covid/2}.
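A rough sketch of what the re-keying could look like, just to make the idea concrete. Everything here is an assumption for illustration: `DOCS_BIBTEX` is a hypothetical stand-in for however the documentation bibtex gets loaded, the entries are placeholders rather than real references, and `rekey` / `build_master_bibtex` are made-up helper names, not part of ir_datasets.

```python
import re

# Hypothetical input: dataset ID -> BibTeX entries as found in the docs.
# The entries below are placeholders, not real references.
DOCS_BIBTEX = {
    "cord19/trec-covid": [
        "@article{Placeholder2020First,\n  title={First placeholder entry},\n  year={2020}\n}",
        "@article{Placeholder2020Second,\n  title={Second placeholder entry},\n  year={2020}\n}",
    ],
    "example/dataset": [
        "@inproceedings{Placeholder2019Only,\n  title={Only placeholder entry},\n  year={2019}\n}",
    ],
}

# Matches the "@type{key," prefix of an entry so the key can be swapped out.
KEY_PATTERN = re.compile(r"^(@\w+\{)[^,]*,", re.MULTILINE)


def rekey(entry, new_key):
    """Replace the citation key of a single BibTeX entry with new_key."""
    return KEY_PATTERN.sub(lambda m: m.group(1) + new_key + ",", entry, count=1)


def build_master_bibtex(docs_bibtex):
    """Concatenate all entries, keyed by dataset ID (with /1, /2, ... if several)."""
    out = []
    for dataset_id, entries in sorted(docs_bibtex.items()):
        for i, entry in enumerate(entries, start=1):
            key = dataset_id if len(entries) == 1 else f"{dataset_id}/{i}"
            out.append(rekey(entry, key))
    return "\n\n".join(out) + "\n"


if __name__ == "__main__":
    print(build_master_bibtex(DOCS_BIBTEX))
```

Since the key is overwritten regardless of what was there, the docs could hold any placeholder key and the generator would fill in the real one.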

seanmacavaney commented 3 years ago

This plan is a little problematic when multiple datasets have the same citation, as is the case for msmarco-document and msmarco-passage. Maybe it's just up to the user to resolve duplicates on their own? We could put a warning comment at the top.
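If we went the warning route, the shared citations could be detected automatically. A minimal sketch, assuming entries count as "the same" when they match after dropping the key and normalizing whitespace and case. The `EXAMPLE` data is placeholder content, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

# Placeholder data: two datasets carrying the same entry under different keys.
EXAMPLE = {
    "msmarco-document": [
        "@inproceedings{KeyA,\n  title={Shared placeholder entry},\n  year={2016}\n}",
    ],
    "msmarco-passage": [
        "@inproceedings{KeyB,\n  title={Shared placeholder entry},\n  year={2016}\n}",
    ],
    "example/dataset": [
        "@article{KeyC,\n  title={Unrelated placeholder entry},\n  year={2019}\n}",
    ],
}

KEY_PATTERN = re.compile(r"^(@\w+\{)[^,]*,", re.MULTILINE)


def entry_fingerprint(entry):
    """Normalize an entry so that only its content (not its key) is compared."""
    without_key = KEY_PATTERN.sub(lambda m: m.group(1) + ",", entry, count=1)
    return re.sub(r"\s+", " ", without_key).strip().lower()


def find_shared_citations(docs_bibtex):
    """Return sorted groups of dataset IDs that share an identical entry."""
    by_fingerprint = defaultdict(set)
    for dataset_id, entries in docs_bibtex.items():
        for entry in entries:
            by_fingerprint[entry_fingerprint(entry)].add(dataset_id)
    return sorted(sorted(ids) for ids in by_fingerprint.values() if len(ids) > 1)


def warning_header(shared_groups):
    """Build a BibTeX comment block to place at the top of the master file."""
    lines = [
        "% WARNING: the datasets listed together below share a citation.",
        "% Citing more than one of them may produce duplicate references.",
    ]
    lines += ["%   " + ", ".join(group) for group in shared_groups]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(warning_header(find_shared_citations(EXAMPLE)))
```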

seanmacavaney commented 3 years ago

I think a better way to handle citations is to:

  1. Have a master list of citations (e.g., as a bibtex file). Provide a link to download the full bibtex source. (There's a way to link to an external bibtex file in Overleaf now, I think? The ACL format was doing something like this.)
  2. Reference the IDs of the citations in each dataset's YAML documentation file, not the bibtex source itself.
  3. Generate citation examples that include both \cite{x,y} and bibtex versions.
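The three steps above might fit together roughly like this. This is only a sketch under assumptions: `MASTER` stands in for the parsed master bibtex file, `DATASET_CITATIONS` stands in for the citation IDs read from each dataset's YAML file, and all keys and entries are placeholders, not real references.

```python
# Step 1 (assumed shape): master list of citations, keyed by citation ID.
MASTER = {
    "placeholder2020a": "@article{placeholder2020a,\n  title={Placeholder title A},\n  year={2020}\n}",
    "placeholder2020b": "@article{placeholder2020b,\n  title={Placeholder title B},\n  year={2020}\n}",
    "placeholder2016": "@inproceedings{placeholder2016,\n  title={Placeholder title C},\n  year={2016}\n}",
}

# Step 2 (assumed shape): each dataset's documentation lists citation IDs only.
DATASET_CITATIONS = {
    "cord19/trec-covid": ["placeholder2020a", "placeholder2020b"],
    "msmarco-document": ["placeholder2016"],
    "msmarco-passage": ["placeholder2016"],
}


def citation_example(dataset_id):
    """Step 3: return (cite_command, bibtex_source) for one dataset."""
    citation_ids = DATASET_CITATIONS[dataset_id]
    missing = [c for c in citation_ids if c not in MASTER]
    if missing:
        raise KeyError(f"{dataset_id} references unknown citation IDs: {missing}")
    cite_command = "\\cite{" + ",".join(citation_ids) + "}"
    bibtex_source = "\n\n".join(MASTER[c] for c in citation_ids)
    return cite_command, bibtex_source


if __name__ == "__main__":
    cite, bibtex = citation_example("cord19/trec-covid")
    print(cite)
    print(bibtex)
```

Because datasets only point at citation IDs, msmarco-document and msmarco-passage end up with the same \cite key, so the duplicate problem from the earlier comment goes away without any user-side cleanup.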