beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai

New Dataset inclusion to BEIR benchmarks #112

Open naveenjafer opened 1 year ago

naveenjafer commented 1 year ago

Hi, I would like to request the inclusion of the RELiC dataset in the BEIR benchmark for retrieval. I work with the UMass Amherst lab that authored RELiC, and I am requesting the inclusion of a subset of the dataset on their behalf.

Task Overview

We collect a large-scale dataset (RELiC) of 78K literary quotations and surrounding critical analysis and use it to formulate the novel task of literary evidence retrieval, in which models are given an excerpt of literary analysis surrounding a masked quotation and asked to retrieve the quoted passage from the set of all passages in the work.

Specifications

We make a subset of the test set available to check the zero-shot retrieval capability of retrievers. The test set consists of 5 books formatted in accordance with BEIR. This subset differs from the complete RELiC dataset in the following ways:

  1. The RELiC dataset contains contiguous masked quotations of varying lengths, ranging from 1 to 5 sentences. We include only single-sentence masked quotations in this subset.
  2. By construction, each example has both a preceding context and a succeeding context around the masked quotation. However, the norm with retrievers and benchmarks is a single input query. To accommodate this, we drop the succeeding context and pose RELiC as a retrieval problem using only the preceding context, capped at 4 sentences. This can be referred to as the 4/0 setting (a minimal sketch of query construction follows this list).
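To make the 4/0 setting concrete, here is a minimal sketch of how a query could be assembled from the analysis preceding a masked quotation. The helper name and the example sentences are hypothetical; the actual preprocessing was done during dataset construction and is not part of BEIR.

```python
def build_query_4_0(preceding_sentences, max_sentences=4):
    """Hypothetical helper: form a 4/0-setting query by keeping at most
    the last 4 sentences of preceding context and dropping the
    succeeding context entirely."""
    return " ".join(preceding_sentences[-max_sentences:])

# Example with made-up analysis sentences:
context = [
    "The narrator's irony is sharpest in the opening chapters.",
    "Austen signals this through free indirect discourse.",
    "The following passage is often cited as the clearest example.",
]
query = build_query_4_0(context)
```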

Dataset

The dataset can be found here. At the top level, there are 5 folders, each corresponding to a book from the test set.
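Since each book folder follows BEIR's standard layout (corpus.jsonl, queries.jsonl, and a qrels folder), it should load with beir's stock tooling. Below is a minimal zero-shot evaluation sketch; the folder path "relic/book_1" is a placeholder for one of the 5 book folders, and any BEIR-compatible retriever can stand in for the dense model shown.

```python
from beir.datasets.data_loader import GenericDataLoader
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Load one book's corpus, queries, and relevance judgments.
corpus, queries, qrels = GenericDataLoader(data_folder="relic/book_1").load(split="test")

# Zero-shot dense retrieval with an off-the-shelf SentenceBERT model.
model = DRES(models.SentenceBERT("msmarco-distilbert-base-v3"), batch_size=128)
retriever = EvaluateRetrieval(model, score_function="cos_sim")

results = retriever.retrieve(corpus, queries)  # query_id -> {doc_id: score}
ndcg, _map, recall, precision = retriever.evaluate(qrels, results, retriever.k_values)
```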

Leaderboard

A public leaderboard is available for this task, covering both the zero-shot and trained settings.