Inconsistency in Arguana

First let me thank you for the huge work putting this benchmark together.

While downloading and processing the dataset, I came across something odd in the Arguana dataset.

The id test-free-speech-debate-yfsdfkhbwu-con03b is considered a relevant passage in the qrels/test.tsv file. But this id is not present in the corpus.jsonl file.

The pytrec_eval tool used for evaluation checks whether each query id is present and logs a message if it is not, but there is no equivalent check for passage ids. So I think this qrel line is treated as valid but can never be satisfied during evaluation, since the passage id is not in the corpus. Is this expected behavior, or should the line be filtered out of the original BEIR dataset?
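
For reference, here is a minimal sketch of how such dangling ids could be detected. It assumes the usual BEIR layout: corpus.jsonl has one JSON object per line with an `_id` field, and qrels/test.tsv is tab-separated with a `query-id`, `corpus-id`, `score` header row. The function names `load_corpus_ids` and `find_dangling_qrels` are my own and are not part of beir or pytrec_eval.

```python
import csv
import json


def load_corpus_ids(corpus_path):
    """Collect every passage id found in a BEIR-style corpus.jsonl file."""
    ids = set()
    with open(corpus_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                # Assumption: each record stores its id under the "_id" key.
                ids.add(json.loads(line)["_id"])
    return ids


def find_dangling_qrels(qrels_path, corpus_ids):
    """Return (query_id, corpus_id) pairs whose corpus_id is missing from the corpus."""
    dangling = []
    with open(qrels_path, encoding="utf-8", newline="") as f:
        # Assumption: the TSV has a header row naming "query-id" and "corpus-id".
        reader = csv.DictReader(f, delimiter="\t")
        for row in reader:
            if row["corpus-id"] not in corpus_ids:
                dangling.append((row["query-id"], row["corpus-id"]))
    return dangling
```

Calling `find_dangling_qrels("qrels/test.tsv", load_corpus_ids("corpus.jsonl"))` from the dataset directory should list every qrel line that points at a passage missing from the corpus. The same list could then be used to drop those lines before evaluation, if filtering turns out to be the right fix.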

Thanks, Remi
