beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

add option ignore_identical_ids in evaluation, important for Pyserini (+CE) on ArguAna #91

Closed kwang2049 closed 2 years ago

kwang2049 commented 2 years ago

Setting this new option makes the evaluation ignore any retrieved document whose ID is identical to the query ID. In my evaluation on ArguAna, this improves Pyserini / Pyserini + CE / DocT5Query (Pyserini) from 31.5/31.1/34.9 to 41.4/41.7/46.9 nDCG@10 (in %). I found no effect on Quora. For dense models with exact search there should also be no effect, since identical IDs have already been removed in the exact search step.
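To illustrate the idea, here is a minimal standalone sketch of the filtering step. It is not the code from this PR: the helper name `drop_identical_ids` and the toy IDs and scores are hypothetical, and it assumes retrieval results are stored as a dict mapping each query ID to a `{doc_id: score}` dict.

```python
from typing import Dict

# Assumed layout: query ID -> {document ID -> retrieval score}
Results = Dict[str, Dict[str, float]]


def drop_identical_ids(results: Results) -> Results:
    """Return a copy of `results` without documents whose ID equals the query ID.

    Hypothetical helper for illustration only; the input is left unmodified.
    """
    return {
        qid: {
            doc_id: score
            for doc_id, score in doc_scores.items()
            if doc_id != qid  # skip the document that is the query itself
        }
        for qid, doc_scores in results.items()
    }


# Toy data (made up): the first query retrieves itself as the top hit.
results = {
    "arg-1": {"arg-1": 12.3, "arg-7": 9.8, "arg-4": 7.1},
    "arg-2": {"arg-9": 8.4, "arg-3": 6.0},
}

filtered = drop_identical_ids(results)
print(filtered["arg-1"])
```

The filtered dict would then be passed to the metric computation in place of the raw results. Returning a copy rather than mutating in place keeps the raw run available for comparison, which is a design choice of this sketch only.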