Open mengyao00 opened 7 months ago
Hi @mengyao00, thanks for asking the question.
We require this line for two datasets, ArguAna and Quora, where corpus_ids and query_ids overlap, i.e., the query is also present within the corpus.
The line avoids the edge case of self-retrieval, where the query retrieves itself at the top-1 position, which reduces the nDCG@10 score for ArguAna and Quora.
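To make the edge case concrete, here is a minimal sketch of the filtering idea. It is not the repository's actual code: the function names `drop_self_retrieval` and `top_k`, the flag name, and the toy ids and scores are all assumptions made for illustration. It assumes results are stored as {query_id: {corpus_id: score}} and that the self-match is not a labeled positive, so it would only take up a rank.

```python
# Illustrative sketch only; names and data here are hypothetical.

def drop_self_retrieval(results, ignore_identical_ids=True):
    """Remove hits whose corpus_id equals the query_id.

    results: {query_id: {corpus_id: score}}
    Returns a new dict of the same shape; the input is not modified.
    """
    if not ignore_identical_ids:
        return results
    return {
        query_id: {
            corpus_id: score
            for corpus_id, score in hits.items()
            if corpus_id != query_id  # the check discussed in this thread
        }
        for query_id, hits in results.items()
    }


def top_k(hits, k):
    """Return the k corpus_ids with the highest scores, best first."""
    ranked = sorted(hits.items(), key=lambda item: item[1], reverse=True)
    return [corpus_id for corpus_id, _ in ranked[:k]]


# Toy example: the query "q1" also exists in the corpus under the same id,
# so it matches itself with the highest score.
results = {"q1": {"q1": 1.0, "d7": 0.8, "d3": 0.5}}

unfiltered = top_k(results["q1"], k=2)
filtered = top_k(drop_self_retrieval(results)["q1"], k=2)
```

In this toy case the unfiltered top-2 starts with q1 itself, pushing d3 out of the cutoff, while the filtered top-2 contains only d7 and d3. For datasets where query and corpus ids never collide, the check removes nothing.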
Hope it helps!
Regards, Nandan Thakur
Why do we need this line to check corpus_id != query_id?
For a query with id id_q, the corpus document with the same id id_q is not necessarily a positive for it. So why do we need to avoid corpus_id == query_id?