Why is the corpus_id != query_id check needed in DenseRetrievalExactSearch.search()?

beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.

Apache License 2.0

Why do we need the line that checks corpus_id != query_id?

For a query with ID id_q, the corpus document with the same ID id_q is not necessarily a positive (relevant) document for that query. So why do we need to skip results where corpus_id == query_id?

            for query_itr in range(len(query_embeddings)):
                query_id = query_ids[query_itr]                  
                for sub_corpus_id, score in zip(cos_scores_top_k_idx[query_itr], cos_scores_top_k_values[query_itr]):
                    corpus_id = corpus_ids[corpus_start_idx+sub_corpus_id]
                    if corpus_id != query_id:
                        if len(result_heaps[query_id]) < top_k:
                            # Push item on the heap
                            heapq.heappush(result_heaps[query_id], (score, corpus_id))
                        else:
                            # If item is larger than the smallest in the heap, push it on the heap then pop the smallest element
                            heapq.heappushpop(result_heaps[query_id], (score, corpus_id))

        for qid in result_heaps:
            for score, corpus_id in result_heaps[qid]:
                self.results[qid][corpus_id] = score

        return self.results

Why is the corpus_id != query_id check needed in DenseRetrievalExactSearch.search()? #169