beir-cellar / beir

A Heterogeneous Benchmark for Information Retrieval. Easy to use, evaluate your models across 15+ diverse IR datasets.
http://beir.ai
Apache License 2.0

`retriever.retrieve(corpus, queries)` with `len(queries)==1` errors #170

Open manestay opened 3 months ago

manestay commented 3 months ago

I have a somewhat silly use case: I'm running `retriever.retrieve` with a queries dict that has only one entry. This raises an `IndexError` in PyTorch, because the search code indexes row 1 of the 2D score tensor, which doesn't exist when the first dimension (the number of queries) has size 1:

File "<dir>/script.py", line 83, in <module>
    results = retriever.retrieve(para_d, one_query_d)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<env>/beir/retrieval/evaluation.py", line 20, in retrieve
    return self.retriever.search(corpus, queries, self.top_k, self.score_function, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<env>/beir/retrieval/search/dense/exact_search.py", line 73, in search
    scores_dim = cos_scores[1]
                 ~~~~~~~~~~^^^
IndexError: index 1 is out of bounds for dimension 0 with size 1
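For reference, a minimal sketch that reproduces this for me (the toy corpus, query text, and SentenceBERT checkpoint name below are placeholders, not my actual setup):

```python
from beir.retrieval import models
from beir.retrieval.evaluation import EvaluateRetrieval
from beir.retrieval.search.dense import DenseRetrievalExactSearch as DRES

# Tiny corpus in BEIR's expected format: {doc_id: {"title": ..., "text": ...}}
corpus = {
    "d1": {"title": "Paris", "text": "Paris is the capital of France."},
    "d2": {"title": "Berlin", "text": "Berlin is the capital of Germany."},
}

# A queries dict with exactly one entry is what triggers the IndexError above:
# the cosine-score tensor then has shape (1, num_docs) and has no row 1.
one_query_d = {"q1": "What is the capital of France?"}

model = DRES(models.SentenceBERT("msmarco-distilbert-base-tas-b"), batch_size=16)
retriever = EvaluateRetrieval(model, score_function="cos_sim", k_values=[1, 2])

results = retriever.retrieve(corpus, one_query_d)  # raises the IndexError shown above
```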

I was able to fix this by changing the line below: https://github.com/beir-cellar/beir/blob/f062f038c4bfd19a8ca942a9910b1e0d218759d4/beir/retrieval/search/dense/exact_search.py#L73

swapping out `cos_scores[1]` there for a guarded `scores_dim`:

```python
scores_dim = cos_scores[1] if cos_scores.shape[0] != 1 else cos_scores[0]
```
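To illustrate the indexing behaviour this guard works around, here is a standalone sketch (plain PyTorch, independent of the BEIR code):

```python
import torch

# Cosine-score matrix for a single query against 4 documents: shape (1, 4).
cos_scores = torch.rand(1, 4)

# cos_scores[1] asks for the *second row*, which does not exist when there is
# only one query, hence "index 1 is out of bounds for dimension 0 with size 1".
# The guarded expression falls back to row 0 in that case:
scores_dim = cos_scores[1] if cos_scores.shape[0] != 1 else cos_scores[0]

# Either row has length equal to the number of documents, which is what the
# downstream top-k call needs; len(scores_dim) == 4 here.
print(len(scores_dim))
```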

This monkey patch works for me, but I'm curious whether there's a more appropriate way to retrieve with only one query. Thanks!