embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

NQRetrieval evaluation with e5 throws an error #1115

Closed HSILA closed 1 month ago

HSILA commented 1 month ago

Cloning the repo and installing it in editable mode with pip:

git clone https://github.com/embeddings-benchmark/mteb.git
cd mteb
python -m venv venv
source ./venv/bin/activate
pip install -e .

And running the following script:

import mteb

model_name = "intfloat/multilingual-e5-small"

tasks = mteb.get_tasks(tasks=["NQ"])

model = mteb.get_model(model_name)
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="my_result")

Will throw the following error:

ERROR:mteb.evaluation.MTEB:Error while evaluating NQ: SentenceTransformer.encode() got an unexpected keyword argument 'request_qid'
Traceback (most recent call last):
  File "/home/user/mteb/test.py", line 9, in <module>
    evaluation.run(model, output_folder="my_result")
  File "/home/user/mteb/mteb/evaluation/MTEB.py", line 422, in run
    raise e
  File "/home/user/mteb/mteb/evaluation/MTEB.py", line 383, in run
    results, tick, tock = self._run_eval(
                          ^^^^^^^^^^^^^^^
  File "/home/user/mteb/mteb/evaluation/MTEB.py", line 260, in _run_eval
    results = task.evaluate(
              ^^^^^^^^^^^^^^
  File "/home/user/mteb/mteb/abstasks/AbsTaskRetrieval.py", line 286, in evaluate
    scores[hf_subset] = self._evaluate_subset(
                        ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mteb/mteb/abstasks/AbsTaskRetrieval.py", line 295, in _evaluate_subset
    results = retriever(corpus, queries)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mteb/mteb/evaluation/evaluators/RetrievalEvaluator.py", line 493, in __call__
    return self.retriever.search(
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mteb/mteb/evaluation/evaluators/RetrievalEvaluator.py", line 155, in search
    sub_corpus_embeddings = self.model.encode_corpus(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/mteb/mteb/models/e5_models.py", line 162, in encode_corpus
    emb = self.mdl.encode(sentences, batch_size=batch_size, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
TypeError: SentenceTransformer.encode() got an unexpected keyword argument 'request_qid'

OS: Debian 12
Python: 3.11.2
Version: 1.12.88
Commit: 74661c4548f2113b98afdf1c042d9ac62ab4ed71

KennethEnevoldsen commented 1 month ago

@HSILA thanks for making us aware of this. There is currently a PR modifying the e5 model (#1085).

Using that branch I could reproduce the error, which occurs during corpus encoding because request_qid is passed on to the encode_corpus function. I added a fix for the problem, and it was merged along with the PR.

Edit: I will close this issue. Let me know if the problem remains.
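For readers who hit the same TypeError, the failure mode described above can be reproduced in isolation. The following is a minimal sketch, not the actual mteb code and not necessarily the fix that was merged: `FakeEncoder`, `Wrapper`, and both `encode_corpus_*` methods are hypothetical stand-ins. It assumes only what the traceback shows, namely that a wrapper forwards `**kwargs` to an `encode()` method that does not accept `request_qid`.

```python
# Hypothetical stand-ins for illustration only; this is not the mteb or
# sentence-transformers implementation.
class FakeEncoder:
    """Mimics an encoder whose encode() accepts only a fixed set of keyword arguments."""

    def encode(self, sentences, batch_size=32):
        # Return one dummy "embedding" (the text length) per sentence.
        return [[float(len(s))] for s in sentences]


class Wrapper:
    """Mimics a model wrapper that forwards **kwargs to the underlying encoder."""

    def __init__(self, mdl):
        self.mdl = mdl

    def encode_corpus_unfiltered(self, corpus, batch_size=32, **kwargs):
        # Same pattern as the failing line in the traceback: every extra
        # keyword argument is forwarded as-is.
        return self.mdl.encode(corpus, batch_size=batch_size, **kwargs)

    def encode_corpus_filtered(self, corpus, batch_size=32, **kwargs):
        # One possible workaround (an assumption, not the confirmed fix):
        # drop the evaluator-only argument before forwarding.
        kwargs.pop("request_qid", None)
        return self.mdl.encode(corpus, batch_size=batch_size, **kwargs)


wrapper = Wrapper(FakeEncoder())
corpus = ["first document", "second"]

# Forwarding request_qid unchanged raises the same kind of TypeError.
try:
    wrapper.encode_corpus_unfiltered(corpus, request_qid="q1")
    unfiltered_error = None
except TypeError as err:
    unfiltered_error = str(err)

# Dropping request_qid first lets the call go through.
filtered_embeddings = wrapper.encode_corpus_filtered(corpus, request_qid="q1")
```

Removing the argument inside the wrapper keeps the evaluator's call site unchanged, at the cost of silently ignoring `request_qid` for models that cannot use it.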