embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

There are multiple 'mteb/arguana' configurations in the cache: queries, default, corpus Please specify which configuration to reload from the cache, e.g. #1133

Closed: BtlWolf closed this issue 3 weeks ago

BtlWolf commented 1 month ago

I encountered an error while running the retrieval task. [Screenshot 20240802213611] This is the example code; where is the problem? [image]

KennethEnevoldsen commented 1 month ago

@BtlWolf When making issues with open-source code, it is important not to disrespect the time that we put into this (which we do for free). I say this not to be mean, but to help you with future issues.

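A few key things to consider:

  • Don't use screenshots when you create an issue unless required. It makes the work harder for people trying to solve your problem.
  • Describe what you have attempted to do to examine the problem. Phrases like "Where is the problem?" seem to suggest that we provide a service that we do not.

You might also check up on what you have done to make sure that the problem lies with the package where you created the issue. For instance, it seems like the error is raised by the datasets package here.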

BtlWolf commented 1 month ago

> @BtlWolf When making issues with open-source code, it is important not to disrespect the time that we put into this (which we do for free). I say this not to be mean, but to help you with future issues.
>
> A few key things to consider:
>
>   • Don't use screenshots when you create an issue unless required. It makes the work harder for people trying to solve your problem.
>   • Describe what you have attempted to do to examine the problem. Phrases like "Where is the problem?" seem to suggest that we provide a service that we do not.
>
> You might also check up on what you have done to make sure that the problem lies with the package where you created the issue. For instance, it seems like the error is raised by the datasets package here.

I'm sorry, I was a bit impatient because of this issue. I will check it first. Thank you for your suggestions!

BtlWolf commented 1 month ago

In detail, my code is:

from mteb import MTEB
from sentence_transformers import SentenceTransformer

model_name = "average_word_embeddings_komninos"
model = SentenceTransformer(model_name)

evaluation = MTEB(tasks=["ArguAna"])
results = evaluation.run(model, output_folder=f"/data1/jiyifan/OpenMatch/results/test")

Error:
  File "/data1/jiyifan/anaconda3/envs/om1/lib/python3.10/site-packages/mteb/evaluation/MTEB.py", line 422, in run
    raise e
  File "/data1/jiyifan/anaconda3/envs/om1/lib/python3.10/site-packages/mteb/evaluation/MTEB.py", line 352, in run
    task.load_data(eval_splits=task_eval_splits, **kwargs)
  File "/data1/jiyifan/anaconda3/envs/om1/lib/python3.10/site-packages/mteb/abstasks/AbsTaskRetrieval.py", line 231, in load_data
    corpus, queries, qrels = HFDataLoader(
  File "/data1/jiyifan/anaconda3/envs/om1/lib/python3.10/site-packages/mteb/abstasks/AbsTaskRetrieval.py", line 96, in load
    self._load_qrels(split)
  File "/data1/jiyifan/anaconda3/envs/om1/lib/python3.10/site-packages/mteb/abstasks/AbsTaskRetrieval.py", line 175, in _load_qrels
    qrels_ds = load_dataset(
  File "/data1/jiyifan/anaconda3/envs/om1/lib/python3.10/site-packages/datasets/load.py", line 2594, in load_dataset
    builder_instance = load_dataset_builder(
  File "/data1/jiyifan/anaconda3/envs/om1/lib/python3.10/site-packages/datasets/load.py", line 2303, in load_dataset_builder
    builder_instance: DatasetBuilder = builder_cls(
  File "/data1/jiyifan/anaconda3/envs/om1/lib/python3.10/site-packages/datasets/packaged_modules/cache/cache.py", line 140, in __init__
    config_name, version, hash = _find_hash_in_cache(
  File "/data1/jiyifan/anaconda3/envs/om1/lib/python3.10/site-packages/datasets/packaged_modules/cache/cache.py", line 85, in _find_hash_in_cache
    raise ValueError(
ValueError: There are multiple 'mteb/arguana' configurations in the cache: queries, default, corpus
Please specify which configuration to reload from the cache, e.g.
    load_dataset('mteb/arguana', 'queries')

It seems that the error comes from the dataset, but I did not change any settings; I only ran the example code.
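The last frames of the traceback are in datasets' packaged_modules/cache module, which suggests datasets was reloading mteb/arguana from the local cache instead of the Hub. (That this typically happens when the Hub is unreachable or offline mode is enabled is an assumption here; the log does not say why.) The sketch below is a hypothetical, simplified model of that lookup, not the library's code: `resolve_cached_config` is an illustrative name, and its rules are inferred from the error message. It also encodes a guess about the traceback: the failing frame is `_load_qrels`, which presumably calls `load_dataset` without a configuration name, while the corpus and queries loads that run before it name their configuration.

```python
# Hypothetical, simplified model of how a cache-only reload might resolve a
# configuration name. This is NOT the datasets implementation; the rules are
# inferred from the ValueError text in the traceback.


def resolve_cached_config(dataset_name, cached_configs, config_name=None):
    """Pick the configuration to reload from the local cache.

    cached_configs lists the config names cached for one dataset snapshot,
    e.g. ["queries", "default", "corpus"] as in the error message.
    """
    if config_name is not None:
        # An explicit name resolves as long as that config was cached.
        if config_name not in cached_configs:
            raise ValueError(
                f"Couldn't find cache for {dataset_name} for config '{config_name}'"
            )
        return config_name
    if not cached_configs:
        raise ValueError(f"Couldn't find cache for {dataset_name}")
    if len(cached_configs) > 1:
        # No name given and several candidates: refuse to guess.
        raise ValueError(
            f"There are multiple '{dataset_name}' configurations in the cache: "
            f"{', '.join(cached_configs)}\n"
            "Please specify which configuration to reload from the cache, e.g.\n"
            f"\tload_dataset('{dataset_name}', '{cached_configs[0]}')"
        )
    return cached_configs[0]


if __name__ == "__main__":
    cached = ["queries", "default", "corpus"]
    # A request that names its config (like the corpus and queries loads) resolves.
    print(resolve_cached_config("mteb/arguana", cached, "corpus"))
    # A request without a name (as the qrels load appears to be) is ambiguous.
    try:
        resolve_cached_config("mteb/arguana", cached)
    except ValueError as err:
        print(err)
```

In this model the ambiguity disappears as soon as a configuration name is passed, which matches the hint in the real error message.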

KennethEnevoldsen commented 1 month ago

Loading the data with the latest version does not cause a problem:

import mteb
tasks = mteb.get_tasks(tasks=["ArguAna"])
task = tasks[0]
task.load_data()

Neither does running a model on the task:

model = mteb.get_model("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="tmp", verbosity=2, overwrite_results=True)
# or using a sentence transformer
from sentence_transformers import SentenceTransformer
model = SentenceTransformer("average_word_embeddings_komninos")
evaluation.run(model, output_folder="tmp", verbosity=2, overwrite_results=True)

I used mteb v1.12.92 and datasets v2.20.0. Could I ask you to repeat it using the latest version of mteb and state your datasets version?
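A minimal way to collect the requested version numbers is the standard library's importlib.metadata. The helper names below are hypothetical, and the comparison against the versions reported in this thread is only a rough numeric check that ignores pre-release tags (packaging.version would give exact ordering).

```python
# Collect installed package versions with the standard library only.
import re
from importlib.metadata import PackageNotFoundError, version

# Versions reported in this thread as loading ArguAna without problems.
REFERENCE = {"mteb": "1.12.92", "datasets": "2.20.0"}


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in packages:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def release_tuple(ver):
    """Rough numeric key: '2.20.0.dev0' -> (2, 20, 0).

    Only the leading numeric release segments are kept; pre-release and
    local tags are ignored, so use packaging.version for exact ordering.
    """
    match = re.match(r"\d+(?:\.\d+)*", ver)
    return tuple(int(part) for part in match.group(0).split(".")) if match else ()


if __name__ == "__main__":
    for name, ver in installed_versions(REFERENCE).items():
        if ver is None:
            print(f"{name}: not installed")
            continue
        relation = "older than" if release_tuple(ver) < release_tuple(REFERENCE[name]) else "at least"
        print(f"{name} {ver} ({relation} the {REFERENCE[name]} used in the thread)")
```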

Reading the error log, it seems like the error happens here:

mteb/abstasks/AbsTaskRetrieval.py", line 175, in _load_qrels

My guess is that it is due either to an update of datasets or to an mteb version referring to a (now outdated) version of the ArguAna dataset.
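To see what the local cache actually holds for mteb/arguana, one can list the configuration directories on disk. This sketch assumes the datasets 2.x cache layout `<cache_root>/<namespace>___<name>/<config>/<version>/<hash>/` and a default root of ~/.cache/huggingface/datasets that HF_DATASETS_CACHE can override. Both are assumptions to verify against your installation, and `cached_snapshots` is a hypothetical helper. Several configurations sharing one version/hash are what make a cache reload without a configuration name ambiguous. Several hashes for the same configuration could hint at the outdated dataset version suspected above.

```python
# List which configurations of a dataset exist in the local datasets cache.
# ASSUMPTION: datasets 2.x layout <cache_root>/<namespace>___<name>/<config>/<version>/<hash>/
import os
from pathlib import Path


def cached_snapshots(cache_root, repo_id):
    """Return {config: sorted ["version/hash", ...]} for one cached dataset."""
    dataset_dir = Path(cache_root) / repo_id.replace("/", "___")
    found = {}
    if not dataset_dir.is_dir():
        return found
    for snapshot in dataset_dir.glob("*/*/*"):
        # Keep only snapshot directories; skip lock files and other entries.
        if snapshot.is_dir():
            config, ver, digest = snapshot.parts[-3:]
            found.setdefault(config, []).append(f"{ver}/{digest}")
    return {config: sorted(paths) for config, paths in sorted(found.items())}


if __name__ == "__main__":
    # ASSUMPTION: default cache root unless HF_DATASETS_CACHE overrides it.
    root = os.environ.get(
        "HF_DATASETS_CACHE", os.path.expanduser("~/.cache/huggingface/datasets")
    )
    for config, snapshots in cached_snapshots(root, "mteb/arguana").items():
        print(config, snapshots)
```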

KennethEnevoldsen commented 3 weeks ago

I will close this issue for now. Let me know if there are still any issues.