embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

QuoraRetrieval Task Issue #1138

Closed: MythicalCow closed this issue 1 month ago

MythicalCow commented 1 month ago

QuoraRetrieval seems to have been running properly until today. Now there is some issue with the metadata block: "Repo card metadata block was not found. Setting CardData to empty"

KennethEnevoldsen commented 1 month ago

@MythicalCow, there shouldn't have been any changes made to the "QuoraRetrieval" dataset, and running the code:

import mteb
tasks = mteb.get_tasks(tasks=["QuoraRetrieval"])
task = tasks[0]
task.load_data()

works without any issues. It might be that you need to reset your Hugging Face cache? If it persists, do add an example and then I would love to take a second look at it.
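
If it helps, here is a minimal sketch of clearing just the cached Quora dataset rather than the whole cache (assuming the default datasets cache location under ~/.cache/huggingface/datasets; adjust the path if HF_HOME or HF_DATASETS_CACHE is set):

import shutil
from pathlib import Path

# Default Hugging Face datasets cache directory (assumption; see note above).
cache_dir = Path.home() / ".cache" / "huggingface" / "datasets"

# Remove only the Quora-related dataset folders so they are re-downloaded on the next run.
for path in cache_dir.glob("*quora*"):
    if path.is_dir():
        shutil.rmtree(path)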

MythicalCow commented 1 month ago

@KennethEnevoldsen Here is the code snippet:

from mteb import MTEB

evaluation = MTEB(tasks=["QuoraRetrieval"], languages=["eng"])
eval_split = "test"
# l1 is the embedding model being evaluated (defined elsewhere)
results = evaluation.run(l1, output_folder=f"results/large1_{eval_split}", eval_splits=[eval_split])
for result in results:
    print(result.scores[eval_split][0]['main_score'])

MythicalCow commented 1 month ago

I will try the huggingface cache reset as well. Thanks for the suggestion!

MythicalCow commented 1 month ago

One interesting thing is that other datasets like DBPedia seem to be working fine.

KennethEnevoldsen commented 1 month ago

I rewrote the code for testing and I don't seem to be able to reproduce your error:

import mteb

tasks = mteb.get_tasks(tasks=["QuoraRetrieval"], languages=["eng"])  # note the change: recommended syntax
model = mteb.get_model("sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2")
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder="tmp", eval_splits=["test"])
results
# [MTEBResults(task_name=QuoraRetrieval, scores=...)]
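
For completeness, and mirroring the accessors used in the earlier snippet, the main score can then be read back out of the returned results (a small sketch, assuming the "test" split):

for result in results:
    # scores maps each evaluated split to a list of per-subset score dicts
    print(result.task_name, result.scores["test"][0]["main_score"])
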
isaac-chung commented 1 month ago

Feel free to reopen this if the issue still persists.