embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

Problems with mteb_meta for German evaluation #847

Closed · achibb closed this 1 week ago

achibb commented 5 months ago

Hi everyone, I am having trouble generating the mteb_meta for German just by running the script.

I am currently trying to format my results, but it does not seem to work straight away with "mteb_meta.py" - any idea? I just get a blank metadata file:
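For context, here is roughly how I am invoking it (a sketch of my setup; I'm assuming the script takes the results folder as a positional argument, as in the README):

```bash
# Run the evaluation first, then point the metadata script at the results folder.
# The positional results-folder argument is my assumption from the README usage.
python scripts/mteb_meta.py results/gbert-large
```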

```yaml
tags:
- mteb
model-index:
- name: gbert-large
  results:
```

It gives this for every dataset:

```
WARNING:mteb.evaluation.MTEB:Passing task names as strings is deprecated and will be removed in the next release. Please use tasks = mteb.get_tasks(tasks=[...]) method to get tasks instead.
INFO:main:Skipping AmazonCounterfactualClassification as split test not present.
WARNING:mteb.evaluation.MTEB:Passing task names as strings is deprecated and will be removed in the next release. Please use tasks = mteb.get_tasks(tasks=[...]) method to get tasks instead.
INFO:main:Skipping AmazonReviewsClassification as split test not present.
```

Do I need to modify something in the code?

imenelydiaker commented 5 months ago

Due to the recent updates #826 and #806:

For the WARNING message: if you're using a Python script, your code should look like this:

```python
import mteb
from sentence_transformers import SentenceTransformer

# Define the sentence-transformers model name
model_name = "average_word_embeddings_komninos"
# or directly from huggingface:
# model_name = "sentence-transformers/all-MiniLM-L6-v2"

model = SentenceTransformer(model_name)
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder=f"results/{model_name}")
```
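
Since you're targeting German, you can also let mteb select the tasks by language instead of listing names. A minimal sketch (languages takes ISO 639-3 codes; the task_types filter is optional):

```python
import mteb

# Select all tasks that cover German (ISO 639-3 code "deu").
german_tasks = mteb.get_tasks(languages=["deu"])

# Optionally narrow down by task type, e.g. classification only.
german_classification = mteb.get_tasks(
    languages=["deu"], task_types=["Classification"]
)

print([task.metadata.name for task in german_classification])
```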
imenelydiaker commented 5 months ago

As for these two messages:

```
INFO:main:Skipping AmazonCounterfactualClassification as split test not present.
INFO:main:Skipping AmazonReviewsClassification as split test not present.
```

It's likely a bug on our side, we'll check, thank you for reporting!

KennethEnevoldsen commented 5 months ago

@imenelydiaker I believe it is due to the new results format introduced in #759. mteb_meta.py will need to be rewritten for the new format.

We should probably make it a CLI with a test (otherwise it is impossible to know if it breaks).

KennethEnevoldsen commented 5 months ago

@achibb, we have now updated the CLI as well as the benchmark lists. I believe the new CLI should suit your purpose.
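
Concretely, metadata generation now goes through the mteb CLI rather than the standalone script. Roughly (a sketch based on the current docs; double-check the flags with `mteb create_meta --help`):

```bash
# Generate model card metadata from an existing results folder.
# Flag names are per the current docs; verify with `mteb create_meta --help`.
mteb create_meta --results_folder results/gbert-large --output_path model_card.md
```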

achibb commented 5 months ago

Thank you very much! I will test it in the next few days and give feedback.

I was wondering: can I also run the German benchmark for other models like mdeberta, and somehow add the results to the leaderboard?


imenelydiaker commented 5 months ago

> Thank you very much! I will test it in the next few days and give feedback. I was wondering: can I also run the German benchmark for other models like mdeberta, and somehow add the results to the leaderboard?

Yes, you can evaluate any model and submit the results to this repo via a PR so they can be added to the leaderboard (check the guide on opening a PR on HF here).
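
For a checkpoint like mdeberta that isn't a sentence-transformers model, one option is to let SentenceTransformer wrap it with its default mean pooling. A minimal sketch (the model name is an example, and whether mean pooling yields useful embeddings for it is something you should validate):

```python
import mteb
from sentence_transformers import SentenceTransformer

# Example checkpoint; SentenceTransformer adds a default mean-pooling
# head when the checkpoint has no sentence-transformers config.
model_name = "microsoft/mdeberta-v3-base"
model = SentenceTransformer(model_name)

# Run all German tasks and write results to a per-model folder,
# which is what the leaderboard submission expects.
tasks = mteb.get_tasks(languages=["deu"])
evaluation = mteb.MTEB(tasks=tasks)
results = evaluation.run(model, output_folder=f"results/{model_name}")
```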

isaac-chung commented 1 week ago

Closing this for now. Feel free to reopen if the issue still persists, or open a new issue.