embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

How to output the wrong retrieval data? #1162

Closed: flower1030 closed this issue 1 week ago

flower1030 commented 4 weeks ago

I have a question. I have already evaluated models on my own datasets, but the results did not meet my expectations. So I want to output the incorrectly retrieved data. How can I achieve this?

KennethEnevoldsen commented 4 weeks ago

I believe @orionw might be the best person to answer this question. We should probably add an example for this in the documentation as well.

orionw commented 4 weeks ago

> I have a question. I have already evaluated models on my own datasets, but the results did not meet my expectations. So I want to output the incorrectly retrieved data. How can I achieve this?

Sorry @flower1030, I seem to be having a hard time following the concern.

If I understand correctly, you ran models on your own datasets but the scores are worse than with your custom code. Which models did you run?

And what does “output the wrong retrieval data” mean?

flower1030 commented 4 weeks ago

I ran models such as bge-m3 and nlp_corom_sentence-embedding_chinese-base (a custom model) on my own datasets. Sorry that I didn't explain it clearly. My question is that the recall metric after testing is not very high, and I want to output the incorrect predictions to see on which data the model performs poorly.

orionw commented 4 weeks ago

Ah, you want to save your predictions so you can look at what is going on? For that you can add the flag `save_predictions=True` to the `mteb.run` command.
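
A minimal sketch of what that could look like, assuming a SentenceTransformer-compatible model and a retrieval task; the task name `"NFCorpus"` and the output folder are illustrative placeholders, not values from this thread:

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Any SentenceTransformer-compatible embedding model works here;
# bge-m3 is used only because it is the model mentioned above.
model = SentenceTransformer("BAAI/bge-m3")

# "NFCorpus" is a placeholder; substitute your own retrieval task.
evaluation = MTEB(tasks=["NFCorpus"])

# save_predictions=True writes the retrieved documents per query to the
# output folder alongside the scores, so low-recall queries can be
# inspected afterwards.
evaluation.run(model, output_folder="results", save_predictions=True)
```

The saved predictions can then be compared against the gold relevance judgments to find the queries whose expected documents were never retrieved.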

I don't think we've added bge-m3 to our model set yet; if you happen to figure it out, we'd love a PR adding it to our models here.

flower1030 commented 4 weeks ago

Okay, thank you, I will try it. I saw the bge-m3 model on https://huggingface.co/spaces/mteb/leaderboard.