embeddings-benchmark / mteb

MTEB: Massive Text Embedding Benchmark
https://arxiv.org/abs/2210.07316
Apache License 2.0

Add embeddings speed #108

Open vprelovac opened 1 year ago

vprelovac commented 1 year ago

An important factor in choosing embeddings is the speed of embedding.

I suggest adding a "Speed" tab to the evaluation, reported in sentences/sec, for example (it could also be tokens/sec).
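As a rough illustration of the proposed metric, here is a minimal sketch of timing an embedding function and reporting sentences/sec and tokens/sec. It is not MTEB's implementation. `measure_throughput` and `dummy_encode` are hypothetical names, and token counting falls back to whitespace splitting unless you pass a real tokenizer. Any callable that takes a list of strings should work here, for example a SentenceTransformer model's `encode` method.

```python
import time
from typing import Callable, Sequence


def measure_throughput(
    encode: Callable[[Sequence[str]], object],
    sentences: Sequence[str],
    count_tokens: Callable[[str], int] = lambda s: len(s.split()),
    warmup: int = 1,
    repeats: int = 3,
) -> dict:
    """Time `encode` over `sentences`; return sentences/sec and tokens/sec.

    Assumption: `count_tokens` defaults to whitespace splitting, a crude
    stand-in for a model's real tokenizer.
    """
    # Warm-up runs exclude one-time costs (model loading, CUDA init, caches).
    for _ in range(warmup):
        encode(sentences)
    # Keep the fastest of several timed runs to reduce noise from other processes.
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        encode(sentences)
        best = min(best, time.perf_counter() - start)
    best = max(best, 1e-12)  # guard against a zero-length timing
    n_tokens = sum(count_tokens(s) for s in sentences)
    return {
        "seconds": best,
        "sentences_per_sec": len(sentences) / best,
        "tokens_per_sec": n_tokens / best,
    }


def dummy_encode(batch: Sequence[str]) -> list:
    # Hypothetical stand-in for a real model: one length-based "vector" per sentence.
    return [[float(len(s))] for s in batch]


stats = measure_throughput(dummy_encode, ["a short sentence"] * 1000)
print(f"{stats['sentences_per_sec']:.0f} sentences/sec")
```

The absolute numbers only mean something relative to the machine they were measured on, which is the concern raised below.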

This is a very useful feature of the SBERT site for example: https://www.sbert.net/docs/pretrained-models/msmarco-v3.html

Efficiency as a factor is also already mentioned in your paper.

Muennighoff commented 1 year ago

It's a great point. Unfortunately, since the evaluation is not automatic and each user runs it individually, it's hard to compare speeds, as users may use different CPUs, GPUs, or environments. It's also very hard to compare speed fairly against the APIs, since you cannot run them locally at all.
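One partial mitigation, sketched here as an assumption rather than anything the leaderboard does, is to store basic hardware and software details next to each speed result so that numbers from different machines are at least labeled. `describe_environment` is a hypothetical helper; GPU detection only works if PyTorch happens to be installed.

```python
import os
import platform


def describe_environment() -> dict:
    """Collect basic hardware/software details to store next to a speed result."""
    info = {
        "os": platform.platform(),
        "machine": platform.machine(),
        "processor": platform.processor() or "unknown",
        "cpu_count": os.cpu_count(),
        "python": platform.python_version(),
        "gpu": None,
    }
    # GPU name is only available when PyTorch is installed and CUDA is usable.
    try:
        import torch

        if torch.cuda.is_available():
            info["gpu"] = torch.cuda.get_device_name(0)
    except ImportError:
        pass
    return info


env = describe_environment()
print(env)
```

This does not make results from different hardware comparable, and it cannot describe API-hosted models at all; it only makes the differences visible.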

If you have a good idea for solving this, would love to add it to the leaderboard, but the best I can think of is doing it manually for each model.

vprelovac commented 1 year ago

Oh, I see. Well, the solution is obviously to make the evaluation automatic. We can donate a GPU to run evaluations 24/7 if you want to write an automated script.

KennethEnevoldsen commented 2 months ago

Note that we have implemented two speed tasks (#848); they simply need to be run on comparable hardware.