vprelovac opened this issue 1 year ago
An important factor in choosing an embedding model is the speed of embedding. I suggest adding a "Speed" tab to the evaluation, reported in sentences/sec, for example (tokens/sec would also work). This is a very useful feature of the SBERT site, for example: https://www.sbert.net/docs/pretrained-models/msmarco-v3.html, and efficiency as a parameter is already mentioned in your paper.

It's a great point. Unfortunately, since the evaluation is not automatic but run individually by each user, it's hard to compare speeds: users may be on different CPUs, GPUs, and environments. It's also very hard to fairly compare speed against the APIs, since you cannot run them locally at all.

If you have a good idea for solving this, I would love to add it to the leaderboard, but the best I can think of is measuring speed manually for each model.

Oh, I see. Well, the obvious solution is to make the evaluation automatic. We can donate a GPU to run evaluations 24/7 if you want to write an automated script.

Note that we have implemented two speed tasks in #848; they simply need to be run on comparable hardware.
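For reference, a minimal sketch of the kind of measurement proposed above: time a model's `encode()` call over a fixed corpus and report sentences/sec. The model name, corpus, and batch size here are placeholder assumptions, and the resulting numbers are only comparable across models when every model is run on the same hardware, which is exactly the difficulty raised above:

```python
import time

from sentence_transformers import SentenceTransformer


def sentences_per_second(model_name: str, sentences: list[str], batch_size: int = 32) -> float:
    """Return embedding throughput in sentences/sec for one model on this machine."""
    model = SentenceTransformer(model_name)
    # Warm-up pass: loads weights onto the device so setup cost isn't timed.
    model.encode(sentences[:batch_size], batch_size=batch_size)
    start = time.perf_counter()
    model.encode(sentences, batch_size=batch_size)
    elapsed = time.perf_counter() - start
    return len(sentences) / elapsed


if __name__ == "__main__":
    # Placeholder corpus and model: any fixed sentence set works, as long as
    # every model is benchmarked on the same corpus and the same hardware.
    corpus = ["The quick brown fox jumps over the lazy dog."] * 10_000
    print(f"{sentences_per_second('all-MiniLM-L6-v2', corpus):.1f} sentences/sec")
```

Running a script like this for each leaderboard model on a single donated GPU would give directly comparable numbers.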
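And once the hardware is fixed, the speed tasks from #848 could be run through MTEB's standard runner. A sketch, assuming hypothetical task names ("SpeedTask1"/"SpeedTask2" are placeholders; the actual names are defined in #848):

```python
from mteb import MTEB
from sentence_transformers import SentenceTransformer

# Placeholder model; in practice this would loop over every leaderboard model.
model = SentenceTransformer("all-MiniLM-L6-v2")

# "SpeedTask1"/"SpeedTask2" are placeholder names for the two tasks from #848.
evaluation = MTEB(tasks=["SpeedTask1", "SpeedTask2"])
evaluation.run(model, output_folder="results/speed")
```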