add filter on leaderboard

clembench / clembench-leaderboard

Leaderboard to show the evaluated LLMs

https://huggingface.co/spaces/colab-potsdam/clem-leaderboard

MIT License


add filter on leaderboard #12

Open davidschlangen opened 3 months ago

davidschlangen commented 3 months ago

We could add filters to the leaderboard, similar to what we have for the plots. This could be even more complex and lead to a re-ordering of the leaderboard. Basically, we could use all parameters that we add to the model registry, like "weighted by number of parameters", or "show me only models where the API responds with a latency of less than ..."

... Actually, this requires a bit of thinking. Maybe we only want the full interface internally (which would likely reach the complexity of SQL anyway), and hard-code a number of criteria or just show some re-rankings automatically? This would make this a feature of the website rather than the leaderboard.

Let's discuss, @sherzod-hakimov, @kushal-10

kushal-10 commented 3 months ago

Sherzod and I discussed it in detail on Wednesday. I am currently working on latency calculations. We discussed adding criteria such as cheapest model, latency, language support, number of GPUs/VRAM required (for open models), parameters, and open/commercial. I will create a more comprehensive list of these criteria and have a mock page ready.

This could also look something like what OpenLLM has: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard. The idea was to add this as a separate HF space, so that it can also work standalone in the HuggingFace environment, and to embed it just below the video on the main website.

kushal-10 commented 2 months ago

Some possible parameters to consider:

1) Model Specifications

Model Type - Open Source / Commercial
Multimodal Support - multimodal inputs supported (e.g., Image, Audio, Video)
Languages - languages supported by the model
Parameter Size - size of the model in billions of parameters
Clemscore

2) Performance

Latency - response time in milliseconds (to be calculated from the clembench-runs log timestamps)
Context Window - Context limit of the model
VRAM Usage - VRAM required for running the model (in GB)
Storage Space - Storage space required to store the model (in GB)
Release Date
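
A minimal sketch of how the latency value could be derived from log timestamps. The actual clembench-runs log format is not shown in this thread, so the input here is an assumption: pairs of ISO 8601 request/response timestamps. The function name `mean_latency_ms` is hypothetical.

```python
from datetime import datetime
from typing import List, Optional, Tuple


def mean_latency_ms(pairs: List[Tuple[str, str]]) -> Optional[float]:
    """Average response time in milliseconds.

    `pairs` holds (request_timestamp, response_timestamp) strings in
    ISO 8601 format. This input shape is an assumption, not the real
    clembench-runs log layout.
    """
    deltas = []
    for request_ts, response_ts in pairs:
        delta = datetime.fromisoformat(response_ts) - datetime.fromisoformat(request_ts)
        deltas.append(delta.total_seconds() * 1000.0)
    if not deltas:
        return None  # no calls logged, so latency is unknown
    return sum(deltas) / len(deltas)


# Made-up timestamps, for illustration only.
example_pairs = [
    ("2024-01-01T12:00:00.000000", "2024-01-01T12:00:01.500000"),
    ("2024-01-01T12:00:02.000000", "2024-01-01T12:00:02.500000"),
]
example_latency = mean_latency_ms(example_pairs)
```

Returning `None` for models without logged calls keeps "unknown latency" distinct from "zero latency" when the value is later used as a filter.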

3) Cost and Licensing

Cost per 1M Input Tokens
Cost per 1M Output Tokens
Width (px)
Height (px)
Vision Pricing - Pricing information that considers width and height
License - Licensing type for the model (custom, MIT, Apache, etc.)
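
To make the filtering idea concrete, here is a rough sketch of how a few of the parameters listed above could drive a filter and re-ranking. The record type `ModelEntry`, its field names, and the function `filter_and_rank` are all hypothetical and do not reflect the actual model registry schema. The example entries are invented placeholders, not real results.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ModelEntry:
    # Hypothetical registry record; field names are assumptions.
    name: str
    clemscore: float
    open_weight: bool
    parameters_b: Optional[float]  # size in billions, None if undisclosed
    latency_ms: Optional[float]    # None if not measured


def filter_and_rank(
    entries: List[ModelEntry],
    open_only: bool = False,
    max_latency_ms: Optional[float] = None,
    max_parameters_b: Optional[float] = None,
) -> List[ModelEntry]:
    """Keep entries matching every active filter, ranked by clemscore (descending)."""
    selected = []
    for entry in entries:
        if open_only and not entry.open_weight:
            continue
        # An entry with an unknown value is excluded when that filter is active.
        if max_latency_ms is not None and (
            entry.latency_ms is None or entry.latency_ms > max_latency_ms
        ):
            continue
        if max_parameters_b is not None and (
            entry.parameters_b is None or entry.parameters_b > max_parameters_b
        ):
            continue
        selected.append(entry)
    return sorted(selected, key=lambda e: e.clemscore, reverse=True)


# Invented placeholder entries, for illustration only.
entries = [
    ModelEntry("model-a", 60.0, False, None, 900.0),
    ModelEntry("model-b", 45.0, True, 70.0, 1500.0),
    ModelEntry("model-c", 30.0, True, 7.0, 400.0),
]
fast_models = filter_and_rank(entries, max_latency_ms=1000.0)
```

A small set of keyword filters like this would match the "hard-code a number of criteria" option from the first comment. A fully general query interface would need considerably more than this.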