OpenGenerativeAI / llm-colosseum

Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM
https://huggingface.co/spaces/junior-labs/llm-colosseum
MIT License
1.34k stars, 160 forks

Feat?: Compute Elo Rankings #70

Closed: Jeerhz closed this 1 week ago

Jeerhz commented 2 weeks ago

Used 170 simulations to compute Elo scores and a win-rate matrix for 5 multimodal models.
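The PR does not show its implementation. As an illustration only, the sketch below shows one common way to turn a log of fight results into Elo ratings and a pairwise win-rate matrix. It makes several assumptions that are not taken from the PR: each simulation yields one `(winner, loser)` pair, the standard logistic Elo formula is used with a hypothetical K-factor of 32 and a starting rating of 1500, and the model names and match log are made-up placeholders.

```python
from collections import defaultdict

# Hypothetical match log of (winner, loser) pairs. Placeholder names only,
# not the five models evaluated in this PR.
MATCHES = [
    ("model_a", "model_b"),
    ("model_b", "model_c"),
    ("model_a", "model_c"),
    ("model_c", "model_a"),
]

K_FACTOR = 32          # assumed update step size
INITIAL_RATING = 1500  # assumed starting rating for every model


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that A beats B under the standard Elo logistic model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400))


def compute_elo(matches, k=K_FACTOR, initial=INITIAL_RATING):
    """Apply sequential Elo updates in the order the matches are listed."""
    ratings = defaultdict(lambda: float(initial))
    for winner, loser in matches:
        e_winner = expected_score(ratings[winner], ratings[loser])
        # The winner scored 1; the loser's update mirrors it (zero-sum).
        delta = k * (1.0 - e_winner)
        ratings[winner] += delta
        ratings[loser] -= delta
    return dict(ratings)


def win_rate_matrix(matches):
    """Return {(a, b): fraction of a-vs-b matches that a won}."""
    wins = defaultdict(int)
    games = defaultdict(int)
    for winner, loser in matches:
        wins[(winner, loser)] += 1
        games[frozenset((winner, loser))] += 1
    models = sorted({m for pair in matches for m in pair})
    matrix = {}
    for a in models:
        for b in models:
            n = games.get(frozenset((a, b)), 0)
            if a != b and n:
                matrix[(a, b)] = wins.get((a, b), 0) / n
    return matrix


ratings = compute_elo(MATCHES)
matrix = win_rate_matrix(MATCHES)

for model, rating in sorted(ratings.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{model}: {rating:.1f}")
for (a, b), rate in sorted(matrix.items()):
    print(f"{a} vs {b}: {rate:.2f}")
```

Sequential Elo depends on match order, so results from a fixed set of simulations can shift if the log is shuffled. Averaging ratings over several random orderings is one way to reduce that sensitivity. The win-rate matrix does not depend on order and complements the ratings.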