OpenGenerativeAI / llm-colosseum

Benchmark LLMs by fighting in Street Fighter 3! The new way to evaluate the quality of an LLM
https://huggingface.co/spaces/junior-labs/llm-colosseum
MIT License
1.34k stars 160 forks source link

Feat: Add Vision Model to the Game #69

Closed Jeerhz closed 2 weeks ago

Jeerhz commented 2 weeks ago

Summary

This update introduces support for selecting the ollama:llava multimodal model within the game, enhancing the player experience with visual analysis capabilities. It poses the bases for adding subsequent multimodal models.

Previous Functionality

Previously, the game used classic text models, generating a text-based context prompt from game data algorithmically. This approach focused solely on textual inputs for determining moves.

New Functionality

With this update, players can now load a multimodal model. The ollama:llava model directly analyzes in-game images, producing a list of moves based on visual input.