Feat: Add Vision Model to the Game

Summary

This update adds support for selecting the ollama:llava multimodal model within the game, enhancing the player experience with visual analysis capabilities. It also lays the groundwork for adding further multimodal models.

Previous Functionality

Previously, the game supported only text models. A text context prompt was built algorithmically from game data, so moves were chosen from textual input alone.
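
For reviewers unfamiliar with the text-only path, here is a minimal sketch of how a context prompt can be derived from game data. The function name, the state keys (`own_x`, `opponent_x`, `own_health`, `opponent_health`), and the 50-unit distance threshold are all hypothetical and chosen for illustration; they are not taken from the repository.

```python
def build_context_prompt(state):
    """Turn a game-state dict into a short text prompt (illustrative only).

    All keys read from `state` and the distance threshold are assumptions.
    """
    dx = state["opponent_x"] - state["own_x"]
    side = "right" if dx > 0 else "left"
    # Hypothetical threshold separating "close" from "far".
    proximity = "close to" if abs(dx) < 50 else "far from"
    lines = [
        f"Your opponent is on your {side} and you are {proximity} them.",
        f"Your health is {state['own_health']}; "
        f"your opponent's health is {state['opponent_health']}.",
        "Reply with a bullet list of your next moves.",
    ]
    return "\n".join(lines)
```

A text model only ever sees this string, so anything not encoded in it (animations, projectiles, stage layout) is invisible to the model.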

New Functionality

With this update, players can load a multimodal model. The ollama:llava model analyzes in-game images directly and produces a list of moves from that visual input.
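
As a rough sketch of the multimodal path, the snippet below sends a frame to a locally running Ollama server through its `/api/generate` endpoint and extracts moves from the reply. It assumes Ollama is listening on its default port and that the `llava` model has been pulled. The helper names, the move vocabulary in `KNOWN_MOVES`, and the bullet-list reply format are hypothetical and may differ from what this PR actually implements.

```python
import base64
import json
import urllib.request

# Default local Ollama endpoint; adjust if your server runs elsewhere.
OLLAMA_URL = "http://localhost:11434/api/generate"

# Hypothetical move vocabulary, for illustration only.
KNOWN_MOVES = ("Move Closer", "Move Away", "Jump", "Crouch", "Punch", "Kick")


def build_payload(image_bytes, prompt, model="llava"):
    """Build the JSON body: images are passed as base64-encoded strings."""
    return {
        "model": model,
        "prompt": prompt,
        "images": [base64.b64encode(image_bytes).decode("ascii")],
        "stream": False,  # ask for a single JSON object instead of a stream
    }


def parse_moves(reply):
    """Keep only lines that match a known move, ignoring list markers."""
    moves = []
    for line in reply.splitlines():
        candidate = line.strip().lstrip("-*0123456789. ").strip()
        for known in KNOWN_MOVES:
            if candidate.lower() == known.lower():
                moves.append(known)
    return moves


def ask_model(image_bytes, prompt):
    """Send one frame to the model and return the parsed list of moves."""
    body = json.dumps(build_payload(image_bytes, prompt)).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=60) as response:
        return parse_moves(json.loads(response.read())["response"])
```

Filtering the reply against a fixed vocabulary is one way to guard against free-form text from the model; unknown lines are simply dropped rather than passed on to the game.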
