OthersideAI / self-operating-computer

A framework to enable multimodal models to operate a computer.
https://www.hyperwriteai.com/self-operating-computer
MIT License
8.21k stars 1.09k forks source link

Add support for LLaVA through Ollama #152

Closed michaelhhogue closed 5 months ago

michaelhhogue commented 5 months ago

What does this PR do?

This PR uses the new Ollama Python module to add support for locally hosted vision models. llava will be the first supported local model through Ollama.

Instructions were also added to the README for how to get up and running with LLaVA through Ollama.

Type of change

Important

Accuracy when using LLaVA is extremely low. However, this PR serves as a starting point for developers to build off of as local vision models improve.

joshbickett commented 5 months ago

Just did initial pass. I can see it working, but see what you mention about higher error rates. Not a problem though, this is great progress. I need to get back to some other tasks, but I'll take a final look later and merge it in. @michaelhhogue

joshbickett commented 5 months ago

@michaelhhogue merged, great job! I'm going to play around with improving the prompts and see if I can get LLaVa to perform better. Either way I'll do a Twitter post about it this week or next and mention you, thanks again!

michaelhhogue commented 5 months ago

@joshbickett Sounds good. Let me know if you notice anything that can be improved with the llava support.