OthersideAI / self-operating-computer

A framework to enable multimodal models to operate a computer.
https://www.hyperwriteai.com/self-operating-computer
MIT License
8.68k stars 1.15k forks

CogVLM Support - A better LLaVA #169

Open AMEND09 opened 7 months ago

AMEND09 commented 7 months ago

CogVLM is a Python-based, open-source multimodal model. It is significantly better than LLaVA, especially at identifying elements on a screen; in fact, it excels at that part. CogVLM is not too difficult to run, and it would improve the experience of running locally. Although CogVLM is not supported through Ollama, please consider adding support for it.