CogVLM is a Python-based, open-source multimodal model. It is significantly better than LLaVA, especially at identifying elements on a screen, a task at which it excels. CogVLM is not too difficult to run, and it would improve the experience of running this project locally. While CogVLM is not supported through Ollama, please consider adding support for it.