legut2 closed this 9 months ago.
This is a fork of LLaVA, adapted to work with Mistral. The objective is to push the vision LLM even further than the LLaVA team has. We'll be taking it in a different direction, with many ideas in store to extend its capabilities. You're more than welcome to contribute if you're interested.
I'm curious about this project and its motivation. I'm a computer vision geek and wanted to know what you two are toying around with here. What are you trying to get it to do?
I'm curious about multimodal models that combine images and text in general.