
LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with a KoboldAI UI

https://github.com/lostruins/koboldcpp

GNU Affero General Public License v3.0

4.41k stars, 319 forks

Can't get image generation or LLaVA (multimodal) working #765

Status: Open. chriswal opened this issue 3 months ago

chriswal commented 3 months ago

Can you please list the models you are using for testing multimodal and image generation (name and where to find them)? I tried different models and I can't get it to work. Even when I get the vision model to load, the LLM outputs garbage if an image is present; if I delete the image, the output is OK again. With image generation, the model loads but the generated image is all black. Knowing a known-good configuration would be a good starting point.

Found your image generation model on Hugging Face, so this works now.

I got multimodal working with Mistral, but it only works for the first picture. If you add a second one, it gets all mixed up.
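If the model mixes up details when several images are present, one workaround is to attach a single image per request. The sketch below builds such a request for a locally running koboldcpp. It assumes the server listens on the default port 5001 and that /api/v1/generate accepts an "images" list of base64 strings next to the usual "prompt" field; check both against the API docs for your koboldcpp version before relying on them. build_generate_payload and send_generate are hypothetical helper names.

```python
import base64
import json
import urllib.request

# Assumption: koboldcpp runs locally on its default port (5001) and its
# /api/v1/generate endpoint accepts an "images" list of base64 strings.
KOBOLD_URL = "http://localhost:5001/api/v1/generate"


def build_generate_payload(prompt: str, image_bytes: bytes, max_length: int = 200) -> dict:
    """Build a request body that attaches exactly one image (hypothetical helper)."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return {
        "prompt": prompt,
        "max_length": max_length,
        "images": [encoded],  # a single image, to avoid cross-image mix-ups
    }


def send_generate(payload: dict, url: str = KOBOLD_URL) -> dict:
    """POST the payload as JSON and return the decoded JSON response."""
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Usage: read the picture with open(path, "rb").read(), pass the bytes to build_generate_payload, then hand the result to send_generate while koboldcpp is running. For a second picture, start a fresh request rather than appending to the same "images" list.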

LostRuins commented 3 months ago

Yeah unfortunately the model does get a bit confused when there are multiple images, it mixes up details between them.

LLaMA2 Tiefighter seems to perform better than Mistral at multimodal tasks.

Good sample models can be found on the wiki https://github.com/LostRuins/koboldcpp/wiki and common projectors here: https://huggingface.co/koboldcpp/mmproj/tree/main
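When output turns to garbage only with an image attached, one thing worth checking is that the projector matches the base model. A LLaVA projector maps image features into the language model's embedding space, so a projector built for a 7B model does not fit a 13B one; the pairing recommended in this thread (Tiefighter 13B with llama-13b-mmproj-v1.5) keeps the sizes matched. A mismatch is a plausible cause of the garbage output, not one confirmed in this thread. The hypothetical helper below is only a filename heuristic: it assumes the size appears in the name as a tag like "7b" or "13B" and does not inspect the GGUF metadata itself.

```python
import re
from typing import Optional

# Heuristic only: assumes the parameter count appears in the filename as a
# standalone tag such as "7b" or "13B" (case-insensitive).
_SIZE_TAG = re.compile(r"(?<![a-z0-9])(\d+)b(?![a-z0-9])", re.IGNORECASE)


def size_tag(filename: str) -> Optional[str]:
    """Return the size tag, e.g. '13B', or None if none is found."""
    match = _SIZE_TAG.search(filename)
    return match.group(1) + "B" if match else None


def sizes_match(model_file: str, mmproj_file: str) -> Optional[bool]:
    """Compare size tags of model and projector; None means 'cannot tell'."""
    model_size, proj_size = size_tag(model_file), size_tag(mmproj_file)
    if model_size is None or proj_size is None:
        return None
    return model_size == proj_size
```

For example, sizes_match("LLaMA2-13B-Tiefighter.Q4_K_S.gguf", "llama-13b-mmproj-v1.5.gguf") returns True. A None result only means the names carry no recognizable size tag, so fall back to the model card.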

zazer0 commented 2 weeks ago

@LostRuins could you possibly point me to a sample launch command/model config that is known to work? I'm really struggling to find even one vision example to model mine on.

LostRuins commented 2 weeks ago

You can use the official Koboldcpp colab as an example, that one is definitely known to work: https://colab.research.google.com/github/LostRuins/koboldcpp/blob/concedo/colab.ipynb

Just enable LoadLLaVAmmproj; that uses the Tiefighter 13B model along with llama-13b-mmproj-v1.5. Assuming you downloaded the right files, a launch command for something similar would be:

python koboldcpp.py --usecublas --gpulayers 20 --model LLaMA2-13B-Tiefighter.Q4_K_S.gguf --mmproj llama-13b-mmproj-v1.5.gguf
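The launch command quoted above can be scripted so it is easy to reuse and tweak. The sketch below is a hypothetical wrapper, not something shipped with koboldcpp: it rebuilds exactly that command line as an argument list and only starts the server if both GGUF files exist. It assumes it is run from the directory containing koboldcpp.py and the two files, and that "python" on PATH is the interpreter you want.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List

# File names are the ones given in the thread; they must already be
# downloaded into the working directory.
MODEL = "LLaMA2-13B-Tiefighter.Q4_K_S.gguf"
MMPROJ = "llama-13b-mmproj-v1.5.gguf"


def build_launch_args(model: str, mmproj: str, gpulayers: int = 20, use_cublas: bool = True) -> List[str]:
    """Assemble the koboldcpp command line as an argument list."""
    args = ["python", "koboldcpp.py"]
    if use_cublas:
        args.append("--usecublas")  # CUDA (cuBLAS) backend for NVIDIA GPUs
    args += ["--gpulayers", str(gpulayers), "--model", model, "--mmproj", mmproj]
    return args


def launch(model: str = MODEL, mmproj: str = MMPROJ) -> None:
    """Start koboldcpp only if both GGUF files are present."""
    missing = [name for name in (model, mmproj) if not Path(name).is_file()]
    if missing:
        print("Missing files:", ", ".join(missing))
        return
    args = build_launch_args(model, mmproj)
    print("Running:", shlex.join(args))
    subprocess.run(args, check=True)
```

Call launch() to start the server. Lowering gpulayers offloads fewer layers to the GPU, which reduces VRAM use at the cost of speed.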