LostRuins / koboldcpp

A simple one-file way to run various GGML and GGUF models with a KoboldAI UI
https://github.com/lostruins/koboldcpp
GNU Affero General Public License v3.0
4.41k stars 319 forks

AI vision stuck on "analyzing..." #731

Open aolko opened 4 months ago

aolko commented 4 months ago

How do I fix this, and how is it supposed to work? The wiki sure is lacking info about it.

LostRuins commented 4 months ago

AI Vision is an attempt to provide multimodality by allowing the model to recognize and interpret uploaded or generated images. It uses AI Horde or a local A1111 endpoint to perform image interrogation, similar to LLaVA, although not as precise. Click on any image and you can enable it within Lite. This functionality is not provided by KCPP itself.

LostRuins commented 4 months ago

In the latest version, you can also use a LLaVA mmproj file for vision.

aleksusklim commented 4 months ago

@LostRuins, does it reprocess all previous images when a new one is added? It even prints Processing LLaVa Embedding 1 (1728 tokens) again, with the number then changing to 2 and 3.

…Wait, after I added a 5th image (further down in the story), it stopped processing them altogether!?

LostRuins commented 3 months ago

There is a hard limit of 4 images allowed per prompt. It will also fail if the required image tokens exceed max context.
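
For anyone who wants to check a prompt before sending it, the two limits above can be sketched as a small pre-flight check. This is only an illustration, not KoboldCpp code: the function name, its parameters, and the example context sizes are hypothetical. The 4-image limit comes from the comment above, and the 1728-token figure is just the value printed in the log line quoted earlier, which may differ for other images or mmproj files.

```python
# Hypothetical pre-flight check; none of these names exist in KoboldCpp.
MAX_IMAGES_PER_PROMPT = 4  # hard limit stated in this thread


def check_vision_budget(image_token_counts, max_context):
    """Return (ok, reason) for a prompt carrying the given images.

    image_token_counts: one embedding size (in tokens) per image.
    max_context: the context size the model was launched with.
    """
    if len(image_token_counts) > MAX_IMAGES_PER_PROMPT:
        return False, "more than 4 images in one prompt"
    required = sum(image_token_counts)
    if required > max_context:
        return False, "image tokens exceed max context"
    return True, "ok"


if __name__ == "__main__":
    # 1728 tokens per image is taken from the log line quoted in this thread.
    for n_images, ctx in [(4, 8192), (5, 8192), (3, 4096)]:
        ok, reason = check_vision_budget([1728] * n_images, ctx)
        print(n_images, "images,", ctx, "context:", reason)
```

Note that this only counts image tokens, as described above. The text of the prompt also takes up context, so in practice the usable budget for images is smaller than max context.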