Open aolko opened 8 months ago
AI Vision is an attempt to provide multimodality by allowing the model to recognize and interpret uploaded or generated images. It uses AI Horde or a local A1111 endpoint to perform image interrogation, similar to LLaVA, although not as precise. Click on any image to enable it within Lite. This functionality is not provided by KCPP itself.
In the latest version, you can also use a LLaVA mmproj file for vision.
@LostRuins, does it reprocess all previous images when a new one is added?
It even prints "Processing LLaVa Embedding 1 (1728 tokens)" again, with the number then changing to 2 and 3.
…Wait, after I added a 5th image (further down in the story), it stopped processing them altogether!?
There is a hard limit of 4 images allowed per prompt. It will also fail if the required image tokens exceed max context.
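To make the two failure conditions concrete, here is a rough Python sketch of the check as described. This is not KCPP's actual code: the function name `check_images`, the constant name, and the return shape are made up for illustration. The per-image figure of 1728 tokens comes from the log line quoted earlier and may differ for other mmproj files or images.

```python
# Illustrative only: mirrors the two limits described in this thread,
# not the real KCPP implementation.
MAX_IMAGES_PER_PROMPT = 4  # hard limit stated above


def check_images(image_token_counts, max_context):
    """Return (ok, reason) for a list of per-image embedding token counts."""
    # Condition 1: more than 4 images in one prompt is rejected outright.
    if len(image_token_counts) > MAX_IMAGES_PER_PROMPT:
        return False, "too many images"
    # Condition 2: the image embeddings alone must fit in the context window.
    if sum(image_token_counts) > max_context:
        return False, "image tokens exceed max context"
    return True, "ok"


if __name__ == "__main__":
    # 1728 tokens per image is taken from the log output quoted above.
    print(check_images([1728] * 4, 8192))
    print(check_images([1728] * 5, 8192))
    print(check_images([1728] * 3, 4096))
```

Under these assumptions, a 5th image trips the count limit regardless of context size, and three images at 1728 tokens each already overflow a 4096-token context, so in practice you would either keep fewer images in the story or raise the context size.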
How do I fix this, and how is it supposed to work? The wiki is sorely lacking info about it.