LLaVA multimodal projection stops working after New Session

iScriptLex commented 5 months ago

Kubuntu 22.04, GeForce 4090.
KoboldCPP version: 1.64
Models: kunoichi-7b.Q8_0.gguf, BuRP_7B-Q8_0-imat.gguf or any other Mistral-based 7B model.
Mmproj: mistral-7b-mmproj-v1.5-Q4_1.gguf
Command line: ./koboldcpp-linux-x64 --model models/BuRP_7B-Q8_0-imat.gguf --usecublas --gpulayers 10000 --contextsize 4096 --preloadstory startup.json --mmproj mmproj/mistral-7b-mmproj-v1.5-Q4_1.gguf

Add image for recognition (Add Img -> Upload Image File)
Add text "Describe this image" to chat
Click "Generate More".

The model describes the image successfully: objects, composition, etc.

After that, click "New Session" (this removes uploaded image and generated text)
Add some other (NOT the same) image for recognition
Add text "Describe this image" to chat
Click "Generate More".

The model talks nonsense that has nothing to do with the uploaded image. All subsequently uploaded images are also not recognized. Image recognition starts working again only after KoboldCPP is fully restarted.

KoboldCPP 1.61.2 works well and doesn't have this bug.
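
For anyone who wants to check this without clicking through the UI, here is a rough sketch that sends two different images to a running instance one after another. Assumptions on my side, not confirmed in this thread: the server listens on http://localhost:5001, the /api/v1/generate endpoint accepts an "images" list of base64 strings, and the reply has the shape {"results": [{"text": ...}]}. The helper names build_payload and describe are made up for illustration. It may not hit exactly the same code path as "New Session" in the UI.

```python
import base64
import json
import sys
import urllib.request

# Assumed address of a locally running KoboldCPP instance.
API_URL = "http://localhost:5001/api/v1/generate"


def build_payload(prompt, image_bytes, max_length=200):
    """Build a generate request carrying one base64-encoded image."""
    return {
        "prompt": prompt,
        "max_length": max_length,
        # Assumption: "images" takes a list of base64-encoded image files.
        "images": [base64.b64encode(image_bytes).decode("ascii")],
    }


def describe(image_path, url=API_URL):
    """Send one image with a fixed prompt and return the generated text."""
    with open(image_path, "rb") as f:
        payload = build_payload("Describe this image\n", f.read())
    request = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=300) as response:
        body = json.load(response)
    # Assumed response shape: {"results": [{"text": "..."}]}
    return body["results"][0]["text"]


if __name__ == "__main__":
    # Pass two or more DIFFERENT image files on the command line.
    for path in sys.argv[1:]:
        print(path, "->", describe(path))
```

Run it with two different image paths as arguments. If the second description has nothing to do with the second image, that would match the behavior described above.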

LostRuins commented 5 months ago

Thanks for reporting. It's a known issue; I'm trying to find the commit that caused it.

Linked: https://github.com/ggerganov/llama.cpp/issues/7060

LostRuins commented 5 months ago

Hi, please try the hotfix 1.64.1 and let me know if that works.

iScriptLex commented 5 months ago

Thank you very much, 1.64.1 works well.

LostRuins commented 5 months ago

Great. We can use my solution until upstream fixes it properly.

LostRuins / koboldcpp

LLaVA multimodal projection stops working after New Session #821