ReMeDy-TV opened this issue 3 months ago (status: Open)
Could you try a different model file? That might be a bad quant.
Tested it with the following model:
- https://huggingface.co/MaziyarPanahi/Mistral-Large-Instruct-2407-GGUF/resolve/main/Mistral-Large-Instruct-2407.Q4_K_S.gguf-00001-of-00007.gguf
- https://huggingface.co/MaziyarPanahi/Mistral-Large-Instruct-2407-GGUF/resolve/main/Mistral-Large-Instruct-2407.Q4_K_S.gguf-00002-of-00007.gguf
- https://huggingface.co/MaziyarPanahi/Mistral-Large-Instruct-2407-GGUF/resolve/main/Mistral-Large-Instruct-2407.Q4_K_S.gguf-00003-of-00007.gguf
- https://huggingface.co/MaziyarPanahi/Mistral-Large-Instruct-2407-GGUF/resolve/main/Mistral-Large-Instruct-2407.Q4_K_S.gguf-00004-of-00007.gguf
- https://huggingface.co/MaziyarPanahi/Mistral-Large-Instruct-2407-GGUF/resolve/main/Mistral-Large-Instruct-2407.Q4_K_S.gguf-00005-of-00007.gguf
- https://huggingface.co/MaziyarPanahi/Mistral-Large-Instruct-2407-GGUF/resolve/main/Mistral-Large-Instruct-2407.Q4_K_S.gguf-00006-of-00007.gguf
- https://huggingface.co/MaziyarPanahi/Mistral-Large-Instruct-2407-GGUF/resolve/main/Mistral-Large-Instruct-2407.Q4_K_S.gguf-00007-of-00007.gguf
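Since the quant above is split into seven numbered shards with a regular naming pattern, a small sketch can generate all seven URLs instead of copy-pasting each one (the base URL and `-NNNNN-of-00007.gguf` suffix pattern are taken from the links in this thread; download with any HTTP client):

```python
# Build the shard URLs for the 7-part split GGUF referenced above.
# BASE and the suffix pattern are copied from the links in this thread.
BASE = (
    "https://huggingface.co/MaziyarPanahi/Mistral-Large-Instruct-2407-GGUF"
    "/resolve/main/Mistral-Large-Instruct-2407.Q4_K_S.gguf"
)

# Each shard appends "-00001-of-00007.gguf" etc. to the base filename.
urls = [f"{BASE}-{i:05d}-of-00007.gguf" for i in range(1, 8)]

for url in urls:
    print(url)
```

Once all seven parts are in the same directory, llama.cpp-based loaders (which KoboldCpp builds on) can typically be pointed at the first shard and will pick up the rest automatically.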
This works for me on an A100 via https://koboldai.org/runpodcpp. RunPod and Vast.ai share the same Docker image, so I expect this model to also work for you on Vast.
Describe the Issue: All of the Mistral-Large GGUF models cause KoboldCpp to repeatedly emit a strange message during inference, reading [control_76].
Additional Information:
Vast pod hardware details:
Here is the Vast log, although there are no errors and nothing looks unusual to me: vast log.txt