aleksusklim opened 1 week ago
I did not do the "replacing history" step, and frankly, I'm not sure if that's supported by Kobold.
Again: I send prompt "A" and get an output. It is the same no matter how many times I retry it. Then I send prompt "B" and get a different output. From then on, prompt "A" gives something completely different.
This is clearly a bug, but for now I cannot say exactly where it lives (the model, the quantization algorithm, upstream llama.cpp, a particular BLAS library, or koboldcpp itself).
Model: https://huggingface.co/bartowski/DeepSeek-Coder-V2-Lite-Instruct-GGUF (I tested the Q5_K_M quant)
Setup: stock koboldcpp_cu12 v1.68 at default settings, CuBLAS with 0 offloaded layers, no flash attention.
Prompt: (pasted to the history, sent with empty input box in a new private tab)
Response: (with top_k=1)
Then, replace the entire history with: (example from https://huggingface.co/deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct)
Generation gives:
(With a real config including offloaded layers and larger context, it was even worse like
atorg类似的experimental试用 articulate desigagine练习晃经验的遗 Progress stillestring mall endSun loops nicotine电源 Medalla ?>">litsселението bateria
)
Then, put the first prompt back and try to generate "test" again. This is what happens:
(In my real configuration it was
Assistant: shot airflow SSL blah'',工程建设incorpor PAM Богpartially recently hasnViceref comarques Router resposta casualties organitz cyclhement对他WHM us herramientpregunta红色的 altered Cretigor
)
Why? What happened? It stays broken until koboldcpp is restarted completely!
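The whole sequence can also be driven through koboldcpp's HTTP API (`/api/v1/generate`), which rules out any UI/browser-tab state. A minimal sketch below; the port is koboldcpp's default 5001, and `PROMPT_A`/`PROMPT_B` are placeholders for the two prompts above:

```python
import json
import urllib.request

API_URL = "http://localhost:5001/api/v1/generate"  # koboldcpp default port (assumed)

def generate(prompt: str, max_length: int = 64) -> str:
    """Send one greedy generation request to koboldcpp and return the text."""
    payload = {
        "prompt": prompt,
        "max_length": max_length,
        "top_k": 1,  # greedy sampling: output should be fully deterministic
    }
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"]

def check_determinism(outputs: list) -> bool:
    """True if every output in the list is identical (expected with top_k=1)."""
    return len(set(outputs)) <= 1

def reproduce() -> None:
    """Repro sequence from the report (call against a running koboldcpp)."""
    a1 = generate("PROMPT_A")  # placeholder prompts; substitute the real ones
    a2 = generate("PROMPT_A")
    _ = generate("PROMPT_B")   # unrelated prompt in between
    a3 = generate("PROMPT_A")
    print("A stable before B:", check_determinism([a1, a2]))
    print("A stable after  B:", check_determinism([a1, a3]))  # False exhibits the bug
```

If the bug is in koboldcpp's state rather than the UI, the second check should print `False` here too.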
I've downloaded
llama-b3184-bin-win-cuda-cu12.2.0-x64
to test the upstream with llama-server.exe.
The same sequence at their defaults responds correctly every time with:

```
for i in range(1, len(arr)):
```
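For the upstream comparison I pointed the same check at llama-server's `/completion` endpoint. A sketch, assuming the server's default port 8080; the endpoint and its `content` response field are part of llama.cpp's server API:

```python
import json
import urllib.request

LLAMA_URL = "http://localhost:8080/completion"  # llama-server default port (assumed)

def extract_text(response: dict) -> str:
    """llama-server returns the generated text in the "content" field."""
    return response["content"]

def complete(prompt: str, n_predict: int = 64) -> str:
    """POST one greedy completion request to llama-server and return the text."""
    payload = {"prompt": prompt, "n_predict": n_predict, "top_k": 1}
    req = urllib.request.Request(
        LLAMA_URL,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return extract_text(json.load(resp))
```

Repeating the A, B, A sequence through `complete()` gives identical outputs for "A" every time, which is what points the finger away from the model/quant and toward koboldcpp.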
Also, when the model is loading, I see strange characters in these lines; I'm not sure whether this is just a visual Unicode bug or something serious in the GGUF metadata: