-
For purposes other than KoboldAI (e.g. interfacing with the API directly from my Python scripts), it would be nice to have something like an argument in the API call, or a CLI flag, to enable or disabl…
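For reference, here is a minimal sketch of what calling the generate endpoint directly from Python looks like, assuming koboldcpp's KoboldAI-compatible API on its usual port (the endpoint path, port, and payload fields are my assumptions about the typical setup, not taken from this report):

```python
import requests

# Assumed default endpoint for koboldcpp's KoboldAI-compatible API;
# adjust host and port to match your setup.
API_URL = "http://localhost:5001/api/v1/generate"

payload = {
    "prompt": "Once upon a time",
    "max_length": 80,      # number of tokens to generate
    "temperature": 0.7,
}

resp = requests.post(API_URL, json=payload, timeout=120)
resp.raise_for_status()
print(resp.json()["results"][0]["text"])
```

An API-level toggle of the kind requested here would presumably be one more boolean field in this payload, with a CLI flag setting its default.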
-
**Describe the bug**
Multigen seems to be incompatible with the new instruct mode: the first request has its prompt formatted correctly, but all subsequent requests have broken formatting.
Here's …
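For context, a minimal sketch of the failure mode being described, assuming an Alpaca-style instruct template (the template and the generate() stub below are illustrative placeholders, not taken from this report): continuation requests should append to the already-formatted prompt rather than re-wrapping partial output in the template.

```python
# Hypothetical Alpaca-style instruct template, for illustration only.
TEMPLATE = "### Instruction:\n{instruction}\n\n### Response:\n"


def generate(prompt: str) -> str:
    """Stub standing in for one model request (hypothetical)."""
    return " drop"


prompt = TEMPLATE.format(instruction="Write a short poem about rain.")
generated = ""

# Multigen-style chunked generation: each later chunk must continue from
# prompt + generated verbatim; re-applying TEMPLATE around partial output
# here would produce the broken formatting described above.
for _ in range(3):
    generated += generate(prompt + generated)
```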
-
### Describe the bug
Hello,
first off, I'm using Windows with llama.cpp built with cuBLAS enabled.
I've noticed that with newer Ooba versions, the context size of llama is incorrect and arou…
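As a point of comparison, here is a minimal sketch of requesting a context size explicitly through llama-cpp-python, the loader Ooba uses for llama.cpp models (the model path is a placeholder, and n_ctx is an assumption about the relevant knob):

```python
from llama_cpp import Llama

# n_ctx requests the context window size; the model path is a placeholder.
llm = Llama(
    model_path="models/your-model.ggml.bin",
    n_ctx=2048,
)

# Print the context size the loaded model actually ended up with.
print(llm.n_ctx())
```

Comparing the requested n_ctx against the value reported after loading is a quick way to confirm whether the mismatch happens at load time.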
-
On the latest release I use the following command line args: **koboldcpp.exe --threads 6 --stream --host 10.0.0.129**
When the model is loaded, the resulting URL output to the CMD window is: **http://10…**
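To check reachability from another machine on the LAN, a quick sketch (the IP comes from the --host argument above; port 5001 and the /api/v1/model endpoint are my assumptions about koboldcpp's defaults):

```python
import requests

# Host taken from the --host argument above; 5001 is assumed as the default port.
BASE = "http://10.0.0.129:5001"

# The KoboldAI-compatible API reports the loaded model at /api/v1/model.
resp = requests.get(f"{BASE}/api/v1/model", timeout=10)
print(resp.status_code, resp.json())
```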
-
There was a performance regression in earlier versions of llama.cpp that I may be hitting with long-running interactions. This was recently fixed with the addition of a --no-mmap option, which forces t…
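The same behavior can be exercised outside the CLI; a minimal sketch via llama-cpp-python, assuming its use_mmap flag mirrors the --no-mmap option (the model path is a placeholder):

```python
from llama_cpp import Llama

# use_mmap=False loads the whole model into RAM up front instead of
# memory-mapping the file, mirroring what --no-mmap does on the CLI.
llm = Llama(
    model_path="models/your-model.ggml.bin",  # placeholder path
    use_mmap=False,
)
```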
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid, so there are no tagged versions as o…
-
Hi, why does koboldcpp, in chat mode, generate a whole question-and-answer conversation in the terminal when I just say "Hello"?
In the UI:
KoboldAI: How can I help you?
In the Windows terminal:
Output: How can I help …
-
Even though I am using the same sampling parameters as in the llama.cpp repo, the generation output in koboldcpp is significantly worse. It feels like koboldcpp is ignoring the prompt format.
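For comparison, a minimal sketch of passing sampling parameters through koboldcpp's KoboldAI-compatible generate endpoint (the field names and values below are my assumptions about the usual API shape, not the exact settings from this report):

```python
import requests

payload = {
    "prompt": "### Instruction:\nSay hello.\n\n### Response:\n",  # illustrative prompt
    "max_length": 128,
    "temperature": 0.8,  # set these to match the llama.cpp invocation being compared
    "top_k": 40,
    "top_p": 0.95,
    "rep_pen": 1.1,      # koboldcpp's name for the repeat penalty
}

resp = requests.post("http://localhost:5001/api/v1/generate", json=payload, timeout=120)
print(resp.json()["results"][0]["text"])
```

If the outputs still diverge with matched parameters, the prompt string itself (template, whitespace, stop sequences) is the next thing to diff.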
### l…
-
I want to try the new options like this: **koboldcpp.exe --useclblast 0 0** and **--smartcontext**
Previously, when I tried --smartcontext, it let me select a model the same way as if I just ran the exe norma…
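A hedged sketch of launching it non-interactively from Python so the file picker never appears; the paths are placeholders, and passing the model file on the command line is what is assumed to skip the selection dialog:

```python
import subprocess

# Placeholder paths; supplying the model file as an argument is assumed
# to skip the model-selection dialog shown when the exe runs bare.
subprocess.run([
    r"koboldcpp.exe",
    "--useclblast", "0", "0",
    "--smartcontext",
    r"C:\models\your-model.ggml.bin",
])
```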
-
I downloaded a model from KoboldAI's Hugging Face page, and it doesn't seem to work:
System Info: AVX = 0 | AVX2 = 0 | AVX512 = 0 | AVX512_VBMI = 0 | AVX512_VNNI = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F…