-
Hi, thanks for your amazing work on this software. I am trying to run some of the latest QWEN models, which are topping the leaderboards and on paper are currently the best base models. Specifically QWEN-7…
-
$ ./examples/chat-gpt2.sh
main: build = 480 (f4cef87)
main: seed = 1683650863
llama.cpp: loading model from ./models/ggml-model-gpt2-q4_0.bin
error loading model: missing tok_embeddings.weight
l…
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [x] I am running the latest code. Development is very rapid so there are no tagged versions as of…
-
I know this is a very vague description, but I repeatedly run into an issue with koboldcpp: everything runs fine on my system until my story reaches a certain length (about 1000 tokens). Then sudde…
-
Your README says to use the instruct model of DeepSeek-coder, but isn't that model specifically not trained on FIM, which you mention you are using?
-
The RoPE base frequency is supposed to be 1,000,000, but the default used in KoboldCPP is 10,000. Airoboros 34b repeats itself with KoboldCPP unless the proper RoPE value is used.
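To see why the wrong base matters: RoPE derives its per-dimension rotation frequencies as theta_i = base^(-2i/d), so swapping 1,000,000 for 10,000 rescales every frequency and distorts long-range positions. A minimal sketch (illustrative only, not KoboldCPP's actual code; the head dimension here is an assumption):

```python
# Illustrative sketch of RoPE rotation frequencies, not koboldcpp code.
# theta_i = base ** (-2*i/d) for each dimension pair i.
d = 128  # head dimension, chosen for illustration

def rope_thetas(base, d):
    # Per-dimension-pair rotation frequencies.
    return [base ** (-2 * i / d) for i in range(d // 2)]

thetas_default = rope_thetas(10_000, d)     # KoboldCPP default base
thetas_correct = rope_thetas(1_000_000, d)  # base the model was trained with

# With the wrong base, every frequency past the first is off by a
# constant factor, so positional information at long range is skewed
# and generation can degenerate into repetition.
print(thetas_default[1] / thetas_correct[1])  # constant per-step ratio
```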
-
The documentation only mentions NVIDIA GPUs for running models across multiple GPUs. Is it possible to use KoboldCPP with multiple AMD GPUs? Will it work with CLBlast?
-
- [x] I am running the latest code. 794db3e
- [x] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md).
- [x] I [searched using keywords relevant to my …
-
### Feature/Improvement Description
ChatGPT, how can I change this code so that it is compatible with and works on all Linux architectures, including my Manjaro system? AGiXT.sh file
```
# Check…
-
From the server side, they look like this:
```
print_timings: prompt eval time = 6960.61 ms / 3323 tokens ( 2.09 ms per token, 477.40 tokens per second) …