-
### What happened?
Chat template formatting seems to be swapped between Mistral and Llama 2.
Llama 2 supports the `` token for system messages, while Mistral simply uses newlines.
Starting llama ser…
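To make the difference concrete, here is a minimal sketch of the two prompt formats as they are commonly documented: Llama-2-chat wraps the system message in `<<SYS>>` tags inside the first `[INST]` block, while Mistral-Instruct has no dedicated system token, so the system text is typically just prepended to the first user turn with newlines. The helper names below are illustrative, not part of any library API.

```python
# Illustrative sketch of the two commonly documented prompt formats.
# These helpers are hypothetical; they only show the shape of each template.

def llama2_prompt(system: str, user: str) -> str:
    # Llama-2-chat: system message wrapped in <<SYS>> tags inside [INST].
    return f"[INST] <<SYS>>\n{system}\n<</SYS>>\n\n{user} [/INST]"

def mistral_prompt(system: str, user: str) -> str:
    # Mistral-Instruct: no system token; system text joined by newlines.
    return f"[INST] {system}\n\n{user} [/INST]"

print(llama2_prompt("Be brief.", "Hi"))
print(mistral_prompt("Be brief.", "Hi"))
```

If the server applies the Mistral-style template to a Llama 2 model (or vice versa), the `<<SYS>>` markers end up missing or spurious, which matches the swapped behavior described above.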
-
Hey guys,
This is a great library, but I have a question: is this library able to use memory as efficiently as llama.cpp? In other words, if I'm using a checkpoint that I use with Llama…
-
It looks like PyPI only has the source distribution for each release: https://pypi.org/project/llama-cpp-python/0.2.6/#files
But the GitHub release at https://github.com/abetlen/llama-cpp-pytho…
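One way to check which distribution types a release actually ships is PyPI's JSON API (`https://pypi.org/pypi/<project>/<version>/json`), where each uploaded file carries a `packagetype` field: `sdist` for source distributions, `bdist_wheel` for built wheels. The sample payload below is illustrative only, not the real file list of any release.

```python
# Sketch: distinguishing wheels from sdists in a PyPI JSON API payload.
# The sample data is made up for illustration; a real payload would come
# from https://pypi.org/pypi/<project>/<version>/json.

sample_release = {
    "urls": [
        {"filename": "llama_cpp_python-0.2.6.tar.gz", "packagetype": "sdist"},
    ]
}

def has_wheel(release: dict) -> bool:
    """Return True if any uploaded file in the release is a built wheel."""
    return any(f["packagetype"] == "bdist_wheel" for f in release["urls"])

print(has_wheel(sample_release))  # the sample contains only an sdist
```

When a release has no wheel on PyPI, `pip install` falls back to building the sdist locally, which for this package means compiling llama.cpp from source.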
-
There has been a completed merge of Mamba model support over at llama.cpp; would it be possible to implement this in Ollama as well?
Merged PR: https://github.com/ggerganov/llama.cpp/pull/5328
…
-
### What happened?
I used `cmake -B build` to generate a Visual Studio solution. After that, when compiling `test-grammar-integration.cpp` with MSVC, the error "newline in constant" occurred.
Here …
-
I'm trying to use the low-level API in my own program. Loading the model (I am using Pygmalion-13B.ggmlv3.Q6_K.gguf) works fine and I get no errors. However, when I try to evaluate the model via lla…
-
### Godot version
v4.2.2
### godot-cpp version
latest
### System information
Windows 11
### Issue description
Can you add some more details on how to build this addon?
It seems like it uses zi…
-
I've discovered a performance gap between the Neural Speed MatMul operator and the llama.cpp operator in the Neural-Speed repository. This issue was identified while running a benchmark with the ONNXR…
-
I ran the command like this:
```bash
bun x humanifyjs local responsez.js
ggml_vulkan: Found 1 Vulkan devices:
Vulkan0: NVIDIA GeForce GTX 1070 (NVIDIA) | uma: 0 | fp16: 0 | warp size: 32
[nod…
-
### ⚠️ This issue respects the following points: ⚠️
- [X] This is a **bug**, not a question or a configuration/webserver/proxy issue.
- [X] This issue is **not** already reported on [GitHub](https://…