-
llama.cpp running in server mode: how do I use this? Is there any documentation on usage?
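A minimal sketch of the usual workflow (the model path, port, and context size below are placeholders, not from the question): `llama-server` serves an HTTP API, including a `/completion` endpoint you can query with curl.

```bash
# Start the server with a GGUF model (paths and port are illustrative).
./llama-server -m ./models/model.gguf --port 8080 -c 4096

# From another shell: request a completion from the built-in HTTP API.
curl http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{"prompt": "Write a haiku about GPUs.", "n_predict": 64}'
```

The server also exposes OpenAI-compatible routes such as `/v1/chat/completions`, which is often the easiest way to point existing clients at it.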
-
So with
```
tabby_x86_64-manylinux2014-cuda122/llama-server -m /home/mte90/.tabby/models/TabbyML/StarCoder2-7B/ggml/model-00001-of-00001.gguf --cont-batching --port 30890 -np 1 --log-disable --ctx-…
```
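A quick, hedged way to confirm a server started this way is responding (the port is taken from the command above; `/health` is llama-server's built-in readiness endpoint):

```bash
# Probe the llama-server instance started above (port 30890 from the command).
curl http://localhost:30890/health
```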
-
When I quantized the Qwen2.5-1.5B-Instruct model following "GGUF Export" in the examples.md in the docs, it reported that quantization was complete and I obtained the GGUF model. But when I load …
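For reference, a hedged sketch of the usual llama.cpp export-and-quantize flow (file names and the Q4_K_M type are illustrative, not taken from the post):

```bash
# Convert the HF checkpoint to GGUF, then quantize it with llama-quantize.
python convert_hf_to_gguf.py ./Qwen2.5-1.5B-Instruct \
  --outfile qwen2.5-1.5b-instruct-f16.gguf
./llama-quantize qwen2.5-1.5b-instruct-f16.gguf \
  qwen2.5-1.5b-instruct-q4_k_m.gguf Q4_K_M
```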
-
### What happened?
llama.cpp is running slowly on an NVIDIA A100 80GB GPU.
Steps to reproduce:
1. git clone https://github.com/ggerganov/llama.cpp && cd llama.cpp
2. mkdir build && cd build
3. cmak…
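A hedged sketch of how such a build usually continues (the flags below are assumptions for a CUDA build, not taken from the report); missing GPU offload is a common cause of "slow on GPU" reports:

```bash
# Configure with the CUDA backend and build (run inside the build directory).
cmake .. -DGGML_CUDA=ON
cmake --build . --config Release -j

# Offload all layers to the GPU; without -ngl, inference stays on the CPU.
./bin/llama-cli -m ../models/model.gguf -ngl 99 -p "Hello"
```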
-
```swift
.package(url: "https://github.com/ggerganov/llama.cpp.git", branch: "master")
```
It is better to pin a specific version of the package (for example with SwiftPM's `.package(url:exact:)` or `.package(url:revision:)`), because master is constantly updated and different r…
-
Need to experiment with these solutions and decide whether LoRA can be implemented on the CPU (see the sketch after this list):
- [x] unsloth
- [x] axolotl
- [x] llama factory
- [x] text-gen-ui *
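Tangential to the training tools above, one hedged data point: llama.cpp can already *apply* a LoRA adapter entirely on the CPU at inference time. This is applying, not training, a LoRA, and the file names below are placeholders:

```bash
# Run a base GGUF model with a LoRA adapter, CPU only
# (-ngl 0 keeps all layers off the GPU). Paths are illustrative.
./llama-cli -m base-model.gguf --lora lora-adapter.gguf -ngl 0 -p "Hello"
```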
-
### Request Description
Llama.cpp is a very popular and excellent LLM/VLM inference and deployment framework, implemented in pure C/C++ with no dependencies, and cross-platform. Based on SYCL and Vu…
-
When I quantized the Qwen2.5-1.5B-Instruct model following **"Quantizing the GGUF with AWQ Scale"** in the [docs](https://qwen.readthedocs.io/en/latest/quantization/llama.cpp.html), it showed that th…
-
### What happened?
I ran `./llama-gbnf-validator mygrammar.txt mytestprogram.txt` and, after checking the grammar itself, it started to parse the test file and went into an infinite loop calling st…
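A hedged scaffold for reproducing this kind of report (the file names come from the command above, but the grammar body and test input are an illustrative minimal GBNF example, not the reporter's files):

```bash
# Write a minimal GBNF grammar and a matching test input, then validate.
cat > mygrammar.txt <<'EOF'
root ::= "hello" " " "world"
EOF
printf 'hello world' > mytestprogram.txt
./llama-gbnf-validator mygrammar.txt mytestprogram.txt
```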
-
### Reminder
- [X] I have read the README and searched the existing issues.
### Reproduction
Hi there, I am observing a difference in output between LLaMA-Factory inference and llama.cpp.
I am…