-
How do I use llama.cpp in server mode? Is there any documentation on usage?
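For reference, a minimal sketch of talking to the llama.cpp HTTP server, assuming it was started beforehand with something like `./server -m models/llama-2-7b.Q4_0.gguf -c 2048 --port 8080` (the binary is called `llama-server` in newer builds; the model path is a placeholder). The `/completion` endpoint accepts a JSON body with `prompt` and `n_predict`:

```python
import json
import urllib.request

# Assumes a llama.cpp server is already running locally, e.g.:
#   ./server -m models/llama-2-7b.Q4_0.gguf -c 2048 --port 8080
URL = "http://127.0.0.1:8080/completion"

payload = {
    "prompt": "Building a website can be done in 10 simple steps:",
    "n_predict": 64,      # number of tokens to generate
    "temperature": 0.7,
}

req = urllib.request.Request(
    URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)

with urllib.request.urlopen(req) as resp:
    result = json.loads(resp.read())

# The generated text comes back under "content"
print(result["content"])
```

Recent server builds also expose an OpenAI-compatible `/v1/chat/completions` route, which can be used in the same way with a chat-style JSON body.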
-
I followed the documentation to run the llama2-7b model (4-bit quantized) and also ran it on llama.cpp for comparison. I noticed that, except for nt=1, where there was a slight performance improvement…
-
I found that the benchmark suite outputs the time to first token. However, when I run `python benchmark.py --model meta-llama/Llama-2-7b-hf static --isl 128 --osl 128 --batch 1` an error occurs:…
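As a point of reference for what "time to first token" measures, here is a minimal sketch (not the benchmark suite itself) that times the first streamed token with llama-cpp-python; the model path and parameters are placeholders:

```python
import time
from llama_cpp import Llama

# Placeholder path to a local quantized model
llm = Llama(model_path="./llama-2-7b.Q4_0.gguf", n_ctx=2048)

prompt = "The quick brown fox"
start = time.perf_counter()

first_token_time = None
pieces = []
# stream=True yields one chunk per generated token
for chunk in llm(prompt, max_tokens=128, stream=True):
    if first_token_time is None:
        first_token_time = time.perf_counter() - start
    pieces.append(chunk["choices"][0]["text"])

total_time = time.perf_counter() - start
print(f"time to first token: {first_token_time:.3f} s")
print(f"total generation time: {total_time:.3f} s")
```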
-
While building a mixed C and Python wheel, I got the following error:
```sh
~ $ pip install -U llama_cpp_python
Requirement already satisfied: llama_cpp_python in c:\users\ךינשגכהד\scoop\apps\pytho…
-
In a multi-turn conversation I see that the combination of llama-cpp-python and llama-cpp-agent is much slower on the second prompt than the Python bindings of gpt4all. See the two screenshots below. Th…
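One thing worth checking here is prompt/KV caching. As a hedged sketch (assuming a recent llama-cpp-python; class and method names may differ between versions, and the model path is a placeholder), the library exposes a cache that can be attached to the `Llama` instance so a follow-up prompt that shares a prefix with the previous one does not have to be re-evaluated from scratch:

```python
from llama_cpp import Llama, LlamaCache

# Placeholder model path; parameters are illustrative only
llm = Llama(model_path="./mistral-7b-instruct.Q4_0.gguf", n_ctx=4096)

# Attach an in-memory cache so repeated prompt prefixes can be reused
llm.set_cache(LlamaCache())

history = "USER: Hello, who are you?\nASSISTANT:"
first = llm(history, max_tokens=64)
history += first["choices"][0]["text"]

# Second turn: the prompt starts with the same prefix as the first turn,
# so the cached state can be reused instead of re-processing everything
history += "\nUSER: What did I just ask you?\nASSISTANT:"
second = llm(history, max_tokens=64)
print(second["choices"][0]["text"])
```

Without such a cache, every turn re-processes the entire growing conversation, which would explain a second prompt being noticeably slower than the first.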
-
GPU: 2× V100
build script:
```shell
python build.py --model_dir /data/models/vicuna-13b-v1.5/vicuna-13b-v1.5/ \
--dtype float16 \
--use_gpt_attention_plugin float1…
-
### Background / context
- I was originally following the install instructions for Mac at https://simonwillison.net/2023/Aug/1/llama-2-mac/ - yeah, I should have spotted that this was an older post....bu…
-
### System Info
- CPU Arch x86
- 4 H100 GPUs
- using commit 6cc5e177ff2fb60b1aab3b03fa0534b5181cf0f1
### Who can help?
@kaiyux @byshiue
### Information
- [ ] The official example scripts
- [X…
-
I run llama-cpp-python on my new PC, which has a built-in RTX 3060 with 12 GB of VRAM.
This is my code:
```
from llama_cpp import Llama
llm = Llama(model_path="./wizard-mega-13B.ggmlv3.q4_0.bin", n_ctx=…
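# (Addition, not part of the original snippet.) A hedged sketch: to actually use
# the RTX 3060, llama-cpp-python has to be installed with CUDA/cuBLAS support and
# the constructor needs n_gpu_layers set; otherwise the model runs entirely on the
# CPU regardless of the installed GPU. Values below are illustrative only:
#
#   llm = Llama(
#       model_path="./wizard-mega-13B.ggmlv3.q4_0.bin",
#       n_ctx=2048,        # illustrative context size
#       n_gpu_layers=32,   # number of layers to offload to the GPU; 0 keeps everything on CPU
#   )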