-
To apply a grammar to a chat completion, it looks like the llamafile server expects a `grammar` argument: https://github.com/Mozilla-Ocho/llamafile/blob/main/llama.cpp/server/server.cpp#L2551
```
…
```
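For reference, a minimal sketch of sending a GBNF grammar as a top-level field in the request body, per the linked server.cpp; the host, port, and toy grammar here are assumptions, not from the original post:
```python
import requests

# Toy GBNF grammar that restricts the reply to "yes" or "no" (hypothetical).
GRAMMAR = 'root ::= "yes" | "no"'

resp = requests.post(
    "http://localhost:8080/v1/chat/completions",  # assumed local llamafile server
    json={
        "messages": [{"role": "user", "content": "Is the sky blue?"}],
        "grammar": GRAMMAR,  # top-level field, per the linked server.cpp
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```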
-
I installed llama-cpp-python using the command below:
CMAKE_ARGS="-DLLAMA_CUBLAS=on" pip install llama-cpp-python
The speed:
llama_print_timings: eval time = 81.91 ms / 2 runs ( 40…
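If it helps, a minimal sketch of confirming the CUBLAS build actually offloads layers (the model path is hypothetical); `verbose=True` makes llama-cpp-python print the `llama_print_timings` lines quoted above along with the load log:
```python
from llama_cpp import Llama

# Offload all layers to the GPU; verbose=True prints load info and timings.
llm = Llama(model_path="./model.gguf", n_gpu_layers=-1, verbose=True)
out = llm("Q: What is 2+2? A:", max_tokens=8)
print(out["choices"][0]["text"])
```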
-
Hello,
I'm using the following script to fine-tune the llama3 model with a custom dataset of questions and responses in the `{"prompt": "", "completion": ""}` format defined [here](https://github.com/…
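For context, a minimal sketch of that record format written out as JSONL; the example rows are hypothetical:
```python
import json

rows = [
    {"prompt": "What is the capital of France?", "completion": "Paris."},
    {"prompt": "Name a prime number.", "completion": "7."},
]
with open("train.jsonl", "w") as f:
    for row in rows:
        f.write(json.dumps(row) + "\n")  # one {"prompt", "completion"} object per line
```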
-
I've noticed that GPU utilization is very low during model inference, peaking at only 80%, and I want to raise it to 99%. How can I adjust the parameters?
GPU Name …
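For reference, a minimal sketch of the parameters that usually affect GPU utilization in llama-cpp-python; the model path and values are assumptions, not from the original post:
```python
from llama_cpp import Llama

llm = Llama(
    model_path="./model.gguf",
    n_gpu_layers=-1,  # offload every layer to the GPU
    n_batch=1024,     # a larger prompt-processing batch keeps the GPU busier
    n_ctx=4096,
)
```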
-
Hello, I am a complete noob, so I don't know if I have provided enough information to get help, but I need help with this, please.
# Prerequisites
Please answer the following questions for yourself …
-
Use a GGUF of https://huggingface.co/CohereForAI/aya-23-8B
from https://huggingface.co/bartowski/aya-23-8B-GGUF
```python
import llama_cpp
llm = llama_cpp.Llama.from_pretrained(
    repo_id="bart…
```
-
Installed llama_cpp_python-0.2.43.tar.gz via
CMAKE_ARGS="-DLLAMA_CUBLAS=ON -DCMAKE_CUDA_COMPILER=/opt/cuda/bin/nvcc -DTCNN_CUDA_ARCHITECTURES=61" pip install llama-cpp-python
llm = Llama(model_pat…
-
I installed `llama-cpp-python` on a system with:
**CPU AMD EPYC 7542**
**GPU V100**
But it raised the exception shown in the attached screenshot.
-
### What is the issue?
```
(.venv) [root@bastion ollama]# python llm/llama.cpp/convert-hf-to-gguf.py ./model --outtype f16 --outfile converted.bin
INFO:hf-to-gguf:Loading model: model
INFO:gguf.gguf_…
```
-
Modifying two files is all you need.
**pyproject.toml**: change `llama-cpp-python = "^0.2.11"` to `llama-cpp-python = "^0.2.23"`.
**poetry.lock**: search for `llama-cpp-python` and update two values:
`version = "0.2.23"`
…
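Alternatively, after bumping the constraint in pyproject.toml, letting Poetry regenerate the lock entries should give the same result:
poetry update llama-cpp-python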