-
I'm using Docker's latest-aio-gpu-nvidia-cuda-12 image with multiple GPUs.
I would like to adjust llama-cpp's settings in detail; which file should I change?
I am modifying aio/gpu-8g/t…
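For context, LocalAI's AIO images read per-model YAML definitions (the aio/gpu-8g directory mentioned above contains such files). A minimal sketch of the kind of fields one of those files carries; the field names and values here are assumptions to verify against the files in your image, not a definitive schema:

```yaml
# Hypothetical LocalAI model definition; exact fields vary by version.
name: gpt-4
context_size: 8192
f16: true
parameters:
  model: some-model.gguf   # illustrative file name
```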
-
### Prerequisites
- [X] I am running the latest code. Mention the version if possible as well.
- [X] I carefully followed the [README.md](https://github.com/ggerganov/llama.cpp/blob/master/README.md)…
-
Hi! I have followed every step in [Run Llama 2 on your own Mac using LLM and Homebrew](https://simonwillison.net/2023/Aug/1/llama-2-mac/), in particular:
```
pipx install llm # python 3.11
llm in…
```
-
### Motivation
QuaRot (https://arxiv.org/abs/2404.00456) has been out for three weeks, and the preliminary results are convincing. Also see the discussions in `llama.cpp` with the QuaRot authors. It would be amazing to …
-
`python convert-lora-to-ggml.py my-model` won't work because there is no `convert-lora-to-ggml.py` file in the llama.cpp folder anymore.
-
I have a local server running an OpenAI compatible API. I simply want all requests that normally go to `api.openai.com:443` go to `localhost:8000`.
I did see that you should be able to [override mo…
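One way to express the redirect described above is to rewrite any `api.openai.com` URL to point at the local server before sending the request. The helper below is a hypothetical stdlib-only sketch (the `localhost:8000` target comes from the issue; the function name is mine):

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical helper: rewrite api.openai.com URLs to target a local
# OpenAI-compatible server, keeping the original path and query string.
def redirect_to_local(url: str, local_netloc: str = "localhost:8000") -> str:
    parts = urlsplit(url)
    if parts.hostname == "api.openai.com":
        # Swap scheme and host; the local server typically speaks plain HTTP.
        parts = parts._replace(scheme="http", netloc=local_netloc)
    return urlunsplit(parts)

print(redirect_to_local("https://api.openai.com:443/v1/chat/completions"))
# http://localhost:8000/v1/chat/completions
```

With the official `openai` Python client (v1+), the same effect can usually be had without URL rewriting by passing a `base_url` when constructing the client, or by setting the `OPENAI_BASE_URL` environment variable.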
-
I run this on two A30 GPUs with CUDA driver 535.104.12.
The Docker image is built with `make -C docker release_build CUDA_ARCHS="80-real"`.
I'm using the latest code on the main branch.
```
commit 89ba1…
```
-
Pulled the latest code with the updated llama.cpp in the talk-llama example.
The build is failing at:
https://github.com/ggerganov/whisper.cpp/blob/master/examples/talk-llama/llama.cpp#L1116
`WHISPER_CUBLAS=…
-
**Is your feature request related to a problem? Please describe.**
I am building the prompt myself and calling
```
llm.create_completion(prompt, max_tokens=max_tokens,
                      …
```
-
## Goal
- Llama 3.1 should support tool use in llama.cpp
- https://github.com/janhq/models/issues/16
## Original post
**Problem**
AFAICS, the current implementation does not have OpenAI Function Cal…
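To make the goal concrete, this is the shape of the OpenAI-style request body that tool use implies the server would need to accept. The model name and the `get_weather` tool below are purely illustrative, not anything llama.cpp currently defines:

```python
import json

# Sketch of an OpenAI-style function-calling request body.
# The model name and the get_weather tool are illustrative assumptions.
request_body = {
    "model": "llama-3.1-8b-instruct",
    "messages": [
        {"role": "user", "content": "What's the weather in Paris?"},
    ],
    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_weather",
                "description": "Look up the current weather for a city",
                "parameters": {
                    "type": "object",
                    "properties": {"city": {"type": "string"}},
                    "required": ["city"],
                },
            },
        }
    ],
}

print(json.dumps(request_body, indent=2))
```

Supporting tool use would mean parsing this `tools` array, injecting the declarations into the Llama 3.1 chat template, and returning `tool_calls` in the response when the model emits one.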