-
## Issue encountered
Currently, inference of open models on my Mac is quite slow, since vLLM does not support MPS (Apple's Metal Performance Shaders backend).
## Solution/Feature
Llama.cpp does support mps and would significantly spe…
-
**Describe the bug**
**To Reproduce**
Steps to reproduce the behavior:
Execute this command:

```shell
CMAKE_ARGS="-DLLAMA_CUDA=on -DLLAMA_NATIVE=off" pip install 'instructlab[cuda]'
```
and compile err…
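A note in case it helps triage: recent llama.cpp releases renamed their CMake options, so older `-DLLAMA_*` flags can be silently ignored or break the build. A hedged guess at an equivalent invocation (flag names per the current llama.cpp build system; verify against the llama-cpp-python version that instructlab pins):

```shell
# In newer llama.cpp the CUDA switch is GGML_CUDA (formerly LLAMA_CUDA /
# LLAMA_CUBLAS), and LLAMA_NATIVE similarly became GGML_NATIVE.
# Assumption: the pinned llama-cpp-python bundles a recent llama.cpp.
CMAKE_ARGS="-DGGML_CUDA=on -DGGML_NATIVE=off" pip install 'instructlab[cuda]'
```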
-
Will there be more support for running llama.cpp on Ryzen NPU chips?
-
First of all: CONGRATS ON YOUR AMAZING RESEARCH WORK.
Considering that this is using GGML and seems based directly on `llama.cpp`:
Why is this a separate project from `llama.cpp`, given that `llama.c…
-
We want to observe interactions with llama-cpp, taking inspiration from https://github.com/cfahlgren1/observers/blob/main/src/observers/observers/models/openai.py:
```python
from llama_cpp import…
-
**Goal**
- cortex.cpp's desktop focus means Drogon's features are unused
- We should contribute our vision and multimodal work upstream as a form of llama.cpp server
Can we consider refactoring llam…
-
This issue serves to track performance on Metal hardware versus MLX and llama.cpp.
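A minimal throughput harness for such comparisons might look like the sketch below. The `fake_stream` generator is a stand-in; in a real benchmark the iterator would come from the backend under test (e.g. a streaming completion call in llama-cpp-python or an MLX generation loop), so tokens/sec reflects actual decode speed:

```python
import time


def tokens_per_second(token_stream, limit=None):
    """Consume a token iterator and return (token_count, tokens/sec).

    `token_stream` is any iterable yielding one token per step; `limit`
    optionally caps how many tokens are consumed.
    """
    start = time.perf_counter()
    count = 0
    for _ in token_stream:
        count += 1
        if limit is not None and count >= limit:
            break
    elapsed = time.perf_counter() - start
    # Guard against a zero-length timing window on very fast stubs.
    return count, count / elapsed if elapsed > 0 else float("inf")


def fake_stream(n):
    """Stand-in for a model's streaming output (hypothetical, for testing)."""
    for i in range(n):
        yield f"tok{i}"


n_tokens, tps = tokens_per_second(fake_stream(100))
```

The same harness can be pointed at Metal, MLX, and llama.cpp backends in turn to produce comparable numbers.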
-
Some features, such as the `ggml_graph_plan` function, were removed from the ggml library when it was merged with the llama.cpp branch; it seems the previous ggml branch, which fully supported whisper.cpp, is no longer being used.
I…
-
Could we have support for [llama.cpp](https://github.com/ggerganov/llama.cpp)?
That will make the model more accessible to many popular tools like Ollama, LM Studio, Koboldcpp, text-generation-webui,…
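For context, "llama.cpp support" usually means shipping or converting a GGUF checkpoint. A rough sketch of the usual workflow, using llama.cpp's own tools (script and binary names per recent llama.cpp; `./my-model` and the output filenames are hypothetical):

```shell
# Convert a Hugging Face checkpoint directory to GGUF with llama.cpp's converter.
python convert_hf_to_gguf.py ./my-model --outfile my-model-f16.gguf

# Optionally quantize for a smaller memory footprint (Q4_K_M is a common choice).
./llama-quantize my-model-f16.gguf my-model-Q4_K_M.gguf Q4_K_M
```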
-
OS: Ubuntu 22.04.1
Python: 3.12.2
Build fails for llama-cpp-python:
```
$ pip install -r requirements.txt
...
Building wheels for collected packages: llama-cpp-python
Building wheel…