-
Create a struct `ggml_metal_locals` and populate it using `GGML_TENSOR_LOCALS`, similar to what we do in `ggml.c`:
https://github.com/ggerganov/llama.cpp/blob/3b4bab6a38502d9e68587c2c19f26472480ec4dd/g…
-
**Describe the bug**
When trying to run:
python gen_ondevice_llama.py --hub-model-id mrmdw7lwq,mjqyx81rm,m1q8748pm,m0q92582m,m9m551gym,m7qk5wy7q,m1m60802m,mwn0k9zzq --output-dir ./export --tokenize…
-
The newest llama.cpp uses a new ggml encoding format, and current models are no longer compatible with the old version. Could you update the llama.cpp version? Thank you!
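For context, the newer format (GGUF) can be told apart from legacy ggml files by the first four bytes of the file: GGUF files begin with the ASCII magic `GGUF`. A small sketch (the helper name `is_gguf` and the fake header are just for illustration):

```python
# Sketch: distinguish a GGUF file (newer llama.cpp format) from an older
# ggml file by its 4-byte magic. GGUF files start with the bytes b"GGUF".
import os
import struct
import tempfile

def is_gguf(path: str) -> bool:
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

# Demo with a fake header: GGUF magic followed by a uint32 LE version field.
fake = tempfile.NamedTemporaryFile(delete=False, suffix=".gguf")
fake.write(b"GGUF" + struct.pack("<I", 3))
fake.close()

print(is_gguf(fake.name))  # prints True for our fake GGUF header
os.unlink(fake.name)
```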
-
The current state of the testing framework is pretty bad - we have a few simple test tools in [tests](https://github.com/ggerganov/ggml/tree/master/tests), but these are not maintained properly and ar…
-
Just finished setting up an Orange Pi 5B with 16 GB of RAM, and I get this error:
Starting mining
/home/bill/Desktop/freedom-gpt/miner/mac/fgptminer: 1: Syntax error: word unexpected (expecting ")")
…
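For what it's worth, this particular `Syntax error: word unexpected (expecting ")")` message from `sh`/`dash` often means the shell is being asked to execute a binary built for a different CPU architecture (e.g. an x86-64 build on the Orange Pi's ARM64 CPU) and is misreading the ELF bytes as a script. A quick way to check, using the path from the report above:

```shell
BIN=/home/bill/Desktop/freedom-gpt/miner/mac/fgptminer  # path from the error above
[ -e "$BIN" ] && file "$BIN"  # reports the binary's target architecture
uname -m                      # host architecture; aarch64 on an Orange Pi 5B
```

If `file` reports x86-64 while `uname -m` prints `aarch64`, the program needs an ARM64 build to run natively on this board.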
-
Most GUIs for llama.cpp are written in slow languages, and their compiled binaries are huge in comparison to executables produced by any C++ compiler.
-
## Goal
Cortex.cpp should have a super easy UX, on par with market alternatives
- User should have a 1-click installer that prioritizes simple UX over size and complexity
- Installer packa…
-
When I run the docker container I see that the GPU is only being used for the embedding model (encoder), not the LLM.
I noticed that llama-cpp-python is not compiled properly (Notice: BLAS=0), as d…
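A guarded way to check from Python whether the installed build has GPU offload at all — `llama_supports_gpu_offload` is part of llama.cpp's C API, but whether the Python binding exposes it depends on the installed version, hence the defensive handling:

```python
# Guarded check: does the installed llama-cpp-python expose GPU offload?
# (BLAS=0 in the load log usually means the wheel was built CPU-only.)
try:
    from llama_cpp import llama_cpp
    if hasattr(llama_cpp, "llama_supports_gpu_offload"):
        print("GPU offload supported:", llama_cpp.llama_supports_gpu_offload())
    else:
        print("binding too old to report GPU offload support")
except ImportError:
    print("llama-cpp-python is not installed")
```

If this reports no GPU support, reinstalling from source with the CUDA backend enabled usually fixes it, e.g. `CMAKE_ARGS="-DGGML_CUDA=on" pip install --force-reinstall --no-cache-dir llama-cpp-python` (older releases used `-DLLAMA_CUBLAS=on` instead).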
-
I tried to play with the Mixtral Q5_K_M quant on both llama.cpp and Python. Both servers use CUDA 12, but results are the same on 11.6. Here are some results:
llama.cpp, A100
```
llama_print_timings: load tim…
-
### Describe what should be investigated or refactored
Fixes need to be made based on the output of the [Made for UDS "Silver" badge verification](https://github.com/defenseunicorns/uds-common/pull…