-
Hi, can you share some performance data on MTK or Qualcomm chips, such as prefill and decode speeds for Qwen or Gemma models?
Thanks very much.
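In the meantime, here is a minimal sketch of how prefill and decode throughput can be measured with llama-cpp-python's eval/sample helpers; the model path and prompt are placeholders, and this is a rough measurement, not an official benchmark:

```python
import time
from llama_cpp import Llama  # pip install llama-cpp-python

# Placeholder path: any local GGUF file (e.g. a Qwen or Gemma quant).
llm = Llama(model_path="model.gguf", n_ctx=2048, verbose=False)

prompt = "Explain the difference between prefill and decode in one paragraph."
tokens = llm.tokenize(prompt.encode("utf-8"))

# Prefill: evaluate the whole prompt in one pass.
t0 = time.perf_counter()
llm.eval(tokens)
prefill = time.perf_counter() - t0
print(f"prefill: {len(tokens) / prefill:.1f} tok/s")

# Decode: sample and evaluate new tokens one at a time.
n_decode = 64
t0 = time.perf_counter()
for _ in range(n_decode):
    llm.eval([llm.sample()])
decode = time.perf_counter() - t0
print(f"decode: {n_decode / decode:.1f} tok/s")
```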
-
**Is your enhancement related to a problem? Please describe.**
Currently, the installation process does not allow specifying the CUDA version; the code is hardcoded to use the llama-box binary wi…
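To illustrate what version-aware selection could build on, here is a hypothetical sketch that reads the local CUDA toolkit version from `nvcc --version`; the selection step is made up for illustration and does not reflect the project's actual install code:

```python
import re
import subprocess

def detect_cuda_version() -> str | None:
    """Return the local CUDA toolkit version (e.g. '12.4'), or None if nvcc is missing."""
    try:
        out = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=True
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError):
        return None
    match = re.search(r"release (\d+\.\d+)", out)
    return match.group(1) if match else None

cuda = detect_cuda_version()
# Hypothetical selection step: an installer could pick a binary flavour matching this version.
print(f"detected CUDA {cuda}" if cuda else "no CUDA toolkit found, falling back to a CPU binary")
```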
-
Hello guys,
I'm wondering about performance, which is very strange.
On the same server, I ran the same model with a query, and the loading time is totally different between llama-cpp-python and ll…
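When comparing load times like this, it can help to time just the model construction in isolation. A minimal sketch with llama-cpp-python, where the model path and n_gpu_layers value are placeholders that should mirror whatever the llama.cpp run used:

```python
import time
from llama_cpp import Llama

t0 = time.perf_counter()
# Placeholder path; set n_gpu_layers to match the llama.cpp configuration being compared.
llm = Llama(model_path="model.gguf", n_gpu_layers=-1, verbose=False)
print(f"load time: {time.perf_counter() - t0:.2f} s")

# A tiny warm-up call, so differences caused by lazy initialization show up as well.
t0 = time.perf_counter()
llm("Hello", max_tokens=8)
print(f"first completion: {time.perf_counter() - t0:.2f} s")
```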
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [X] I am running the latest code. Development is very rapid so there are no tagged versions as o…
-
### **Error code:**
RuntimeError Traceback (most recent call last)
----> 1 model.save_pretrained_gguf("model", tokenizer,)
1 fra…
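For reference, the failing call is Unsloth's GGUF export helper. A typical invocation, based on the public Unsloth examples (the model name, sequence length, and quantization method are placeholder choices), looks roughly like this:

```python
from unsloth import FastLanguageModel

# Load (or fine-tune) a model first; model name and sequence length are placeholders.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/llama-3-8b-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Export to GGUF so llama.cpp can load it; "q4_k_m" is one common quantization method.
model.save_pretrained_gguf("model", tokenizer, quantization_method="q4_k_m")
```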
-
llama.cpp dropped support for converting LoRA adapters to ggml; it would be very useful if we could use adapters with llama.cpp directly instead of fusing or merging them into the fine-tuned model.
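For comparison, a hedged sketch of runtime adapter loading via llama-cpp-python's lora_path argument, assuming the adapter is already in a format the library accepts (both paths are placeholders):

```python
from llama_cpp import Llama

# Placeholder paths: base model plus a separately converted LoRA adapter.
llm = Llama(
    model_path="base-model.gguf",
    lora_path="adapter.bin",
    n_ctx=2048,
)

# Quick check that the adapter-augmented model responds.
print(llm("Test prompt after applying the adapter:", max_tokens=32)["choices"][0]["text"])
```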
-
### What is the issue?
Arch Linux, Python 3.12
```
(ollama) ╭─hougelangley at Arch-Legion in ~/ollama on main✘✘✘ 24-05-24 - 11:00:23
╰─(ollama) ⠠⠵ pip install -r llm/llama.cpp/requirements.txt
Co…
-
Will using only CPU be faster than llama.cpp?
-
### Describe the bug
Inference fails after prompt evaluation with the llama-cpp backend, with the error:
```
CUDA error: invalid argument
current device: 1, in function ggml_backend_cuda_graph_compute …
-
Hi there,
I'm following these instructions to build llama.cpp from scratch:
https://github.com/ggerganov/llama.cpp#cublas
I'm running it on Ubuntu in WSL.
CPU inference works for me with no issue, but w…