ggerganov / llama.cpp — LLM inference in C/C++
MIT License · 60.82k stars · 8.68k forks
Issues (sorted newest first)
#8143  make : fix missing -O3  (slaren, closed 4 minutes ago, 0 comments)
#8142  ci : publish new docker images only when the files change  (slaren, opened 57 minutes ago, 0 comments)
#8141  Inference support for T5 and FLAN-T5 model families  (fairydreaming, opened 59 minutes ago, 0 comments)
#8140  ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS  (slaren, opened 1 hour ago, 0 comments)
#8139  devops : remove clblast + LLAMA_CUDA -> GGML_CUDA  (ggerganov, closed 1 hour ago, 0 comments)
#8138  Bug: infill reference crashed  (kidoln, opened 4 hours ago, 0 comments)
#8137  Control vector loading fixes  (jukofyork, opened 4 hours ago, 4 comments)
#8136  Performance Tuning for Q4_K matmul CUDA kernel  (contentis, opened 4 hours ago, 4 comments)
#8135  Added support for Viking pre-tokenizer  (kustaaya, opened 5 hours ago, 0 comments)
#8134  Bug: converting model from HF to GGUF gives error  (thesyntaxinator, opened 8 hours ago, 0 comments)
#8133  `json`: unified properties order across optional & required  (ochafik, opened 9 hours ago, 0 comments)
#8132  `json`: update grammars/README w/ examples & note about additionalProperties  (ochafik, opened 9 hours ago, 1 comment)
#8130  Quantize: use --pure, --output-tensor-type and --token-embedding-type as the same time  (ZeusXuan, opened 10 hours ago, 1 comment)
#8129  Quantize: use --pure, --output-tensor-type and --token-embedding-type as the same time  (ZeusXuan, closed 10 hours ago, 0 comments)
#8128  Bug: After running for a while, the llama-server exhibits extremely high CPU usage, resulting in timeouts for all requests.  (moqimoqidea, opened 12 hours ago, 0 comments)
#8127  Bug: Missing required key: general.description  (perp, opened 13 hours ago, 0 comments)
#8124  Bug: llama3 8b gradient unsupported?  (0wwafa, opened 17 hours ago, 1 comment)
#8123  CUDA: fix misaligned shared memory read  (JohannesGaessler, closed 11 hours ago, 0 comments)
#8122  move public backend headers to the public include directory  (slaren, closed 11 hours ago, 0 comments)
#8121  Embed files  (katsu560, opened 22 hours ago, 0 comments)
#8120  Bug: convert-hf-to-gguf.py - AttributeError: 'LlamaTokenizerFast' object has no attribute 'added_tokens_decoder'  (abgulati, closed 20 hours ago, 1 comment)
#8119  Vulkan CMake integration  (bandoti, opened 22 hours ago, 3 comments)
#8118  Add `JAIS` model(s)  (fmz, opened 23 hours ago, 1 comment)
#8117  Bug: Crash with GGML CUDA error when inferencing on llama-server  (DerekJuba-NIST, closed 11 hours ago, 9 comments)
#8116  llama : NvAPI performance state change support  (sasha0552, opened 1 day ago, 0 comments)
#8115  Clarify default MMQ for CUDA and LLAMA_CUDA_FORCE_MMQ flag  (isaac-mcfadyen, closed 11 hours ago, 0 comments)
#8114  Feature Request: It would be convenient and faster if users could specify that the model data used for a RPC-server instance is already available by some fast(er) means (file system GGUF, whatever).  (ghchris2021, opened 1 day ago, 1 comment)
#8113  Feature Request: Provide means to quantify the restriction of RAM/VRAM usage for each GPU and system RAM.  (ghchris2021, opened 1 day ago, 0 comments)
#8112  Bug: [RPC] RPC apparently isn't honoring backend memory capacity et. al.  (ghchris2021, opened 1 day ago, 3 comments)
#8110  disable docker CI on pull requests  (slaren, closed 1 day ago, 0 comments)
#8109  Bug: abort on Android (pixel 8 pro)  (nivibilla, opened 1 day ago, 1 comment)
#8107  sh: 1: ./llama.cpp/llama-quantize: not found  (RakshitAralimatti, closed 1 day ago, 2 comments)
#8106  [SYCL] Fix the sub group size of Intel  (luoyu-intel, opened 1 day ago, 0 comments)
#8105  clip : suppress unused variable warnings  (danbev, opened 1 day ago, 0 comments)
#8104  Update control vector help  (HatsuneMikuUwU33, closed 1 day ago, 0 comments)
#8103  Extend llm_build_ffn() to support _scale tensors  (Eddie-Wang1120, closed 11 hours ago, 0 comments)
#8102  CUDA: fix matrix multiplication algorithm choice  (JohannesGaessler, closed 1 day ago, 0 comments)
#8101  Bug: ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 137438953504  (idekel, closed 1 day ago, 2 comments)
#8100  CUDA: fix MMQ writeback for int8 tensor cores  (JohannesGaessler, closed 1 day ago, 0 comments)
#8099  Add Support for Bamboo LLM  (ffroquemartinez, opened 2 days ago, 0 comments)
#8098  Bug: llama.cpp apparently exits with '[end of text]' before processing prompt if prompt is ~2048 tokens  (hnfong, opened 2 days ago, 0 comments)
#8096  Bug: Crashes at the end of startup during first prompt processing  (takosalad, opened 2 days ago, 23 comments)
#8095  [SYCL] Re-enabled mul_mat_batched_sycl  (airMeng, closed 1 day ago, 0 comments)
#8094  Bug: Cannot load GGUF file, it asks if it is GGML.  (takosalad, closed 2 days ago, 1 comment)
#8093  llama : return nullptr from llama_grammar_init  (danbev, closed 23 hours ago, 5 comments)
#8092  Vulkan backend regression: gibberish output when layers offloaded to GPU  (Adriankhl, opened 2 days ago, 2 comments)
#8090  Fix tensor groups for encoder-decoder models in gguf-dump.py  (fairydreaming, closed 2 days ago, 0 comments)
#8089  Add Unigram tokenizer needed by T5 and FLAN-T5 model families  (fairydreaming, closed 23 hours ago, 0 comments)
#8087  Streamline embeddings from "non-embedding" models  (iamlemec, opened 2 days ago, 0 comments)
#8086  [feature request] conversion to gguf in a more pure form.  (0wwafa, opened 2 days ago, 2 comments)