ggerganov / llama.cpp — LLM inference in C/C++
MIT License · 60.82k stars · 8.68k forks
Issues (sorted newest first)
#8143  make : fix missing -O3  (slaren, closed 4 minutes ago, 0 comments)
#8142  ci : publish new docker images only when the files change  (slaren, opened 57 minutes ago, 0 comments)
#8141  Inference support for T5 and FLAN-T5 model families  (fairydreaming, opened 59 minutes ago, 0 comments)
#8140  ggml : add GGML_CUDA_USE_GRAPHS option, restore GGML_CUDA_FORCE_CUBLAS  (slaren, opened 1 hour ago, 0 comments)
#8139  devops : remove clblast + LLAMA_CUDA -> GGML_CUDA  (ggerganov, closed 1 hour ago, 0 comments)
#8138  Bug: infill reference crashed  (kidoln, opened 4 hours ago, 0 comments)
#8137  Control vector loading fixes  (jukofyork, opened 4 hours ago, 4 comments)
#8136  Performance Tuning for Q4_K matmul CUDA kernel  (contentis, opened 4 hours ago, 4 comments)
#8135  Added support for Viking pre-tokenizer  (kustaaya, opened 5 hours ago, 0 comments)
#8134  Bug: converting model from HF to GGUF gives error  (thesyntaxinator, opened 8 hours ago, 0 comments)
#8133  `json`: unified properties order across optional & required  (ochafik, opened 9 hours ago, 0 comments)
#8132  `json`: update grammars/README w/ examples & note about additionalProperties  (ochafik, opened 9 hours ago, 1 comment)
#8130  Quantize: use --pure, --output-tensor-type and --token-embedding-type as the same time  (ZeusXuan, opened 10 hours ago, 1 comment)
#8129  Quantize: use --pure, --output-tensor-type and --token-embedding-type as the same time  (ZeusXuan, closed 10 hours ago, 0 comments)
#8128  Bug: After running for a while, the llama-server exhibits extremely high CPU usage, resulting in timeouts for all requests.  (moqimoqidea, opened 12 hours ago, 0 comments)
#8127  Bug: Missing required key: general.description  (perp, opened 13 hours ago, 0 comments)
#8124  Bug: llama3 8b gradient unsupported?  (0wwafa, opened 17 hours ago, 1 comment)
#8123  CUDA: fix misaligned shared memory read  (JohannesGaessler, closed 11 hours ago, 0 comments)
#8122  move public backend headers to the public include directory  (slaren, closed 11 hours ago, 0 comments)
#8121  Embed files  (katsu560, opened 22 hours ago, 0 comments)
#8120  Bug: convert-hf-to-gguf.py - AttributeError: 'LlamaTokenizerFast' object has no attribute 'added_tokens_decoder'  (abgulati, closed 20 hours ago, 1 comment)
#8119  Vulkan CMake integration  (bandoti, opened 22 hours ago, 3 comments)
#8118  Add `JAIS` model(s)  (fmz, opened 23 hours ago, 1 comment)
#8117  Bug: Crash with GGML CUDA error when inferencing on llama-server  (DerekJuba-NIST, closed 11 hours ago, 9 comments)
#8116  llama : NvAPI performance state change support  (sasha0552, opened 1 day ago, 0 comments)
#8115  Clarify default MMQ for CUDA and LLAMA_CUDA_FORCE_MMQ flag  (isaac-mcfadyen, closed 11 hours ago, 0 comments)
#8114  Feature Request: It would be convenient and faster if users could specify that the model data used for a RPC-server instance is already available by some fast(er) means (file system GGUF, whatever).  (ghchris2021, opened 1 day ago, 1 comment)
#8113  Feature Request: Provide means to quantify the restriction of RAM/VRAM usage for each GPU and system RAM.  (ghchris2021, opened 1 day ago, 0 comments)
#8112  Bug: [RPC] RPC apparently isn't honoring backend memory capacity et. al.  (ghchris2021, opened 1 day ago, 3 comments)
#8110  disable docker CI on pull requests  (slaren, closed 1 day ago, 0 comments)
#8109  Bug: abort on Android (pixel 8 pro)  (nivibilla, opened 1 day ago, 1 comment)
#8107  sh: 1: ./llama.cpp/llama-quantize: not found  (RakshitAralimatti, closed 1 day ago, 2 comments)
#8106  [SYCL] Fix the sub group size of Intel  (luoyu-intel, opened 1 day ago, 0 comments)
#8105  clip : suppress unused variable warnings  (danbev, opened 1 day ago, 0 comments)
#8104  Update control vector help  (HatsuneMikuUwU33, closed 1 day ago, 0 comments)
#8103  Extend llm_build_ffn() to support _scale tensors  (Eddie-Wang1120, closed 11 hours ago, 0 comments)
#8102  CUDA: fix matrix multiplication algorithm choice  (JohannesGaessler, closed 1 day ago, 0 comments)
#8101  Bug: ggml_backend_cpu_buffer_type_alloc_buffer: failed to allocate buffer of size 137438953504  (idekel, closed 1 day ago, 2 comments)
#8100  CUDA: fix MMQ writeback for int8 tensor cores  (JohannesGaessler, closed 1 day ago, 0 comments)
#8099  Add Support for Bamboo LLM  (ffroquemartinez, opened 2 days ago, 0 comments)
#8098  Bug: llama.cpp apparently exits with '[end of text]' before processing prompt if prompt is ~2048 tokens  (hnfong, opened 2 days ago, 0 comments)
#8096  Bug: Crashes at the end of startup during first prompt processing  (takosalad, opened 2 days ago, 23 comments)
#8095  [SYCL] Re-enabled mul_mat_batched_sycl  (airMeng, closed 1 day ago, 0 comments)
#8094  Bug: Cannot load GGUF file, it asks if it is GGML.  (takosalad, closed 2 days ago, 1 comment)
#8093  llama : return nullptr from llama_grammar_init  (danbev, closed 23 hours ago, 5 comments)
#8092  Vulkan backend regression: gibberish output when layers offloaded to GPU  (Adriankhl, opened 2 days ago, 2 comments)
#8090  Fix tensor groups for encoder-decoder models in gguf-dump.py  (fairydreaming, closed 2 days ago, 0 comments)
#8089  Add Unigram tokenizer needed by T5 and FLAN-T5 model families  (fairydreaming, closed 23 hours ago, 0 comments)
#8087  Streamline embeddings from "non-embedding" models  (iamlemec, opened 2 days ago, 0 comments)
#8086  [feature request] conversion to gguf in a more pure form.  (0wwafa, opened 2 days ago, 2 comments)