abetlen/llama-cpp-python
Python bindings for llama.cpp
https://llama-cpp-python.readthedocs.io
MIT License · 7.8k stars · 934 forks
Issues
#1770 Speculative decoding gives weird results in v. 0.3 (mobeetle, opened 8 hours ago, 0 comments)
#1769 Error when passing model to deepcopy in llama_cpp_python==0.3.0 (sergey21000, opened 10 hours ago, 0 comments)
#1768 [FEAT]: TLS Certificate Support (isgallagher, opened 23 hours ago, 0 comments)
#1767 Inference Speed is Extremely Slow for 72B Model with Long Contexts (wrench1997, opened 1 day ago, 0 comments)
#1765 error: no matching function for call to 'ggml_vk_dispatch_pipeline' (yurivict, closed 2 days ago, 0 comments)
#1764 FileNotFoundError: Shared library with base name 'llama' not found (HAOYON-666, opened 2 days ago, 1 comment)
#1763 Feature request: ability to tokenize a list of strings _or_ keep the tokenizer warm (lsorber, opened 3 days ago, 0 comments)
#1762 `Llama.embed` crashes when `n_batch` > 512 (lsorber, opened 3 days ago, 3 comments)
#1761 Expose libggml in internal APIs (abetlen, closed 2 days ago, 0 comments)
#1760 Cannot load moondream2 model in colab (phuc2272000, opened 3 days ago, 0 comments)
#1759 Server crash with exceed context | lib version >= v0.2.81 (carlostomazin, opened 3 days ago, 0 comments)
#1758 fix: handle multiple calls to the same tool (jeffmaury, opened 4 days ago, 0 comments)
#1757 Do llama.cpp support input_embeds? (OswaldoBornemann, opened 4 days ago, 0 comments)
#1756 chatml-function-calling chat format fails to generate multi calls to the same tool (jeffmaury, opened 5 days ago, 1 comment)
#1755 Serverless inferencing, basic chatbot style (ericcurtin, opened 5 days ago, 0 comments)
#1754 Change the command to `CMAKE_ARGS="-DGGML_CUDA=on -DLLAVA_BUILD=off" pip install -U llama-cpp-python --force-reinstall --no-cache-dir` solved the problem. (yimuu, opened 6 days ago, 1 comment)
#1753 Define Custom Shared Library Path (jetlime, opened 6 days ago, 0 comments)
#1752 "/data/text-generation-webui/llama-cpp-python/ggml.h": No such file or directory. (thistleknot, closed 1 week ago, 1 comment)
#1751 chore(deps): bump actions/cache from 3 to 4 (dependabot[bot], closed 1 week ago, 0 comments)
#1750 Update README.md (Smartappli, closed 1 week ago, 0 comments)
#1749 CMake Error: CMAKE_CXX_COMPILER not set, after EnableLanguage (1431551850, opened 1 week ago, 0 comments)
#1748 serve cannot use minicpmv-2.6 (PredyDaddy, closed 1 week ago, 0 comments)
#1747 error when I using the Function Call (PredyDaddy, opened 1 week ago, 0 comments)
#1746 model.close() Fails to Release Memory from ChatHandler Projector in Multimodal Models (cesarandreslopez, closed 1 week ago, 1 comment)
#1744 How get output from fine tuned llama3 model(trained with alpaca format dataset) in a json format ? (ApurvPujari, opened 1 week ago, 0 comments)
#1743 chore(deps): bump pypa/cibuildwheel from 2.20.0 to 2.21.1 (dependabot[bot], closed 1 week ago, 0 comments)
#1742 Update sampling API for llama.cpp (abetlen, closed 1 week ago, 2 comments)
#1741 chore(deps): bump pypa/cibuildwheel from 2.20.0 to 2.21.0 (dependabot[bot], closed 1 week ago, 1 comment)
#1740 fatal error: intrin.h: No such file or directory (triamozavr, opened 2 weeks ago, 0 comments)
#1739 corrected command (Shehrozkashif, opened 2 weeks ago, 0 comments)
#1738 corrected their must be 1 intead of on (Shehrozkashif, closed 2 weeks ago, 0 comments)
#1737 LLamaDiskCache: needs a RO / 'static' disk cache for RAG use cases (tc-wolf, opened 2 weeks ago, 0 comments)
#1736 phi3 chat format (SimJeg, opened 2 weeks ago, 0 comments)
#1735 [Draft Issue] system crash on exit (after inference is done) (Mrw33554432, closed 3 days ago, 1 comment)
#1733 How to display a chat prompt after create_chat_completion (dtischencko, opened 2 weeks ago, 0 comments)
#1732 Scores are stored in a 32-bit NumPy array even when K and V are quantized (EthanZoneCoding, opened 3 weeks ago, 0 comments)
#1731 show how to run inference using minicpm v2.6 (thistleknot, opened 3 weeks ago, 1 comment)
#1727 flux1-dev-Q8_0.gguf (ayttop, opened 3 weeks ago, 1 comment)
#1726 Combining grammars+multimodal models (joris-sense, closed 3 weeks ago, 1 comment)
#1724 How do I customize the Chat format? (lingyezhixing, opened 3 weeks ago, 0 comments)
#1723 Why don't use gpu (suwenzhuo, opened 3 weeks ago, 1 comment)
#1721 Resync llama_grammar with llama.cpp implementation and use curly braces quantities instead of repetitions (gbloisi-openaire, opened 4 weeks ago, 0 comments)
#1720 GGML_CUDA_ENABLE_UNIFIED_MEMORY=1 behavior is strange. (Enchante503, opened 4 weeks ago, 0 comments)
#1719 No matter how many times I build it, it won't start (Enchante503, opened 4 weeks ago, 1 comment)
#1718 Remove unnecessary pyproject optional dependency (LecrisUT, opened 4 weeks ago, 0 comments)
#1717 How to use this model? (dzy1128, opened 4 weeks ago, 2 comments)
#1716 feat: adding support for external chat format contribution (axel7083, opened 1 month ago, 0 comments)
#1715 Allow python packages to contribute to LlamaChatCompletionHandlerRegistry (axel7083, opened 1 month ago, 1 comment)
#1714 Windows Build Stuck at "Building wheel for llama-cpp-python (pyproject.toml) ... Generating Code..." (Orenji-Tangerine, closed 1 month ago, 6 comments)
#1710 flash attention on Nvidia Tesla P100s results in the `CUDA error: unspecified launch failure` - (`CUDA kernel flash_attn_tile_ext_f16 has no device code compatible with CUDA arch 520`) (AlHering, opened 1 month ago, 2 comments)