-
# Prerequisites
Installing via `pip install llama-cpp-python` fails with an error. It occurs on versions 0.2.81 and 0.2.80; version 0.2.79 installs successfully.
python 3.11…
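Until the build regression is fixed, pinning the last known-good version is the usual workaround. Based on the versions reported above, something like:

```shell
# Pin the last version reported to install cleanly (0.2.79, per the report above).
pip install llama-cpp-python==0.2.79
```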
-
# Expected Behavior
Embedding text with a long-context model like BGE-M3 [1] should be able to output token embeddings for more than 512 tokens (this is of interest for 'late interaction' retrieval…
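For context, a minimal sketch of the "late interaction" scoring that per-token embeddings enable (ColBERT-style MaxSim). The function names and vectors here are illustrative only, not the BGE-M3 or llama-cpp-python API:

```python
# Late-interaction ("MaxSim") scoring sketch: for each query token, take the
# maximum similarity against all document tokens, then sum over query tokens.
# This is only possible when the model returns one embedding per token,
# rather than a single pooled vector per text.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def maxsim_score(query_tokens, doc_tokens):
    """Sum over query tokens of the max dot-product with any doc token."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)
```

A pooled (single-vector) embedding cannot express this per-token max, which is why token-level output matters for this retrieval style.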
-
Hi there,
I wanted to try out your GUI with a Docker build (cuda backend).
I ran into a couple of issues while building the image:
1. could not install pyenv and python due to an issue with the e…
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [yes] I am running the latest code. Development is very rapid so there are no tagged versions as …
-
Following instructions at https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md
I tried a bunch of different models and they all fail in `run.py` on:
```
Traceback (most recent cal…
-
# Expected Behavior
I train a model with the [transformers](https://github.com/huggingface/transformers) lib, then convert it to llama.cpp format using `convert.py` from [llama.cpp](https://github.…
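For reference, a typical conversion invocation looks roughly like the following; the model path is hypothetical, and exact flags can differ between llama.cpp revisions:

```shell
# Run from the llama.cpp repo root; the directory holds the saved
# transformers checkpoint. --outtype f16 selects 16-bit float output.
python convert.py ./my-finetuned-model --outtype f16
```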
-
### Describe the bug
After downloading a model, I try to load it but get this message on the console:
Exception: Cannot import 'llama-cpp-cuda' because 'llama-cpp' is already imported. Switching to…
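The message suggests a guard along these lines in the loader: once one variant's module is in `sys.modules`, the other refuses to load. A sketch with a stand-in module name, not the actual loader code:

```python
import sys
import types

def already_imported(module_name):
    """Mirror of the guard behind the error: a variant cannot be loaded
    if the conflicting module is already present in sys.modules."""
    return module_name in sys.modules

# Simulate the conflict with a stand-in module name (illustrative only):
sys.modules["llama_cpp_demo"] = types.ModuleType("llama_cpp_demo")
```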
-
### What happened?
This started as a problem with Ooba, but I'm seeing the same issue with KoboldCPP and llama.cpp. I updated Ooba the other day, after maybe a week or two of not doing so. While it …
-
**Is your feature request related to a problem? Please describe.**
Today, [llama-cpp/llama_chat_format.py] contains 25 chat formats and 4 chat_completion_handlers; this currently forces the diffe…
-
### Describe the issue as clearly as possible:
`examples/llamcpp_example.py` is broken
It seems like the model is producing some garbage output (which shouldn't be allowed by the logit processor). …
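If the logit processor were applied correctly, disallowed tokens could never be sampled; a minimal sketch of that masking (illustrative only, not the outlines implementation):

```python
import math

# Constrained decoding sketch: tokens outside the allowed set get -inf
# logits, so no sampling strategy can ever pick them. Garbage output
# implies the mask was skipped or the allowed set was wrong.

def mask_logits(logits, allowed_token_ids):
    return [x if i in allowed_token_ids else -math.inf
            for i, x in enumerate(logits)]
```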