-
### Describe the bug
https://huggingface.co/google/gemma-2-9b-it
I downloaded the full version of gemma-2-9b-it from Hugging Face and copied the files into the models folder.
Because the version of Tr…
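Since the (truncated) report points at a Transformers version issue, below is a minimal local-load sanity check; this is only a sketch, assuming transformers >= 4.42 (the first release with Gemma 2 support) and that the files sit under `models/gemma-2-9b-it`.

```python
# Minimal sanity check for a locally copied gemma-2-9b-it
# (assumes transformers >= 4.42, plus torch and accelerate for device_map).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "models/gemma-2-9b-it"  # assumed local folder; adjust as needed
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```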
-
# Bug Report
## Description
**Bug Summary:**
After upgrading to the new version, 0.3.8, the website shows the error "Internal Server Error", with absolutely no logs in the terminal.
I'm using docker-compose:
…
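Since nothing reaches the terminal, one way to gather evidence is to probe the endpoint directly. A minimal sketch, assuming the default docker-compose mapping of port 3000:

```python
# Probe the Open WebUI endpoint to confirm the 500 and capture any body/headers
# (the URL/port are assumptions based on a default docker-compose setup).
import requests

resp = requests.get("http://localhost:3000/", timeout=10)
print(resp.status_code, resp.headers.get("content-type"))
print(resp.text[:500])  # first part of whatever the server returns
```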
-
### What is the issue?
Up to v0.1.38 my setup got 9 tokens/s, but from v0.1.39 through the current v0.1.48 and the v0.2 releases, performance drops to 0.12 tokens/s.
My setup:
- Intel(R) Cor…
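For a reproducible tokens/s figure, the numbers can be read from the generation response itself. A minimal sketch, assuming this concerns a local Ollama server (the version numbers match its releases) and an already pulled model:

```python
# Measure generation speed via Ollama's /api/generate endpoint
# (assumes a local server on the default port; the model name is illustrative).
import requests

resp = requests.post(
    "http://localhost:11434/api/generate",
    json={"model": "llama3", "prompt": "Why is the sky blue?", "stream": False},
).json()

# eval_count = tokens generated, eval_duration = nanoseconds spent generating
print(resp["eval_count"] / (resp["eval_duration"] / 1e9), "tokens/s")
```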
-
### System Info
- CUDA 11.8
- Python 3.9
- xinference 0.14.0
### Running Xinference with Docker?
- [X] docker
- [ ] pip install
- [ ] in…
-
| Package | Upstream | ipex-llm |
| --- | --- | --- |
| accelerate | 0.27.* | 0.21.0 |
| tokenizers | 0.15.2 (upstream installs the latest by default) | 0.13.3 |
| transformers | 4.38.* | 4.31.0 |
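To see which side of this version split an environment actually sits on, a quick check using only the standard library:

```python
# Print the installed versions of the three packages compared above
from importlib.metadata import version

for pkg in ("accelerate", "tokenizers", "transformers"):
    print(pkg, version(pkg))
```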
-
### What is the issue?
Whenever I try to chat with the LLM through open-webui and ollama, I get this in the ollama logs:
`ERROR [validate_model_chat_template] The chat template comes with this mo…
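Since the (truncated) error concerns the model's bundled chat template, it may help to dump the template Ollama has for the model. A minimal sketch, assuming a local Ollama server on the default port (the model name is illustrative):

```python
# Show the chat template Ollama associates with a model via /api/show
import requests

info = requests.post(
    "http://localhost:11434/api/show",
    json={"name": "llama3:latest"},
).json()
print(info.get("template"))
```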
-
**Description**
I have created AutoAWQ, a package that makes it easier to quantize and run inference on AWQ models. I wish to have AutoAWQ integrated into text-generation-webui to make it easier for peop…
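For context, AutoAWQ's documented quantization flow looks roughly like the following (the model path and quant_config values are illustrative, typical README-style defaults):

```python
# Quantize a model with AutoAWQ and save the result
from awq import AutoAWQForCausalLM
from transformers import AutoTokenizer

model_path = "mistralai/Mistral-7B-v0.1"  # illustrative example model
quant_path = "mistral-7b-awq"
quant_config = {"zero_point": True, "q_group_size": 128, "w_bit": 4, "version": "GEMM"}

model = AutoAWQForCausalLM.from_pretrained(model_path)
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

model.quantize(tokenizer, quant_config=quant_config)
model.save_quantized(quant_path)
tokenizer.save_pretrained(quant_path)
```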
-
**Is your feature request related to a problem? Please describe.**
Currently, there isn't any way to set limits on the number of requests that can be made to the large language models (LLMs) within a…
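To illustrate the kind of limit being requested, here is a minimal token-bucket sketch (the class and all names are illustrative, not from any existing codebase):

```python
# A simple token-bucket limiter: refills at `rate` tokens/s up to `capacity`,
# and each admitted request consumes one token.
import threading
import time

class TokenBucket:
    def __init__(self, rate: float, capacity: int):
        self.rate = rate              # tokens added per second
        self.capacity = capacity      # maximum burst size
        self.tokens = float(capacity)
        self.last = time.monotonic()
        self.lock = threading.Lock()

    def allow(self) -> bool:
        with self.lock:
            now = time.monotonic()
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= 1:
                self.tokens -= 1
                return True
            return False

limiter = TokenBucket(rate=2, capacity=5)  # ~2 requests/s, bursts of up to 5
if limiter.allow():
    ...  # forward the request to the LLM backend
else:
    ...  # reject or queue the request
```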
-
Attached, please find the output of the webui and ollama server consoles. At line 1 of the webui output, I ask the question using llama3:latest (line 3). The result is shown in lines 4-42.
At line 45, I ask sam…
-
### What is the issue?
Whenever I try to give a second prompt to any GGUF model, ollama fails. Here are the logs:
time=2024-07-12T15:47:23.505Z level=INFO source=sched.go:738 msg="new model will…