-
### Describe the bug
The output window where the LLM outputs its text and displays the chat history flickers intermittently.
### Is there an existing issue for this?
- [x] I have sea…
-
Hi! With the new version of Forge and FLUX, this extension could be really practical for the millions of low-VRAM laptops that can now run FLUX. The only problem is that it doesn't unload the LLM fro…
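For reference, a minimal sketch of the kind of unload step being asked for, assuming a PyTorch-based LLM held on the GPU (the function and names below are illustrative, not the extension's actual code):
```python
# Rough sketch of freeing VRAM held by a model, assuming a PyTorch-based LLM;
# names are illustrative, not the extension's actual code.
import gc
import torch

def unload_llm(model):
    model.to("cpu")           # move the weights off the GPU
    del model                 # drop the reference so Python can collect it
    gc.collect()              # reclaim the Python-side object
    torch.cuda.empty_cache()  # return cached VRAM to the driver
```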
-
Now LLAMA 3.1 is out, but sadly it is not loadable with the current text-generation-webui. I tried updating the transformers lib, which makes the model loadable, but I then get an error when trying to use …
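For illustration, a minimal sketch of the load step in question with an updated transformers install (the checkpoint id below is an assumption, not something taken from the webui):
```python
# Minimal sketch of loading Llama 3.1 with an updated transformers;
# the checkpoint id is an assumption for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
```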
-
After updating ipex-llm, running llama3.1 through langchain and ollama no longer works.
A simple reproducer:
```python
# pip install langchain langchain_community
from langchain_community.llms i…
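# (the import above is cut off; a minimal sketch of the rest of the reproducer,
#  assuming the langchain_community Ollama wrapper and a local llama3.1 model)
from langchain_community.llms import Ollama

llm = Ollama(model="llama3.1")
print(llm.invoke("Say hello"))  # reportedly errors after updating ipex-llm
```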
-
**Describe the bug**
I fail to run on the 'inference_streaming' branch.
**To Reproduce**
Steps to reproduce the behavior:
1. git checkout inference_streaming
2. python webui.py --port 50000 --model…
-
Occurs when using the /v1/chat/completions endpoint. A list can't be concatenated to a string with the + operator in Python, so this halts any generation requested. The issue is specifically the …
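For context, a minimal sketch of the failure class being described (the variable names are hypothetical, not the actual endpoint code):
```python
# Hypothetical illustration of the str + list error described above;
# this is not the actual /v1/chat/completions handler code.
prompt = "System prompt\n"
stop_words = ["###", "</s>"]

try:
    combined = prompt + stop_words  # TypeError: can only concatenate str (not "list") to str
except TypeError as err:
    print(err)

combined = prompt + ", ".join(stop_words)  # joining the list first avoids the error
```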
-
Hello, I am a complete newbie when it comes to the subject of LLMs.
I installed a GGML model in the oobabooga webui and tried to use it. It works fine, but only from RAM; it only uses 0.5 GB of VRAM, and I d…
-
Will you add support for oobabooga's text-generation-webui? An LLM initialization for POST requests and a few patterns might be sufficient. I've been trying to do it, but I've had to figure out…
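For what it's worth, a minimal sketch of the kind of POST-based initialization meant here, assuming text-generation-webui is running locally with its OpenAI-compatible API enabled (the host, port, and payload shape are assumptions):
```python
# Rough sketch of posting a chat request to text-generation-webui's
# OpenAI-compatible endpoint; host, port, and payload fields are assumptions.
import requests

payload = {
    "messages": [{"role": "user", "content": "Hello!"}],
    "max_tokens": 64,
}
resp = requests.post(
    "http://127.0.0.1:5000/v1/chat/completions",
    json=payload,
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```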
-
**Is your feature request related to a problem? Please describe.**
There is a lot of unnecessary complexity associated with the current way tools are handled by default. As of right now, OWUI does th…
-
### System Info
Driver Version: 535.171.04, CUDA Version: 12.2
### Running Xinference with Docker?
- [X] docker
- [ ] pip install / via pip insta…