-
### Reminder
- [X] I have read the README and searched the existing issues.
### Reproduction
As stated in the title.
### Expected behavior
_No response_
### System Info
_No response_
### Others
_No response_
-
Naive model parallelism is much faster at inference than tensor parallelism:
Setup: Llama-30b on 2080Ti 22G x4
Naive: 31.64s
4-way TP, main branch: 177.78s
4-way TP, llama branch: 102.22s
…
-
When I ran "pip install -r assets/requirements/requirements.txt", I got the following errors. Could you please tell me what I should do?
Building wheels for collected packages: mpi4py
Bu…
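A wheel-build failure for mpi4py is commonly caused by a missing MPI toolchain rather than by the requirements file itself. A possible fix, assuming a Debian/Ubuntu system with apt available (the package names below are an assumption about that platform; adapt for your distribution):

```shell
# Install an MPI implementation and its development headers,
# which mpi4py needs to compile its C extension.
sudo apt-get update
sudo apt-get install -y libopenmpi-dev openmpi-bin

# Retry the original install once mpicc is on PATH.
pip install -r assets/requirements/requirements.txt
```

If the build still fails, the full compiler error above the "Building wheels" line usually names the missing header or command.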
-
### Describe the bug
After cloning the repository, Docker is unable to create a container because of a missing configuration file.
### Is there an existing issue for this?
- [X] I have se…
-
**Description**
I want to be able to increase threads_batch to 48 in the UI.
**Additional Context**
If applicable, please provide any extra information, external links, or scree…
-
Please help. I've been getting gibberish responses with exllama 2_hf. I saw this post: https://github.com/oobabooga/text-generation-webui/pull/2912
But I'm a newbie, and I have no idea what half 2 …
-
I tried many parameter combinations, but none of them loaded successfully.
-
### Describe the bug
It seems it forces sampling the first token before the context has finished processing or something along those lines. Not sure if it applies to the regular llama.cpp backend o…
-
Describe the bug:
Error when memgpt talks to a local LLM.
Please describe your setup:
What is the output of memgpt version? (e.g., "0.2.4")
0.3.7
How did you install memgpt?
pip install pymemgpt…
-
**Description**
About 10 days ago, KoboldCpp added a feature called Context Shifting which is supposed to greatly reduce reprocessing. Here is their official description of the feature:
> NEW FE…