-
I've added the stop token, but it does not work. Why?
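Without knowing the backend, one common cause is that the stop string is only matched against whole decoded tokens, or is never applied client-side at all. A minimal sketch of client-side stop-sequence truncation (the function name and sequences are illustrative, not any particular library's API):

```python
def truncate_at_stop(text, stop_sequences):
    # Cut the generated text at the earliest occurrence of any stop sequence.
    # Note: many backends only match stop strings against whole tokens, so a
    # stop sequence that straddles a token boundary may be missed server-side.
    cut = len(text)
    for stop in stop_sequences:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop("Hello world\nUser:", ["\nUser:"]))  # Hello world
```

If the server streams tokens, this truncation has to run on the accumulated text, not on each chunk in isolation.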
-
I am requesting that you merge with the upstream flash-attention repo, in order to garner community engagement and improve integration and distribution.
This separation is a major blocker to AMD …
-
- [ ] [Qwen-1.5-8x7B : r/LocalLLaMA](https://www.reddit.com/r/LocalLLaMA/comments/1atw4ud/qwen158x7b/)
# TITLE: Qwen-1.5-8x7B : r/LocalLLaMA
**DESCRIPTION:** "Qwen-1.5-8x7B
New Model
Someone creat…
-
### Description
As I've documented in the discussion on the subject, it is actually relatively easy to redirect OpenAI requests to a locally hosted solution (I've only tested with text-generati…
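For illustration, redirecting usually amounts to pointing the request base URL at a local OpenAI-compatible server instead of api.openai.com. A minimal sketch of the URL rewrite (the localhost port and path are assumptions; many local servers expose a `/v1` endpoint):

```python
def redirect_base(url, new_base="http://localhost:5000/v1"):
    # Swap the OpenAI API host for a locally hosted, API-compatible server.
    # Keeps everything after /v1 (e.g. /chat/completions) unchanged.
    path = url.split("/v1", 1)[1] if "/v1" in url else url
    return new_base + path

print(redirect_base("https://api.openai.com/v1/chat/completions"))
# http://localhost:5000/v1/chat/completions
```

In practice this is usually done by setting the client's base URL (or the equivalent environment variable) rather than rewriting URLs by hand.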
-
This seems like it'll be the most important task to make this more viable for people.
Alternative models will be cheaper, potentially much faster, allow running on someone's own hardware (LLaMA), a…
-
After running `.\venv\Scripts\activate` followed by `.\run.bat`, I encounter a looping error:
Checking installs and venv + autodebug checks
Python version: 3.10.9 (tags/v3.10.9:1dd9be6, Dec 6 2…
-
Hi!
Thank you for the paper! It is inspiring that you can compress weights to about 1 bit and the model still works better than random.
A practical sub-2-bit quantization algorithm would be a grea…
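As a rough illustration of what 1-bit quantization means, here is a minimal sign-plus-scale sketch (this is the classic BinaryConnect/XNOR-style scheme, not necessarily the paper's method):

```python
def quantize_1bit(weights):
    # Per-row 1-bit quantization: each weight becomes sign(w) * alpha.
    # Choosing alpha = mean(|w|) minimizes the L2 reconstruction error
    # for a fixed sign pattern.
    alpha = sum(abs(w) for w in weights) / len(weights)
    signs = [1 if w >= 0 else -1 for w in weights]
    return signs, alpha

def dequantize(signs, alpha):
    # Reconstruct the (lossy) full-precision row from signs and scale.
    return [s * alpha for s in signs]

signs, alpha = quantize_1bit([0.4, -0.2, 0.1, -0.5])
print([round(v, 6) for v in dequantize(signs, alpha)])  # [0.3, -0.3, 0.3, -0.3]
```

A practical sub-2-bit scheme would spend the extra fraction of a bit on things like grouping, outlier handling, or a small codebook on top of this per-row scale.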
-
### Describe the bug
This is a reproduction of #4193, which it appears was never adequately fixed, or has regressed.
### Is there an existing issue for this?
- [X] I have searched the existin…
-
Hi all!
The model is working great! I am trying to use my 8GB 4060 Ti with MODEL_ID = "TheBloke/vicuna-7B-v1.5-GPTQ"
MODEL_BASENAME = "model.safetensors"
I changed the GPU today; the previous one wa…
-
**Have you searched for similar [requests](https://github.com/SillyTavern/SillyTavern/issues?q=)?**
Yes
**Is your feature request related to a problem? Please describe.**
Trying to migrate from O…
Xabab updated 2 weeks ago