-
Output generated in 2.47 seconds (3.25 tokens/s, 8 tokens, context 28, seed 295305407)
Traceback (most recent call last):
File "D:\oobabooga_windows\installer_files\env\lib\site-packages\gradio\ro…
-
Hi, I noticed that in the config of sft-8-datasets, 5% red_pajama is added in SFT training.
So there are 3 questions I'm confused about:
1. Will the pretrain data size be larger and the instructio…
-
### Describe the bug
Although the model was able to load, it did not respond for a long time. I can't use "xinference list" to list any running mo…
-
Firstly, thanks for all the great new commits over the last few days. AutoGPTQ is looking better and better every day!
I have been doing some testing on AutoGPTQ and am really pleased to see it is …
-
### Describe the bug
Not sure if this is fixable in your code, but here it is:
python server.py --verbose --model-menu --trust-remote-code --load-in-8bit
INFO:Gradio HTTP request redirected t…
-
### Describe the bug
Unable to load the model because it can't determine model type.
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Reproduction
Attempt to…
-
Hi!
I would like to understand why the VRAM is not released after a request completes. Something I noticed is that when I send queries the VRAM gets filled, but after the answer is received…
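This comes up often with PyTorch-based backends: the CUDA caching allocator keeps freed blocks reserved for reuse, so tools like nvidia-smi still report that memory as used after a request finishes. A minimal sketch of explicitly trimming that cache, assuming a PyTorch backend (the helper name is my own, not part of any server's API):

```python
import gc

def release_cached_vram() -> bool:
    """Hypothetical helper: return PyTorch's cached VRAM blocks to the driver.

    PyTorch's caching allocator holds on to freed blocks for reuse, which is
    why nvidia-smi keeps showing the memory as allocated after a request
    completes. empty_cache() releases the unused cached blocks; memory held
    by live tensors (e.g. the loaded model weights) stays allocated.
    """
    gc.collect()  # drop any unreferenced Python-side tensors first
    try:
        import torch
    except ImportError:
        return False  # not a PyTorch backend; nothing to trim
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # release cached, currently-unused blocks
        return True
    return False
```

Note this only trims the cache: if the server still references the model or per-request KV caches, that memory is genuinely in use and will not be freed.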
-
### Describe the bug
I searched the issues and the GitHub pages but couldn't solve this.
I'm trying to load a Llama GPTQ model.
In Parameters, I selected: 4 wbits, 128 groupsize, and llama model type
…
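For reference, those UI choices correspond to text-generation-webui's command-line flags; a sketch of an equivalent launch command, assuming the GPTQ-for-LLaMa loader of that era (the model name below is a placeholder):

```shell
# Hypothetical launch command mirroring the UI settings (placeholder model name):
python server.py --model my-llama-gptq --wbits 4 --groupsize 128 --model_type llama
```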
-
Awesome work on the 0.2.0 release and the wheels, PanQiWei! Thousands of new people are trying AutoGPTQ today and that is amazing.
Got an issue that's affecting some of them:
**Describe the bug…
-
Hey folks, thanks for making this library! I'm looking forward to using it in my own code. When I try to use Apple Metal, inference moves quickly once the model is loaded, but the load times are very …