-
Below is the error I am getting while loading the TheBloke/llama-2-70b-chat-AWQ model:
OutOfMemoryError: CUDA out of memory. Tried to allocate 112.00 MiB (GPU 0; 22.20 GiB total capacity; 21.30 GiB alrea…
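For context, a rough back-of-the-envelope estimate (assuming ~70B parameters stored as 4-bit AWQ weights, and ignoring activations and the KV cache entirely) shows why the reported 22.20 GiB card fills up:

```python
# Approximate weight memory for a 4-bit quantized 70B-parameter model.
params = 70e9
bytes_per_param = 0.5          # 4-bit weights = half a byte per parameter
weight_gib = params * bytes_per_param / 2**30

print(f"~{weight_gib:.1f} GiB of weights vs. 22.20 GiB total capacity")
```

Even before activations or KV cache, the quantized weights alone exceed the card's capacity, so multi-GPU sharding or CPU offload would be needed for this model on a single 22 GiB GPU.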
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
- [X] I am running the latest code. Development is very rapid so there are no tagged versions as of…
-
I get the following error when I run `docker compose up --build` on macos.
I've already tried installing build-essential.
```
langflow % docker compose up --build
[+] Building 51.1s (27/27) …
-
### Describe the bug
I have built a Docker image myself and deployed xinference on k8s. The homepage can be accessed normally. However, loading the model failed, and the error message is 'not found'.…
-
Hi, just wondering: with the new open-source models coming out, like Llama 2 or any other at that level on Hugging Face, why are we still using the API from OpenAI? If I can operate at the same level as GPT-4 using…
-
I'm getting errors with StarCoder models when I try to include any non-trivial number of tokens. I'm getting this with both my raw model (direct .bin) and my quantized model, regardless of version (pre Q4…
-
### Describe the bug
Can't load GPTQ model with ExLlamav2_HF and ExLlamav2. I have tried these two models:
- TheBloke_upstage-llama-30b-instruct-2048-GPTQ_gptq-4bit-128g-actorder_True
- TheBloke_Op…
-
### Is there an existing issue for this?
- [X] I have searched the existing issues
### Current Behavior
Using the example code:
``` python
from transformers import Au…
-
### Describe the bug
Load this model. No matter what settings I use (such as GPU layers), the model runs entirely on the CPU:
https://huggingface.co/dhairya0907/meta-llama-2-7b-chat-hf-gguf-v1
I did tr…
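Since the settings mentioned include GPU layers, here is a hedged sketch of what layer offloading typically looks like with llama-cpp-python (an assumption, since the loader in use is not stated; the model path below is hypothetical):

```python
from llama_cpp import Llama

# Hypothetical local path to the GGUF file from the linked repo.
llm = Llama(
    model_path="./meta-llama-2-7b-chat-hf-gguf-v1.gguf",
    n_gpu_layers=-1,  # -1 asks llama.cpp to offload all layers to the GPU
    verbose=True,     # the load log reports how many layers were offloaded
)
```

If the load log still reports 0 layers offloaded, the installed llama-cpp-python build likely lacks GPU support and would need to be reinstalled with the appropriate backend enabled.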
-
Quite similar to #1127, although this issue is triggered in a different context by a different rule, so it is probably worth a separate issue.
**Describe the bug**
`cabal_package` generates `.so` fil…