-
### Describe the bug
13600KF + 32 GB RAM + 4080 (16 GB) runs a SMALL 7B model (there is enough GPU RAM), but it only outputs "□□□□□□□□". I have changed models many times, but the result is still the same.
### Is there an existing issue for this?…
-
See this prompt for reference:
https://github.com/BudEcosystem/code-millenials/blob/main/utils/prompt.py
I also double-checked by asking in their HF repo.
However, this model will always produce s…
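I haven't verified the linked prompt.py contents; a generic way to confirm the exact template the model expects (assuming the HF repo ships a chat template; the model id and messages below are illustrative) is to render it with `tokenizer.apply_chat_template`:
```python
# Illustrative sketch: render the chat template shipped with the HF repo
# (assumption: one exists) and compare it against utils/prompt.py.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("budecosystem/code-millenials-34b")  # illustrative id

messages = [{"role": "user", "content": "Write a function that reverses a string."}]

# tokenize=False returns the raw prompt string exactly as the template builds it.
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(prompt)
```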
-
OS: Ubuntu 22.04
CUDA: 12.3
GPUs: 2x4090 (2x24GB)
Release: `exllamav2-0.0.11+cu121-cp310-cp310-linux_x86_64.whl`
DeepSeek coder produces a blank completion:
```sh
$ python -u examples/chat.py …
```
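To isolate whether the blank completion comes from chat.py's prompt handling or from the model itself, a minimal sketch against the 0.0.11-era exllamav2 API (the model directory below is a placeholder) might look like this; if this also prints nothing, chat.py is not the culprit:
```python
# Minimal generation sketch for exllamav2 ~0.0.11; model_dir is a placeholder.
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2BaseGenerator, ExLlamaV2Sampler

config = ExLlamaV2Config()
config.model_dir = "/models/deepseek-coder-exl2"  # placeholder path
config.prepare()

model = ExLlamaV2(config)
cache = ExLlamaV2Cache(model, lazy=True)
model.load_autosplit(cache)  # auto-splits layers across both 4090s

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2BaseGenerator(model, cache, tokenizer)

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8
settings.top_p = 0.9

# A bare code prompt avoids chat formatting entirely.
print(generator.generate_simple("def quicksort(arr):", settings, 200))
```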
-
Copilot Chat just got GPT-4 inside of it, which is a great thing that I love. However, it still uses GPT-3.5 for some things when it really shouldn't, and that makes the code worse. Copilot should just use GPT-4…
-
example:
- https://huggingface.co/TheBloke/CodeLlama-7B-AWQ: physical size is 4 GB, but it uses about 20 GB of VRAM (see the sketch after this list)
- https://huggingface.co/TheBloke/deepseek-coder-33B-instruct-AWQ: physical size is 17 GB, but…
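The gap between file size and VRAM is usually not the weights themselves. If the checkpoint is loaded with vLLM (an assumption; the loader isn't named above), vLLM preallocates most of the GPU for KV cache by default, and `gpu_memory_utilization` caps that:
```python
# Assumption: the AWQ checkpoint is served with vLLM, which by default
# reserves ~90% of VRAM (mostly KV cache) regardless of weight size.
from vllm import LLM, SamplingParams

llm = LLM(
    model="TheBloke/CodeLlama-7B-AWQ",
    quantization="awq",
    gpu_memory_utilization=0.35,  # cap total VRAM use instead of the ~0.9 default
)

out = llm.generate(["def fib(n):"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)
```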
-
# Prerequisites
Please answer the following questions for yourself before submitting an issue.
# Expected Behavior
Running the latest llama.cpp with [DeepSeek 33B GGUF](https://huggingface.co…
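For reference, a typical invocation would be something like the sketch below; the GGUF filename and the prompt text are placeholders, since the link above is truncated:
```sh
# Placeholder filename and prompt; -m/-p/-n are standard llama.cpp `main` flags.
$ ./main -m deepseek-coder-33b-instruct.Q4_K_M.gguf \
    -p "### Instruction:\nWrite a binary search in C.\n### Response:\n" \
    -n 256
```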
-
**Describe the bug**
Once Twinny encounters something it does not understand, it responds with "Sorry, I don't understand. Please try again" for all subsequent prompts. Also, the "fill in the middle…
-
**Describe the bug**
**Steps to reproduce**
Steps to reproduce the behavior:
1. Download the model Wizard Coder Python 13B Q5 in the Model Hub
2. Start the model and start the conversation
…
-
When asked a strictly math question, it does fine. However, when asked "what is your knowledge", the answer is:
```
The answer is: Good.
The answer is: Good.
].join(',')
].join(','.split(…
```
-
I attempted to run a number of models across two GPUs and got OOM, although this was successful with a single GPU. Initially I thought my model was too large when running a 34B version and scaled down to …
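If the models were loaded through Hugging Face transformers (an assumption; the loader isn't named above), one way to rule out an unbalanced split is to cap per-GPU memory explicitly so layers are spread across both cards:
```python
# Assumption: loading via transformers/accelerate. Explicit max_memory keeps
# either GPU from filling up before the split is balanced.
import torch
from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
    "codellama/CodeLlama-34b-hf",         # illustrative 34B checkpoint
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "20GiB", 1: "20GiB"},  # headroom on each 24 GB card
)
```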