-
## 🐛 Bug
I am trying to optimise the `Qwen/Qwen1.5-4B-Chat` model. As I have only 8 GB of RAM on my Mac M1, I use 3-bit quantisation and a really small prefill chunk size of 2048. I get the following err…
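The combination of a 3-bit quantisation mode and a `prefill_chunk_size` setting suggests an MLC LLM style setup. A minimal sketch of the relevant fields, assuming the `mlc-chat-config.json` layout generated by MLC LLM (field values here are illustrative, not the reporter's actual config):

```json
{
  "quantization": "q3f16_1",
  "prefill_chunk_size": 2048,
  "context_window_size": 4096
}
```

Shrinking `prefill_chunk_size` trades prefill speed for peak memory, which is why it is commonly lowered on 8 GB machines.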
-
## Description
I tried to serve the model "microsoft/Phi-3-vision-128k-instruct" with several LMI images and deploy it to SageMaker, but it failed with errors.
### Expected Behavior
Expected the SageMaker endpo…
-
After GPU KV cache usage reaches 100.0%, the server hangs, GPU utilization drops to 0, and it can no longer serve requests. Is there any way to fix this?
-
Hello, when I run `run.py`:
```
mpirun -n 2 --allow-run-as-root \
    python3 run.py --max_output_len=1024 \
    --tokenizer_dir /root/autodl-tmp/llama-2-7b \
    --engine_dir=/…
```
-
### What happened?
llama-server crashes after prompt processing, issue doesn't occur before https://github.com/ggerganov/llama.cpp/commit/df270ef74596da8f1178f08991f4c51f18c9ee82
At first I thought…
-
### System Info
**CUDA: 12.3**
**OS: Ubuntu 22.04**
**Python: 3.11.9**
**pip list:**
vllm 0.6.2
vllm-flash-attn 2.6.1
xinference …
-
### Your current environment
Collecting environment information...
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Cen…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
/home/irteamsu/miniconda3/envs/jongho/lib/python3.10/site-packages/torch/dist…
```
-
### Reminder
- [X] I have read the README and searched the existing issues.
### System Info
ValueError: Target modules {'v_proj', 'gate_proj', 'k_proj', 'o_proj', 'down_proj', 'q_proj', 'up_proj'} …
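This `ValueError` is PEFT's complaint that none of the requested LoRA `target_modules` match submodule names in the base model. A minimal sketch of checking which names actually exist before building a `LoraConfig` — `TinyAttention` is a hypothetical stand-in here; real code would call `named_modules()` on the loaded Hugging Face model:

```python
# Hedged sketch: verify requested LoRA target_modules against the model's
# actual Linear submodule names (PEFT matches the trailing name component).
import torch.nn as nn

class TinyAttention(nn.Module):  # hypothetical stand-in for one transformer block
    def __init__(self):
        super().__init__()
        self.q_proj = nn.Linear(8, 8)
        self.k_proj = nn.Linear(8, 8)
        self.v_proj = nn.Linear(8, 8)
        self.o_proj = nn.Linear(8, 8)

model = TinyAttention()
# Collect the leaf names of every Linear submodule.
available = {name.split(".")[-1] for name, mod in model.named_modules()
             if isinstance(mod, nn.Linear)}
requested = {"v_proj", "gate_proj", "k_proj", "o_proj",
             "down_proj", "q_proj", "up_proj"}
missing = requested - available
print(sorted(missing))  # → ['down_proj', 'gate_proj', 'up_proj']
```

If `missing` is non-empty, the fix is to drop those names from `target_modules` (or pick the names this architecture actually uses) rather than to change PEFT versions.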
-
### Checklist
- [x] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
- [X] 3. Please note that if the bug-related iss…