-
Hi,
I'm trying to run inference on an AWQ-quantized model, and I constantly get this error when trying to generate text.
I'm using Qwen2.5-72B-Instruct-AWQ.
Some code to give context:
sel…
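Since the snippet above got cut off, here is a minimal sketch of the kind of call I mean, assuming the model is loaded through transformers; everything except the model ID is a placeholder, not my exact code:
```
# Sketch only: the original snippet is truncated, so this reconstructs a
# typical AWQ generation call with transformers. The prompt and generation
# arguments are placeholders, not the actual code from above.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Qwen/Qwen2.5-72B-Instruct-AWQ"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # the AWQ checkpoint carries its own quantization config
    device_map="auto",    # shard the 72B weights across available GPUs
)

inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```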
-
### System Info
- Ubuntu 20.04
- NVIDIA A100
### Who can help?
@Tracin @kaiyux
### Information
- [X] The official example scripts
- [ ] My own modified scripts
### Tasks
- [ ] A…
-
### 🚀 The feature, motivation and pitch
Is the deepseek-v2 AWQ version supported now? When I run it, I get the following error:
```
[rank0]: File "/usr/local/lib/python3.9/dist-packages/vllm/mo…
```
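For reference, this is roughly how I am loading the model; a minimal sketch with vLLM's offline API, where the model path is a placeholder for the AWQ checkpoint:
```
# Minimal sketch of the load with vLLM's offline API. The model path is a
# placeholder for the AWQ checkpoint; DeepSeek-V2 requires trust_remote_code.
# Whether this AWQ path is supported at all is exactly the question.
from vllm import LLM, SamplingParams

llm = LLM(
    model="deepseek-ai/DeepSeek-V2-Chat",  # placeholder: point at the AWQ checkpoint
    quantization="awq",
    trust_remote_code=True,
    tensor_parallel_size=8,
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```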
-
### System Info
CUDA 12.2, CentOS 7
### Running Xinference with Docker?
- [X] docker
- [ ] pip install
- [ ] installation from source …
-
```
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[2], line 2
…
```
-
## System Info
Intel(R) Xeon(R) Platinum 8468
NVIDIA H800-80G
TensorRT-LLM version 0.12.0
## Who can help?
@Tracin @byshiue
## Reproduction
I followed the official procedure for Llama 2 7B quantiza…
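For completeness, a sketch of the shape of the run. This assumes the high-level LLM API shipped with recent tensorrt_llm releases; the QuantConfig/QuantAlgo names are my assumption of the 0.12 API and may differ, and the official procedure can equally be driven through examples/quantization/quantize.py plus trtllm-build:
```
# Sketch, assuming tensorrt_llm's high-level LLM API; QuantConfig/QuantAlgo
# names are an assumption for the 0.12 release and may differ. The official
# CLI flow (examples/quantization/quantize.py + trtllm-build) is equivalent.
from tensorrt_llm import LLM, SamplingParams
from tensorrt_llm.llmapi import QuantConfig, QuantAlgo

quant_config = QuantConfig(quant_algo=QuantAlgo.W4A16_AWQ)  # 4-bit AWQ weights
llm = LLM(model="meta-llama/Llama-2-7b-hf", quant_config=quant_config)

for output in llm.generate(["Hello"], SamplingParams(max_tokens=32)):
    print(output.outputs[0].text)
```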
-
I deployed the BF16 and INT8 versions of Qwen2.5-Coder-32B-Instruct using vLLM (version 0.6.1) and evaluated them with OpenCompass. Surprisingly, BF16 underperformed INT8 on several metric…
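To rule out sampling noise before blaming the precision, one quick sanity check is to run both builds on identical prompts with greedy decoding. A sketch, where the model paths are placeholders for the two checkpoints (the INT8 build is assumed to be a GPTQ-Int8-style checkpoint that vLLM auto-detects):
```
# Sketch: run the BF16 and INT8 builds on identical prompts with greedy
# decoding so that any metric gap cannot be sampling noise. Paths are
# placeholders; vLLM auto-detects the quantization from the checkpoint.
# In practice, load each build in a separate process to free GPU memory.
from vllm import LLM, SamplingParams

prompts = ["Write a Python function that reverses a linked list."]
greedy = SamplingParams(temperature=0.0, max_tokens=256)

for path in ("Qwen/Qwen2.5-Coder-32B-Instruct",             # BF16
             "Qwen/Qwen2.5-Coder-32B-Instruct-GPTQ-Int8"):  # INT8
    llm = LLM(model=path)
    for out in llm.generate(prompts, greedy):
        print(path, "->", out.outputs[0].text[:120])
    del llm
```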
-
Is there a way to make it work with AWQ models?
Output:
```
Fetching 14 files: 100%|███████████████████████████| 14/14 [00:00
```
-
### Your current environment
vllm==0.6.1
### How would you like to use vllm
Steps to reproduce
This happens with Qwen2.5-32B-Instruct-AWQ.
The problem can be reproduced with the following steps:
…
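Since the steps above are cut off, the minimal load below is the starting point; a sketch with vllm==0.6.1's offline API, where max_model_len is my placeholder and not part of the original steps:
```
# Sketch of the basic load for this checkpoint with vllm==0.6.1's offline
# API; the elided reproduction steps may add more than this, and
# max_model_len is a placeholder, not part of the original steps.
from vllm import LLM, SamplingParams

llm = LLM(
    model="Qwen/Qwen2.5-32B-Instruct-AWQ",
    quantization="awq",
    max_model_len=4096,
)
out = llm.generate(["Hello"], SamplingParams(max_tokens=16))
print(out[0].outputs[0].text)
```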
-
Hello Functionary team,
I'm trying to use the Functionary-small-v3.2 AWQ version with vLLM for inference, but I'm encountering an error. The vLLM library doesn't seem to recognize the 'FunctionaryF…
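For context, the load is along these lines; a sketch where the repo id is a placeholder for the AWQ variant and trust_remote_code is my assumption:
```
# Sketch of the load that triggers the error. The repo id is a placeholder
# for the AWQ variant, and trust_remote_code is an assumption in case the
# checkpoint ships a custom config class.
from vllm import LLM

llm = LLM(
    model="meetkai/functionary-small-v3.2",  # placeholder: the AWQ variant's repo id
    quantization="awq",
    trust_remote_code=True,
)
```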