-
### Your current environment
The output of `python collect_env.py`
```text
PyTorch version: 2.4.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A…
-
### Your current environment
```text
--2024-08-07 03:22:15-- https://raw.githubusercontent.com/vllm-project/vllm/main/collect_env.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)…
-
### Your current environment
This bug is unrelated to the environment.
### 🐛 Describe the bug
Thanks for open-sourcing such an excellent project!
I found that it may be missing an "else" in the asyn…
-
### 🚀 The feature, motivation and pitch
Currently, vllm with Speculative Decoding requires that the draft model and target model have the same vocab size. However, the target model may have a large…
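The vocab-size constraint matters because the verification step in speculative decoding compares the draft and target models' probabilities at the same token index, which only makes sense when both models index the same vocabulary. A minimal sketch of that rejection-sampling check (hypothetical names, not vLLM's actual implementation):

```python
def acceptance_prob(draft_probs, target_probs, token_id):
    """Rejection-sampling check for one speculated token.

    The draft token is accepted with probability min(1, p_target / p_draft).
    Both distributions must index the same vocabulary, which is why the
    draft and target vocab sizes currently have to match.
    """
    if len(draft_probs) != len(target_probs):
        raise ValueError(
            f"vocab size mismatch: draft={len(draft_probs)}, "
            f"target={len(target_probs)}"
        )
    return min(1.0, target_probs[token_id] / draft_probs[token_id])

# Target agrees with (or exceeds) the draft's confidence -> always accepted.
print(acceptance_prob([0.5, 0.5], [0.25, 0.75], 1))  # → 1.0
```

Supporting different vocab sizes would require mapping or truncating the larger vocabulary before this comparison.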
-
### Anything you want to discuss about vllm.
I've fine-tuned Qwen2.5-14B-Instruct using QLoRA (bitsandbytes 4-bit) and also did a full fine-tune. However, when I tried to use it with a quantized model (Qw…
-
### Your current environment
When I set `VLLM_TENSOR_PARALLEL_SIZE = 2`, it works well. But when I change it to 4, vllm cannot support Phi3-medium-*.
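For reference, the tensor-parallel degree is normally passed to vLLM directly rather than through a custom environment variable, and it must evenly divide the model's attention-head counts, which is a common reason a larger degree fails where a smaller one works. A sketch using the OpenAI-compatible server entrypoint (model name assumed for illustration):

```shell
# Launch with 4-way tensor parallelism; requires 4 visible GPUs and a
# model whose attention/KV head counts are divisible by 4.
python -m vllm.entrypoints.openai.api_server \
    --model microsoft/Phi-3-medium-4k-instruct \
    --tensor-parallel-size 4
```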
```
torch=2.3.0
vllm=0.5.0.post1
transform…
-
### Your current environment
```
#!/bin/bash
# Copyright (C) 2024 Intel Corporation
# SPDX-License-Identifier: Apache-2.0
# Set default values
default_port=8008
default_model=$LLM_MODEL
defa…
-
### Your current environment
vllm==0.5.4
GPU: L20, Memory 46GB
```text
Package Version
--------------------------------- ------------
aiohappyeyeballs …
-
I'm running an Intel Celeron N3150 (Braswell, released in 2015, post-Broadwell) and I get this with the intel-media-driver on Arch.
Any suggestions on how to make it work?
Thanks in advance.
-
### Your current environment
The output of `python collect_env.py`
```text
WARNING 10-15 15:24:09 cuda.py:22] You are using a deprecated `pynvml` package. Please install `nvidia-ml-py` instead,…
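The warning above can usually be cleared by swapping the packages. Since both distributions provide the same `pynvml` Python module, the deprecated one should be removed first so the two don't shadow each other:

```shell
# Remove the deprecated package, then install the NVIDIA-maintained one.
pip uninstall -y pynvml
pip install nvidia-ml-py
```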