-
## Motivation
There is significant interest in vLLM supporting encoder/decoder models. Issues #187 and #180, for example, request encoder/decoder model support. As a result, encoder/decoder supp…
-
### Your current environment
The output of `python collect_env.py`
```text
root@newllm201:/workspace# vim collect.py
root@newllm201:/workspace# python3 collect.py
Collecting environment info…
-
### 🐛 Describe the bug
Flex attention with dynamic shapes stumbles when comparing relational expressions. I found two places where this error occurs.
One is in `flex_decoding.py`:
```
File "/usr/local/li…
-
### Your current environment
vllm==0.6.1
### Model Input Dumps
When I train Medusa, medusa0, medusa1, and medusa2 reach an accuracy of 0.95, so the training result is fine,
but when I try to deploy Medusa with vLLM, the deployment is…
-
Is the ExLlamaV2DynamicGeneratorAsync not working with speculative decoding? I hope it is something I did wrong, because I really want to use it.
```python
import sys, os
# sys.path.appe…
-
### Your current environment
```
PyTorch version: 2.3.0+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC versio…
-
### Your current environment
Why is it important:
This is a prerequisite to the work on enabling torch.compile on vLLM; we need to be able to build vLLM with nightly so that we can iterate on chan…
-
### Describe the issue
Hi,
I'm having an issue trying to run Whisper on an A770 with XPU selected as the device, using the following environment, which works when CPU is set as the device. Any insight is ap…
-
### Your current environment
The output of `python collect_env.py`
```text
Collecting environment information...
WARNING 09-23 09:07:16 _custom_ops.py:18] Failed to import from vllm._C with …
-
### Your current environment
```text
The output of `python collect_env.py`
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12…