-
Jenkins and Asciidoctor integration is not straightforward and has caused issues on every Jenkins or plugin update. When we updated Cukedoctor (the library used to generate documentation from cucum…
-
### Your current environment
I want to deploy neuralmagic/DeepSeek-Coder-V2-Instruct-FP8 on 8 x NVIDIA L20,
using `--tensor-parallel-size=8 --enforce-eager --trust-remote-code --quantization=fp8 --kv…`
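A minimal sketch of the launch command this issue describes, using the OpenAI-compatible server entrypoint. The model name and the quoted flags come from the issue text; the truncated `--kv…` flag is left out rather than guessed, and no host/port options are assumed:

```shell
# Sketch of the deployment described in the issue (8 x L20, FP8 quantization).
# The flag cut off at "--kv…" in the issue is intentionally omitted.
python -m vllm.entrypoints.openai.api_server \
    --model neuralmagic/DeepSeek-Coder-V2-Instruct-FP8 \
    --tensor-parallel-size 8 \
    --enforce-eager \
    --trust-remote-code \
    --quantization fp8
```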
-
### 🚀 The feature, motivation and pitch
The GenAI-Perf toolkit from NVIDIA can be used as an alternative benchmarking tool for vLLM. While we already have benchmark scripts and a framework in `benchmarks…
-
### Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```
### Model Input Dumps
_No response_
### 🐛 Describe the bug
…
-
### Your current environment
```text
The output of `python collect_env.py`
```
Python 3.10.12
### How would you like to use vllm
I want to run inference of a [specific model](put link here). I do…
-
**Description:**
- [gitbook draft content](https://app.gitbook.com/o/Yd4Wv0Fi89kSKpmakWrn/s/uAlDVLhlFIbJw6huUS2x/release-notes-workflow/dec-05-ish)
- release note
- new error report
- are on…
-
### Your current environment
vLLM version: v0.6.3.post1
### 🐛 Describe the bug
In the latest version v0.6.3.post1, when generating long texts (for example, when the number of tokens reaches 2…
-
### 🚀 The feature, motivation and pitch
I would like to serve smaller models (e.g. facebook/opt-125m) using vLLM on TPU. I can't do this currently because the Pallas backend has the limitation `NotImp…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
Which branch should I use to test speculative decoding, and which branch curren…
-
### What happened?
I am currently using living documentation with Cucumber by adding a README file at the feature level, but I am trying the same approach with JUnit without success. My question is this…