-
### Your current environment
```text
The output of `python collect_env.py`
```
### How you are installing vllm
I install vLLM from source.
```shell
pip install -e .
```
but encounter…
-
### Your current environment
PyTorch version: 2.1.2+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (U…
-
### Your current environment
```text
Collecting environment information...
PyTorch version: 2.3.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
…
-
### Your current environment
Collecting environment information...
PyTorch version: 2.2.1+cu121
Is debug build: False
CUDA used to build PyTorch: 12.1
ROCM used to build PyTorch: N/A
OS: Ubunt…
-
### Your current environment
vLLM 0.5.0, A100, CUDA 12.1
### 🐛 Describe the bug
1.
CUDA_VISIBLE_DEVICES=1 python -m vllm.entrypoints.openai.api_server \
--model /home/Qwen1.5-1.8B-Chat \
…
-
# ENV
```
GPU: 2080Ti * 4 (12G mem * 4)
Mem: 128G
CUDA: 12.2
PyTorch: 2.1.0
Transformers: 4.31.0
TensorRT: 9.1.0.post12.dev4
TensorRT-LLM: 0.5.0
Triton-trt-llm-backend: 0.5.0
Triton: 23.10
VLLM:0.2.…
-
### Your current environment
```text
The output of `python collect_env.py`
```
### 🐛 Describe the bug
Recently, we have seen reports of `AsyncEngineDeadError`, including:
- [ ] #5060
…
-
### Your current environment
Referring to issue #5181, "The Offline Inference Embedding Example Fails": the method LLM.encode() only works for embedding models. Is there any way to get the ou…
-
https://github.com/Alpha-VLLM/Lumina-T2X
This looks like a promising variation on text-to-anything generation. It'd be nice to get support for it, as at the moment it's only available through Gradio demos or Python code.
-
Now that many newer Hugging Face models ship with a chat template in their tokenizer, FastChat should use it as the primary way to build conversations, falling back to `conversation.py` when a template…
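The fallback order described above can be sketched as follows. This is a minimal sketch, not FastChat's actual code: `apply_chat_template` and the `chat_template` attribute are the real Hugging Face tokenizer API, while `build_prompt` and the `fallback_template` callable (standing in for a `conversation.py` template) are hypothetical names introduced here for illustration.

```python
def build_prompt(tokenizer, messages, fallback_template):
    """Render a chat prompt, preferring the tokenizer's own template.

    `messages` is a list of {"role": ..., "content": ...} dicts, as used by
    Hugging Face chat templates. `fallback_template` is any callable that
    turns the same message list into a prompt string (hypothetical stand-in
    for a conversation.py template).
    """
    # Prefer the tokenizer's built-in Jinja chat template when one is defined.
    if getattr(tokenizer, "chat_template", None):
        return tokenizer.apply_chat_template(
            messages, tokenize=False, add_generation_prompt=True
        )
    # Otherwise fall back to the hand-written template.
    return fallback_template(messages)
```

A usage note: `tokenize=False` makes `apply_chat_template` return the rendered prompt string rather than token ids, and `add_generation_prompt=True` appends the assistant-turn header so the model continues as the assistant.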