-
## Problem description
When launching with GPUs 2 and 3, the process hangs and never makes progress (the pairings 1,2 and 1,3 both work fine).
CUDA version: 12.1.0
Driver version: 535.54.03
torch: 2.1.2
fschat: 0.2.34
vllm: 0.2.6
ray: 2.8.1
## Launch command
```shell
CUDA_VISIBLE_DEVICES="2,3" python -…
```
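A hang that appears only for one particular GPU pairing usually points at the interconnect between those two devices. Below is a minimal diagnostic sketch, not taken from the report: the model path is hypothetical, and the point is simply to enable NCCL's own logging before the engine initializes so the stall point becomes visible.
```python
import os

# Must be set before vLLM/torch touch CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,3"  # the pairing that hangs
os.environ["NCCL_DEBUG"] = "INFO"           # log NCCL init and topology detection

from vllm import LLM

# Hypothetical model path; tensor_parallel_size=2 shards across the two visible GPUs.
llm = LLM(model="/path/to/model", tensor_parallel_size=2)
```
If the NCCL log stalls during ring setup only for this pairing, comparing `nvidia-smi topo -m` output for the 2,3 pair against the working pairs is a reasonable next step.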
-
NVIDIA Jetson is aarch64, running Ubuntu 20.04 server (CUDA 12.2).
When running `pip install vllm`, the following error occurred:
× Getting requirements to build wheel did not run successfully.
│ exit code: …
-
### Motivation.
I am one of the authors of the paper Stay On Topic with Classifier-Free Guidance (https://openreview.net/forum?id=RiM3cl9MdK&noteId=s1BXLL1YZD), which has been nominated as an ICML'24 Spo…
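For readers unfamiliar with the paper: the mechanism is a per-step combination of two next-token logit streams, one conditioned on the guidance prompt and one not. A minimal sketch of that formula (names are illustrative, not a vLLM API):
```python
import torch

def cfg_logits(cond_logits: torch.Tensor,
               uncond_logits: torch.Tensor,
               guidance_scale: float) -> torch.Tensor:
    """Classifier-free guidance on next-token logits.

    guidance_scale == 1.0 recovers ordinary sampling; larger values
    push sampling toward the conditioned prompt.
    """
    return uncond_logits + guidance_scale * (cond_logits - uncond_logits)
```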
-
### Your current environment
Using the latest available Docker image: vllm/vllm-openai:v0.5.0.post1
### 🐛 Describe the bug
I am getting an "Internal Server Error" response when calling the /v1/embedd…
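A minimal reproduction sketch against the OpenAI-compatible server; the port and model name here are assumptions, not from the report:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

# Per the report, this call comes back as a 500 "Internal Server Error".
resp = client.embeddings.create(
    model="intfloat/e5-mistral-7b-instruct",  # hypothetical served embedding model
    input="hello world",
)
print(resp.data[0].embedding[:5])
```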
-
I am facing difficulties specifying GPU usage for different models in an LLM inference pipeline with vLLM. Specifically, I have 4 RTX 4090 GPUs available, and I aim to run an LLM with a size of 42 GB …
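One common pattern, sketched here with hypothetical model paths: pin each engine to its own GPU subset via CUDA_VISIBLE_DEVICES before the engine is created, and size tensor parallelism to that subset. A 42 GB model needs two 24 GB RTX 4090s, which leaves the other two cards free for a second model in a separate process.
```python
import os

# Process A: the 42 GB model sharded across GPUs 0 and 1 (2 x 24 GB).
# Must be set before vLLM initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0,1"

from vllm import LLM

big_llm = LLM(model="/path/to/42gb-model", tensor_parallel_size=2)

# Process B (run separately) would set CUDA_VISIBLE_DEVICES="2,3"
# and create its own LLM for the second model.
```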
-
### Your current environment
* vllm (commit `db2a6a41e206abecf4128aba25117fcaf7bebe12`) + ROCm 6.0 Docker image built with the [fix of Dockerfile.rocm](https://github.com/vllm-project/vllm/issues/386…
-
I plan to implement function calling with vision models such as LLaVA and Nous-Hermes-2-Vision-Alpha based on the image, but it seems that the current implementation in the example folder only sup…
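For reference, the request shape being asked about would combine an OpenAI-style `tools` field with an image content part, roughly as in this sketch; whether vLLM's example server accepts both together is exactly the open question, and the model name is an assumption:
```python
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

resp = client.chat.completions.create(
    model="llava-hf/llava-1.5-7b-hf",  # hypothetical served vision model
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "What should be done with this object?"},
            {"type": "image_url", "image_url": {"url": "https://example.com/img.png"}},
        ],
    }],
    tools=[{
        "type": "function",
        "function": {
            "name": "pick_up_object",
            "description": "Pick up the object identified in the image.",
            "parameters": {
                "type": "object",
                "properties": {"label": {"type": "string"}},
            },
        },
    }],
)
print(resp.choices[0].message)
```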
-
### 🚀 The feature, motivation and pitch
The paper claims major improvements over vLLM. Unfortunately there is no code, only the paper:
arxiv.org/abs/2405.04437
### Alternatives
_No response_
### Additional context
…
-
Somehow `max_prompt_len` may be 0 in this code: https://github.com/vllm-project/vllm/blob/264017a2bf030f060ebad91eb9be9b4e0033edb9/vllm/worker/model_runner.py#L232
```
| File "/usr/local/lib…
```
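Not vLLM's actual code, but a minimal sketch of the failure mode and the usual defensive guard: if every sequence in the batch is empty (e.g. a decode-only batch slips into the prompt path), `max()` over the lengths yields 0 and the padded tensor has zero columns, which breaks downstream ops.
```python
import torch

def pad_prompt_batch(prompt_token_ids: list[list[int]], pad_id: int = 0) -> torch.Tensor:
    max_prompt_len = max((len(ids) for ids in prompt_token_ids), default=0)
    # Guard: clamp to at least 1 so downstream ops never see a 0-width tensor.
    max_prompt_len = max(max_prompt_len, 1)
    padded = [ids + [pad_id] * (max_prompt_len - len(ids)) for ids in prompt_token_ids]
    return torch.tensor(padded, dtype=torch.long)
```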
-
### Your current environment
Deploying Qwen1.5-14B-Chat with vllm==0.3.3 on a Tesla V100-PCIE-32GB produces nothing but exclamation marks; no usable output.
### 🐛 Describe the bug
Deploying Qwen1.5-14B-Chat with vllm==0.3.3 on a Tesla V100-PCIE-32GB produces nothing but exclamation marks; no usable output…
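A minimal reproduction sketch; the explicit `dtype="float16"` is an assumption (V100s lack bfloat16 support, so fp16 is the natural setting there and a common suspect when output degenerates into a single repeated token):
```python
from vllm import LLM, SamplingParams

# vllm==0.3.3 on a Tesla V100-PCIE-32GB, per the report.
llm = LLM(model="Qwen/Qwen1.5-14B-Chat", dtype="float16")
out = llm.generate(["你好,请介绍一下你自己。"], SamplingParams(max_tokens=64))
print(out[0].outputs[0].text)  # reported result: nothing but exclamation marks
```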