-
### Describe the issue
```python
from vllm import LLM, SamplingParams
from minference import MInference
prompts = [
"Hello, my name is",
"The president of the United States is",
…
```
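For reference, a hedged sketch of how the truncated snippet likely continues, following MInference's documented vLLM patching flow; the model name and sampling settings here are placeholders, not taken from the original report:

```python
from vllm import LLM, SamplingParams
from minference import MInference

prompts = [
    "Hello, my name is",
    "The president of the United States is",
]
sampling_params = SamplingParams(temperature=0.8, top_p=0.95)

# Placeholder model id; any long-context model MInference supports should work.
model_name = "gradientai/Llama-3-8B-Instruct-262k"
llm = LLM(model_name, enforce_eager=True, max_model_len=128000)

# Patch the vLLM engine with MInference's sparse attention kernels.
minference_patch = MInference("vllm", model_name)
llm = minference_patch(llm)

outputs = llm.generate(prompts, sampling_params)
```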
-
Is fastchat no longer being updated to support new models?
-
I tried to run it on an H100, but there seems to be an illegal memory access inside the kernel.
```
RuntimeError: CUDA error: an illegal memory access was encountered
```
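Not from the original report, but a common first step for localizing an illegal memory access: CUDA reports these errors asynchronously, so forcing synchronous kernel launches makes the traceback point at the kernel that actually faulted. A minimal sketch:

```python
import os

# Must be set before CUDA is initialized, i.e. before importing torch/vllm:
# with synchronous launches, the error surfaces at the faulting kernel
# instead of at a later, unrelated CUDA call.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

from vllm import LLM  # noqa: E402  (imported after the env var on purpose)
```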
-
Hi, do you support model inference through vLLM?
-
Thank you for your great work. I've run into an issue: image generation is too slow; generating a 512x512 image takes almost one minute. So I'd like to know whether Lumina-mGPT can support inference spe…
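A rough way to confirm where that minute goes, independent of the framework: time the generation call with CUDA events. This is a generic sketch; the Lumina-mGPT call itself is elided because its API isn't shown in the report:

```python
import torch

start = torch.cuda.Event(enable_timing=True)
end = torch.cuda.Event(enable_timing=True)

start.record()
# image = model.generate(...)  # Lumina-mGPT generation call goes here
end.record()
torch.cuda.synchronize()  # wait for all queued GPU work to finish

print(f"generation took {start.elapsed_time(end) / 1000:.1f}s")
```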
-
Is there a data table for the benchmark results?
-
Hello, in recent tests I benchmarked Llama-13b, 7b, and other models on an A100, comparing vllm and distserve. While meeting the SLO, distserve outperforms vllm. However, when testing codellama-34b with an input length of 8192, I found that TTFT is about 3x higher than vllm's. Is this expected? vllm uses tp2; distserve uses prefill tp2 and decode tp2.
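One way to sanity-check TTFT numbers independently of either system's built-in metrics: time the first streamed token over the OpenAI-compatible HTTP API. A rough sketch, assuming a server is already running at localhost:8000 and exposes that API; the prompt and model id below are placeholders:

```python
import time
from openai import OpenAI

# Assumes a vLLM (or compatible) OpenAI-style server at this address.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

prompt = "x " * 4096  # placeholder; the real test used 8192 input tokens

start = time.perf_counter()
stream = client.completions.create(
    model="codellama/CodeLlama-34b-hf",  # placeholder model id
    prompt=prompt,
    max_tokens=16,
    stream=True,
)
next(iter(stream))  # first streamed chunk ~= first generated token
print(f"TTFT: {time.perf_counter() - start:.3f}s")
```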
-
### The model to consider.
https://huggingface.co/openbmb/MiniCPM-V-2_6-int4
### The closest model vllm already supports.
_No response_
### What's your difficulty of supporting the model you want?…
-
### Your current environment
The output of `python collect_env.py`
```text
Your output of `python collect_env.py` here
```
### 🐛 Describe the bug
(base) bob@test-ESC8000A-E11:~$ python…
-
### 🚀 The feature, motivation and pitch
Hi, I'm currently working on **deploying vLLM distributed across multiple nodes in a k8s cluster**. I saw that the official documentation provides a link for deploying with [LWS…