Running vLLM according to the instructions. Docker segfaults at startup, so I'm running directly on the machine.
I start the server with the following shell script. As you can see, I've tried to turn max…
-
Environment:
Deployed a vLLM OpenAI API server based on Qwen2 72B.
Command:
llmuses perf --url 'http://127.0.0.1:8000/v1/chat/completions' --parallel 4 --model '/share/modelscope/hub/qwen/Qwen2-72B-Instruct-FP8' --log-eve…
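For context, a server launch matching the benchmark target above might look like the following. This is only a sketch: the tensor-parallel size and port are assumptions, and the model path is taken from the command line above.

```shell
# Hypothetical launch of the vLLM OpenAI-compatible server being benchmarked.
# --tensor-parallel-size 8 is an assumption for a 72B model; adjust to your GPUs.
python -m vllm.entrypoints.openai.api_server \
    --model /share/modelscope/hub/qwen/Qwen2-72B-Instruct-FP8 \
    --tensor-parallel-size 8 \
    --port 8000
```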
-
### Your current environment
Why does this line need the -1?
https://github.com/vllm-project/vllm/blob/main/vllm/core/block_manager_v1.py#L667
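I can't speak for that exact line, but a common reason for a `-1` in KV-cache block accounting is converting a token count into the index of the block that holds the last token. A minimal sketch (the function name here is hypothetical, not vLLM's):

```python
BLOCK_SIZE = 16  # tokens per KV-cache block (vLLM's default block size)

def last_block_index(num_tokens: int) -> int:
    # A sequence of exactly BLOCK_SIZE tokens fills one block, whose index is 0.
    # Without the -1, 16 // 16 == 1 would point past the last allocated block.
    return (num_tokens - 1) // BLOCK_SIZE
```

With 16 tokens this returns 0; with 17 tokens it returns 1, i.e. a second block is in play.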
### How would you like to use vllm
_No response_
-
### Your current environment
vllm==0.4.3
numpy==1.26.4
nvidia-nccl-cu12==2.20.5
torch==2.3.0
transformers==4.41.2
triton==2.3.0
### 🐛 Describe the bug
I don't know if this is a bug or …
-
Please add one or more params to control logging from the RESTful API server, namely in the `mii.serve()` function.
As a reference, see the `-log-` config params in vLLM: https://docs.vllm.ai/en/latest/servin…
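For comparison, the vLLM OpenAI-compatible server accepts flags like the following to quiet its request and stats logging (a sketch; the exact flag set varies by vLLM version, and the model name is just a placeholder):

```shell
# Quieting the vLLM OpenAI-compatible server's per-request and stats logs.
python -m vllm.entrypoints.openai.api_server \
    --model facebook/opt-125m \
    --disable-log-requests \
    --disable-log-stats
```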
-
### Anything you want to discuss about vllm.
For gptq_marlin, `min_thread_n=64 min_thread_k=64` is required in [https://github.com/vllm-project/vllm/blob/70c232f85a9e83421a4d9ca95e6384364271f2bc/csrc…
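If I read the constraint correctly, it means the GEMM dimensions must tile evenly into 64x64 thread chunks. A hypothetical shape check under that assumption (the function and parameter names are mine, not vLLM's):

```python
def marlin_shape_ok(n: int, k: int,
                    min_thread_n: int = 64, min_thread_k: int = 64) -> bool:
    # Assumption: the gptq_marlin kernel partitions the weight matrix into
    # min_thread_n x min_thread_k tiles, so both dims must divide evenly.
    return n % min_thread_n == 0 and k % min_thread_k == 0
```

Under this reading, a 4096x4096 layer is fine, while a dimension like 100 would be rejected.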
-
### Model/Pipeline/Scheduler description
Lumina-T2X is a text-to-any generation model. Our model is capable of generating multiple modalities, most notably image generation. Currently, our image ge…
-
Hi there,
I found that `OpenAI()` takes `base_url` as a mandatory argument for initialization, as mentioned in this vLLM documentation:
[https://docs.vllm.ai/en/latest/getting_started/quickstart.html#usi…
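For reference, the server speaks the OpenAI chat-completions wire format, so you can also hit it with the standard library alone. A sketch, assuming the server listens on localhost:8000 and the model name matches whatever the server was launched with:

```python
import json
from urllib import request

BASE_URL = "http://localhost:8000/v1"  # assumed vLLM server address

payload = {
    "model": "my-served-model",  # placeholder; must match the served model
    "messages": [{"role": "user", "content": "Hello!"}],
}

req = request.Request(
    f"{BASE_URL}/chat/completions",
    data=json.dumps(payload).encode(),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer EMPTY",  # vLLM accepts any key by default
    },
)
# request.urlopen(req) would send it once the server is actually up.
```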
-
Hi, I have tried to load the Phi-3 Medium (128k) model, but it fails to work with the current version of vLLM. Is this a version-update issue? When I try the Phi-3 Mini 128k, it at least tries …
rkyla updated 2 weeks ago