-
### Anything you want to discuss about vllm.
This document includes the features in vLLM's roadmap for Q3 2024. Please feel free to discuss and contribute, as this roadmap is shaped by the vLLM com…
-
### Describe the bug
Custom model name is not picked up.
WARNING:langfuse:Langfuse was not able to parse the LLM model. The LLM call will be recorded without model name. Please create an issue so we…
-
ModuleNotFoundError: No module named 'vllm.engine.ray_utils'
Please tell me which vLLM version this requires, thanks.
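A quick way to check which vLLM version is installed and whether the old module path is still importable (a sketch; `vllm.engine.ray_utils` existed in older releases and appears to have been removed or relocated in newer ones):

```python
import importlib.util


def module_exists(name: str) -> bool:
    """Return True if `name` can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. vllm itself) is missing.
        return False


if __name__ == "__main__":
    if module_exists("vllm"):
        import vllm
        # vLLM exposes its version string as vllm.__version__
        print("vllm version:", vllm.__version__)
    print("vllm.engine.ray_utils present:",
          module_exists("vllm.engine.ray_utils"))
```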
-
## Description
I would like to inquire if there are any plans to support more configuration settings for vLLM, specifically related to RoPE scaling and theta adjustments.
## Background
vLLM curre…
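For context on the theta knob: RoPE derives its rotary frequencies from a base `theta`, so raising theta lowers the rotation frequencies and stretches the usable context. A minimal sketch of the standard inverse-frequency computation (plain Python; the function name and parameters are illustrative, not vLLM's API):

```python
def rope_inv_freq(head_dim: int, theta: float = 10000.0) -> list[float]:
    """Standard RoPE inverse frequencies: theta^(-2i/head_dim) for i < head_dim // 2.

    Increasing `theta` (e.g. to 1e6, as some long-context variants do)
    lowers every frequency, which is the usual theta-adjustment trick.
    """
    return [theta ** (-2.0 * i / head_dim) for i in range(head_dim // 2)]
```

For example, `rope_inv_freq(head_dim, theta=1_000_000.0)` yields strictly lower frequencies than the default base, which is what a theta adjustment would expose as a configuration setting.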
-
### System Info
- nvidia: 535.129.03
- cuda_version: 12.4
- GPU: L40S
- OS: Ubuntu 22.04.4 LTS (docker)
- tensorrt-llm: 0.11.0.dev2024060400
### Who can help?
_No response_
### Information
…
-
As stated in the title.
-
### Add the newest GPU cards:
- H100
- H200?
- A100
- L40S
### Modify Huggingface configuration handling:
- Instead of storing the Huggingface configs locally, fetch them via an API call.
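One way the API-based approach could look (a sketch, not this project's code): fetch `config.json` from the Hugging Face Hub's raw-file endpoint instead of keeping a local copy. The URL pattern below is the Hub's standard `resolve` endpoint for public repos; the repo id in the usage note is just an example.

```python
import json
import urllib.request


def hf_config_url(repo_id: str, revision: str = "main") -> str:
    """Raw config.json URL on the Hugging Face Hub (public repos)."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/config.json"


def fetch_hf_config(repo_id: str) -> dict:
    """Download and parse a model's config.json (network access required)."""
    with urllib.request.urlopen(hf_config_url(repo_id)) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `fetch_hf_config("meta-llama/Meta-Llama-3-8B")` would return the parsed config dict, with the caveat that gated repos additionally need an auth token.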
###…
-
Thank you for your impressive work on this project. I'm eager to try this model, but I've noticed that the `vllm` deployment [pull request](https://github.com/vllm-project/vllm/pull/4650) has conflict…
-
When I use Qwen 2.0 72B Chat AWQ with the latest vLLM, after the client sends an OpenAI-compatible request, there is a probability that the model will get stuck in an infinite loop, continuously consum…
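Until the root cause is found, a common workaround is to bound generation on the client side so a repetition loop cannot run away. A hedged sketch of an OpenAI-compatible request payload (`max_tokens` and `frequency_penalty` are standard OpenAI API parameters; the stop sequence and values are placeholders to adjust per model):

```python
def build_bounded_request(prompt: str, model: str) -> dict:
    """Chat-completions payload with a hard token cap and a repetition penalty."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": 512,         # hard upper bound on generated tokens
        "frequency_penalty": 0.5,  # discourage the model from looping
        "stop": ["<|im_end|>"],    # example stop sequence; adjust per model
    }
```

This does not fix the underlying bug, but it caps the damage: even if the model enters a loop, generation halts at `max_tokens` instead of consuming the GPU indefinitely.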
-
### Area(s)
area:gen-ai
### What happened?
## Description
There is a PR trying to enable metrics support in vLLM, and it adopts this semantic convention as well: https://github.com…