-
### Proposal to improve performance
Hi~ I find that the inference time of Qwen2-VL-7B AWQ is not much improved compared to Qwen2-VL-7B. Do you have any suggestions for improving performance? Thank y…
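Since the post is cut off here, one commonly suggested lever, offered as an assumption rather than a confirmed fix, is the choice of AWQ kernel: on Ampere-or-newer NVIDIA GPUs, vLLM's `awq_marlin` backend is usually faster than the plain `awq` path. A minimal sketch, assuming the `Qwen/Qwen2-VL-7B-Instruct-AWQ` checkpoint:
```python
from vllm import LLM, SamplingParams

# Minimal sketch: request the awq_marlin kernel explicitly.
# The model ID and kernel choice are assumptions for illustration,
# not taken from the original report.
llm = LLM(
    model="Qwen/Qwen2-VL-7B-Instruct-AWQ",
    quantization="awq_marlin",  # often faster than plain "awq" on Ampere+
)

params = SamplingParams(max_tokens=128)
out = llm.generate(["Describe the weather today."], params)
print(out[0].outputs[0].text)
```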
-
The documentation didn't help at all. I'm using the Xiaomi Miio auto integration, and I don't understand how the room cleaning scripts work.
For example, on the card we have this:
shortcuts:
- name: Clean li…
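For context, room ("segment") cleaning on these vacuums is usually triggered by sending `app_segment_clean` with room IDs through the `vacuum.send_command` service. The sketch below shows that call made via Home Assistant's REST API from Python; the host, token, entity ID, and room IDs are placeholders, not values from the original post:
```python
import requests

# Hypothetical values -- replace with your own Home Assistant URL,
# long-lived access token, vacuum entity, and room IDs.
HA_URL = "http://homeassistant.local:8123"
TOKEN = "YOUR_LONG_LIVED_ACCESS_TOKEN"

resp = requests.post(
    f"{HA_URL}/api/services/vacuum/send_command",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "entity_id": "vacuum.xiaomi_vacuum",  # placeholder entity
        "command": "app_segment_clean",       # segment (room) cleaning
        "params": [16, 17],                   # placeholder room IDs
    },
    timeout=10,
)
resp.raise_for_status()
```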
-
### Anything you want to discuss about vllm.
https://github.com/vllm-project/vllm/pull/8797
### Before submitting a new issue...
- [x] Make sure you already searched for relevant issues, and asked …
-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
Which branch should I use to test speculative decoding, and which branch curren…
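For reference while the branch question stands, draft-model speculative decoding is already exposed in the v0.6.x line through engine arguments. A minimal sketch following the documented usage pattern (the target/draft model pair is illustrative, not prescribed):
```python
from vllm import LLM, SamplingParams

# Sketch of draft-model speculative decoding as exposed in vLLM v0.6.x.
llm = LLM(
    model="facebook/opt-6.7b",
    speculative_model="facebook/opt-125m",
    num_speculative_tokens=5,  # draft tokens proposed per decoding step
)

out = llm.generate(["The future of AI is"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```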
-
### Your current environment
vLLM version: v0.6.3.post1
### 🐛 Describe the bug
In the latest version v0.6.3.post1, when generating long texts (for example, when the number of tokens reaches 2…
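The report is truncated here, but a long-generation repro along these lines may help pin it down (model, prompt, and token budget are assumptions, not values from the original):
```python
from vllm import LLM, SamplingParams

# Hypothetical repro on v0.6.3.post1: force a very long completion.
llm = LLM(model="Qwen/Qwen2.5-7B-Instruct")  # placeholder model
params = SamplingParams(max_tokens=20000, ignore_eos=True)

out = llm.generate(["Write a very long story about the sea."], params)
print(len(out[0].outputs[0].token_ids), "tokens generated")
```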
-
### 🚀 The feature, motivation and pitch
I noticed that when starting the vLLM API server I am limited to a single secret key. I have multiple users and I would like to have the ability to drop the…
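Until multi-key support exists, a common workaround, sketched below under stated assumptions rather than as an official vLLM feature, is a small reverse proxy that validates per-user keys and forwards requests with the single upstream key:
```python
import httpx
from fastapi import FastAPI, HTTPException, Request, Response

# Hypothetical per-user keys and upstream settings -- placeholders only.
USER_KEYS = {"alice-key", "bob-key"}
UPSTREAM = "http://localhost:8000"    # the vLLM OpenAI-compatible server
UPSTREAM_KEY = "single-vllm-api-key"  # the one key vLLM was started with

app = FastAPI()

@app.post("/v1/{path:path}")
async def proxy(path: str, request: Request) -> Response:
    # Check the caller's key against the allowed per-user set.
    auth = request.headers.get("authorization", "")
    if auth.removeprefix("Bearer ") not in USER_KEYS:
        raise HTTPException(status_code=401, detail="invalid API key")
    # Forward the request to vLLM using the single upstream key.
    async with httpx.AsyncClient() as client:
        upstream = await client.post(
            f"{UPSTREAM}/v1/{path}",
            content=await request.body(),
            headers={
                "Authorization": f"Bearer {UPSTREAM_KEY}",
                "Content-Type": request.headers.get(
                    "content-type", "application/json"
                ),
            },
            timeout=None,
        )
    return Response(
        content=upstream.content,
        status_code=upstream.status_code,
        media_type=upstream.headers.get("content-type"),
    )
```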
-
### Your current environment
```text
The output of `python collect_env.py`
```
### How would you like to use vllm
I want to run inference of a [specific model](put link here). I don't know how …
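For anyone landing on this template, the usual offline-inference starting point looks like the sketch below; the model ID is a placeholder to be swapped for the one linked above:
```python
from vllm import LLM, SamplingParams

# Generic starting point from the vLLM quickstart pattern;
# the model ID is a placeholder.
llm = LLM(model="facebook/opt-125m")
params = SamplingParams(temperature=0.8, max_tokens=64)

for out in llm.generate(["Hello, my name is"], params):
    print(out.outputs[0].text)
```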
-
### The problem
I created a very simple Template switch through the GUI that evaluates an entity state before setting the power of a heater (scene activation); it does not evaluate the conditions corre…
-
### Description
I discovered a mismatch between the GUI and the modes supported by Better Thermostat, probably only when there's a cooling device attached.
The GUI in question is the default GUI…
-
### Your current environment
I'm encountering an issue with the LLaMA 3.1 8B model while using the HPU Docker image. The maximum context length I'm able to input is around 30k tokens, despite the m…
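One thing worth ruling out (an assumption, since the report is cut off) is the engine's `max_model_len` setting: pinning it to the model's full window makes any cap fail loudly instead of silently limiting input. A minimal sketch:
```python
from vllm import LLM

# Hypothetical check: request Llama 3.1's full 128k window explicitly so
# that any cap (engine default, memory limit, HPU build setting) surfaces
# as an error rather than a silent ~30k limit.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    max_model_len=131072,  # Llama 3.1's advertised context length
)
print(llm.llm_engine.model_config.max_model_len)
```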