-
### Your current environment
I installed it using Docker Swarm on a dedicated cloud VPS on Hetzner. I want to run a lightweight model, "jinaai/jina-embeddings-v3", and I assume the CPU and RAM are sufficient in …
-
### Describe the bug
I am building a custom chatbot component using the multimodal chatbot and the latest @gradio npm install, and the `normalise_file` function that is being used doesn't exist.
### Have you searched…
-
### Describe the bug
![image](https://github.com/user-attachments/assets/a8d4a6ee-db42-4892-9eba-db08b8418601)
Calculation formulas
Resolution ( r ):
[
r = \frac{1}{\bar{U}} \times 100\%
]
Nonlinearity ( \delta_1 ):
[
\delt…
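As a hedged illustration of the resolution formula above (the nonlinearity formula is truncated, so only r is shown), here is a minimal Python sketch, assuming \bar{U} is the mean measured value; the function name is illustrative, not from the issue:

```python
def resolution_percent(mean_value: float) -> float:
    """Resolution r = (1 / U_bar) * 100%, with U_bar the mean measured value."""
    return (1.0 / mean_value) * 100.0

# Example: a mean reading of 5.0 gives a resolution of 20.0 (%).
print(resolution_percent(5.0))
```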
-
### Your current environment
I am running vllm serve with a multimodal model (Phi3.5K). How do I run benchmark_serving.py to test the multimodal path?
In the benchmark_serving.py file I see the following, but test_mm…
-
### 🚀 The feature, motivation and pitch
I'm working on applications that must run locally on resource-limited HW. Therefore, quantization becomes essential. Such applications need from multimodal vi…
-
### Your current environment
[pip3] numpy==1.25.1
[pip3] nvidia-cublas-cu12==12.4.5.8
[pip3] nvidia-cuda-cupti-cu12==12.4.127
[pip3] nvidia-cuda-nvrtc-cu12==12.4.127
[pip3] nvidia-cuda-runtime-…
-
### 🚀 The feature, motivation and pitch
- a multimodal feature to benchmark offline latency, throughput, and online serving for Pixtral
### Alternatives
- everyone writes thei…
-
### Anything you want to discuss about vllm.
In Qwen2-VL's M-RoPE implementation, vLLM decides whether the input positions are multimodal with
![image](https://github.com/user-attachments/assets/6dfc96d9-5162-…
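The screenshot above is not quoted in text, but the kind of check being discussed can be sketched as follows. This is a hypothetical, simplified version (the name and plain-list representation are illustrative, not vLLM's actual API), assuming M-RoPE packs multimodal position ids as a 2-D (3, seq_len) array while ordinary text positions stay 1-D:

```python
def is_mrope_positions(positions) -> bool:
    # M-RoPE carries three position streams (temporal, height, width), so
    # multimodal position ids arrive as three rows of per-token positions;
    # plain text positions are a flat 1-D sequence of ints.
    return len(positions) == 3 and all(isinstance(row, list) for row in positions)

text_positions = list(range(8))                    # shape (8,)
mm_positions = [list(range(8)) for _ in range(3)]  # shape (3, 8)
print(is_mrope_positions(text_positions))  # False
print(is_mrope_positions(mm_positions))    # True
```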
-
I think Intel CPUs/GPUs now support more efficient inference with OpenVINO. See the example with LLaVA here: https://docs.openvino.ai/2023.2/notebooks/257-llava-multimodal-chatbot-with-output.html
It …
-
### Your current environment
vllm == 0.5.5.
### 🐛 Describe the bug
When we deploy `microsoft/Phi-3.5-vision-instruct`,
it will randomly hit this issue.
```
(VllmWorkerProcess p…