-
This issue tracks progress on improving the handling and testing of Vision-Language Models. The main goals are to enhance and enable generation tests, and to handle other generation techniques such as assisted …
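For context, the kind of check involved here is a short generation smoke test. A minimal sketch, assuming a LLaVA-1.5 checkpoint as a stand-in and plain `transformers` APIs (the real test suite's fixtures and model list will differ):
```
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

CHECKPOINT = "llava-hf/llava-1.5-7b-hf"  # stand-in; the tracker spans many VLMs

def test_generate_smoke():
    processor = AutoProcessor.from_pretrained(CHECKPOINT)
    model = LlavaForConditionalGeneration.from_pretrained(
        CHECKPOINT, torch_dtype=torch.float16, device_map="auto"
    )
    image = Image.new("RGB", (336, 336))  # a blank image is enough for a smoke test
    prompt = "USER: <image>\nDescribe the image.\nASSISTANT:"
    # cast float inputs (pixel_values) to the model dtype to avoid a mismatch
    inputs = processor(images=image, text=prompt, return_tensors="pt").to(
        model.device, torch.float16
    )
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False)
    assert out.shape[0] == 1  # one decoded sequence back, no crash
```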
-
### Question Validation
- [X] I have searched both the documentation and Discord for an answer.
### Question
I want to use "Qwen/Qwen2-VL-2B-Instruct" in my multimodal RAG app. I tried OllamaMultiM…
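For reference, a minimal sketch of calling this checkpoint directly through `transformers` (a recent version with Qwen2-VL support), bypassing the LlamaIndex wrapper; the image path and question are placeholders:
```
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "Qwen/Qwen2-VL-2B-Instruct"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("page.png")  # placeholder document image
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text", "text": "Summarize this page."},
]}]
# render the chat template, attach the image, then generate
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True)[0])
```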
-
### Checklist
- [X] 1. If the issue you raised is not a feature but a question, please raise a discussion at https://github.com/sgl-project/sglang/discussions/new/choose Otherwise, it will be closed.…
-
When will it be possible to fine-tune Qwen2-VL (or other VLMs) using Unsloth? :)
-
Since we now support the multi-turn benchmark MMDU, we would like to implement the `chat_inner` function for existing VLMs in VLMEvalKit to add support for multi-turn chatting.
Currently, we hav…
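For illustration, a hedged sketch of what such a `chat_inner` could look like for a transformers-backed VLM; the message schema (role/content turns with `type`/`value` items) and the `self.model` / `self.processor` attributes are assumptions about the wrapper, not VLMEvalKit's confirmed interface:
```
from PIL import Image

# Hypothetical method on a model wrapper that already holds a transformers
# model and processor; signature and schema are assumed, not confirmed.
def chat_inner(self, message, dataset=None):
    conversation, images = [], []
    for turn in message:
        parts = []
        for item in turn["content"]:
            if item["type"] == "image":
                images.append(Image.open(item["value"]))
                parts.append({"type": "image"})
            else:
                parts.append({"type": "text", "text": item["value"]})
        conversation.append({"role": turn["role"], "content": parts})
    # render the full multi-turn history with the model's chat template
    prompt = self.processor.apply_chat_template(
        conversation, tokenize=False, add_generation_prompt=True
    )
    inputs = self.processor(
        text=[prompt], images=images or None, return_tensors="pt"
    ).to(self.model.device)
    out = self.model.generate(**inputs, max_new_tokens=512)
    # return only the newly generated turn, not the echoed prompt
    return self.processor.batch_decode(
        out[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
    )[0]
```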
-
**Is your feature request related to a problem? Please describe.**
Vision-Language Models are useful for image-based understanding. In robotics, the environment is dynamic and images from camera …
-
Hi,
Thank you for your great work!
I've been trying to use the Phi-3-Instruct-4B VLM models, but encountered several issues:
- Incorrect LLM backbone choice in phi.py:
https://github.com/R…
-
### Model description
Hi! I'm the author of ["Prismatic VLMs"](https://github.com/TRI-ML/prismatic-vlms), our upcoming ICML paper that introduces and ablates design choices of visually-conditioned …
-
For anyone who has gotten VLMs to work in FastChat: how did you do so? I cannot even pull any LLaVA model from Hugging Face successfully. These have been my results so far:
```
python -m fastchat.s…
```
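One way to narrow this down is to pull the weights with `huggingface_hub` alone, independent of FastChat; a small sketch, assuming `liuhaotian/llava-v1.5-7b` as the target checkpoint:
```
from huggingface_hub import snapshot_download

# Pull the checkpoint directly; if this fails too, the problem is Hub access
# (auth, network, repo id), not FastChat itself.
local_path = snapshot_download(repo_id="liuhaotian/llava-v1.5-7b")
print("weights downloaded to:", local_path)
```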
-
### Feature request
Add support for exporting SigLIP models
### Motivation
As used by many SOTA VLMs, SigLIP is gaining traction, and supporting it can be step one toward supporting many VLMs.
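For illustration, a minimal sketch of exporting the SigLIP vision tower with plain `torch.onnx.export`, assuming ONNX is the export target; the wrapper class, checkpoint, and file name are illustrative, and a full exporter would also need to cover the text tower and config metadata:
```
import torch
from transformers import SiglipVisionModel

class VisionTower(torch.nn.Module):
    def __init__(self, model):
        super().__init__()
        self.model = model

    def forward(self, pixel_values):
        # return a plain tensor so the ONNX trace has a fixed output structure
        return self.model(pixel_values=pixel_values).pooler_output

tower = VisionTower(SiglipVisionModel.from_pretrained("google/siglip-base-patch16-224").eval())
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(
    tower, (dummy,), "siglip_vision.onnx",
    input_names=["pixel_values"], output_names=["pooled_embedding"],
    dynamic_axes={"pixel_values": {0: "batch"}},
)
```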
### Your …