-
Hi, thank you for the wonderful paper and codebase! I had one clarification question: it looks like there is an extra set of forward passes for the SigLIP ViT blocks - is this intentional for the sigl…
-
Hi, I noticed that llava-next has published a new version of the llava-next-video model with llava-qwen and a siglip vision tower. I was wondering whether there are plans to support siglip in sglang? Thanks~
-
May I ask if the visual module is integrated from siglip-so400m-14-980-flash-attn2-navit?
The maximum resolution supported by the original SigLIP is 980, so why does Minicpmv2.5 only support a single bl…
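For context on where the 980 figure comes from, here is a minimal sketch of the patch-grid arithmetic implied by the model name, assuming `siglip-so400m-14-980` means `patch_size=14` and `image_size=980` (these values are read off the name, not confirmed from the checkpoint config):

```python
# Patch-grid arithmetic for a ViT-style encoder, assuming a square input
# and the naming convention patch_size=14, image_size=980.
def patch_grid(image_size: int, patch_size: int) -> tuple[int, int]:
    """Return (patches per side, total patch tokens) for a square image."""
    side = image_size // patch_size
    return side, side * side

side, total = patch_grid(980, 14)
print(side, total)  # 70 patches per side, 4900 patch tokens
```

At 980 px the encoder would already produce 4900 patch tokens per image, which is one plausible reason a model might restrict inputs to a smaller tile size and slice larger images instead.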
-
![image](https://github.com/user-attachments/assets/c800f52d-04cd-47f1-9a41-2f240bef00e9)
I made the following modifications to the code:
![image](https://github.com/user-attachments/assets/9ba72d3f-9c3b-4a32-9968-1a804089598c)
…
-
Hello!
I'm new to the MLX ecosystem, and I noticed that the repository already contains a working CLIP implementation. Keeping in mind that [SigLIP](https://arxiv.org/abs/2303.15343…
-
### Feature request
Add support for exporting SigLIP models
### Motivation
As it is used by many SOTA VLMs, SigLIP is gaining traction, and supporting it can be step one toward supporting many VLMs.
### Your …
-
In the 3rd stage, following your paper ('Finally, we further perform instruction tuning of the pre-trained model on visual language instruction datasets'), I'm wondering whether siglip is also unfrozen and i…
-
Hi, thanks first of all for your work.
I find that when I feed the same picture into the torch model and the llama.cpp model, the (1, 96, 4096) output of the siglip+resampler part differs between the two, and the Cosine Similar…
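For anyone reproducing this comparison, a minimal sketch of how the two (1, 96, 4096) outputs could be checked, assuming both tensors have been dumped to NumPy arrays (the array names here are placeholders, not variables from either codebase):

```python
import numpy as np

# Compare two model outputs by flattening them to vectors and taking
# the cosine similarity; 1.0 means identical direction, lower values
# indicate the torch and llama.cpp paths diverge numerically.
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Sanity check with identical tensors of the reported shape.
x = np.ones((1, 96, 4096), dtype=np.float32)
print(cosine_similarity(x, x))  # ~1.0 for identical inputs
```

A per-token comparison (cosine similarity along the last axis for each of the 96 tokens) can also help localize whether the divergence comes from the siglip encoder or the resampler.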
-
First of all, thank you for your excellent work.
I am currently reproducing the training pipeline based on the tinyllava model using tinychart data, but I found that the image_size of the visual encoder in bczhou/TinyLLaVA-3.1B-SigLIP is 384, while vit_add_tome.py changes the image_size in the config to 768.
Therefore, at model initialization, image_size=768 is used to initialize SigLIP's p…
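To illustrate why the 384 vs. 768 mismatch matters: a ViT's learned position-embedding table is sized by the patch count, which is derived from image_size. The sketch below uses a hypothetical `patch_size=16` purely for illustration; the actual patch size of the checkpoint should be taken from its config.

```python
# Number of position embeddings a ViT allocates for a square input,
# assuming non-overlapping patches. patch_size=16 is illustrative only.
def num_position_embeddings(image_size: int, patch_size: int = 16) -> int:
    side = image_size // patch_size
    return side * side

print(num_position_embeddings(384))  # 576 positions
print(num_position_embeddings(768))  # 2304 positions
```

Weights trained at image_size=384 therefore cannot load directly into a model instantiated at image_size=768; the position embeddings would need to be interpolated (or the config left at 384).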
-
Greetings.
I'd like to ask two questions about TinyLLaVA-0.55B (with OpenELM-270M-Instruct):
1) From the config.json provided in TinyLLaVA 0.55B's hf repo, it seems to be using OpenELM-450M
2) The repo …