-
In your paper you say that you use the ViT-B-16 SigLIP model with webli weights.
I tried to use ViT-B-16-SigLIP with the webli weights but get this error:
AttributeError: 'TimmModel' object has no attribute 'transformer'…
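For reference, a minimal sketch of loading ViT-B-16-SigLIP / webli through open_clip, assuming the error comes from treating the tower like the native CLIP ViT (`model.visual.transformer`) while open_clip wraps SigLIP in a `TimmModel` whose backbone lives in `.trunk`:

```python
# Minimal sketch (assumes open_clip_torch is installed and the
# ViT-B-16-SigLIP / webli weights are available).
import open_clip

model, _, preprocess = open_clip.create_model_and_transforms(
    "ViT-B-16-SigLIP", pretrained="webli"
)

# The SigLIP vision tower is an open_clip TimmModel: the ViT lives in
# `.trunk` (a timm VisionTransformer), so code written for the native
# CLIP tower (`model.visual.transformer.resblocks`) raises AttributeError.
vit = model.visual.trunk      # timm VisionTransformer (assumption)
blocks = vit.blocks           # transformer blocks, analogous to resblocks
print(type(model.visual).__name__, len(blocks))
```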
-
Hi, thank you for the wonderful paper and codebase! I had one clarification question: it looks like there is an extra set of forward passes for the SigLIP ViT blocks - is this intentional for the sigl…
-
Getting ValueError: Unknown vision tower: google/siglip-so400m-patch14-384 on running https://github.com/LLaVA-VL/LLaVA-NeXT/blob/5fbcf27e32935f4e09d6b8b9f8abed4a572240b0/docs/LLaVA_OneVision_Tutorial…
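That error usually just means the vision-tower builder in the installed LLaVA-NeXT revision has no branch matching that checkpoint name. A hypothetical sketch of this kind of name-based dispatch (not the actual repo code; the transformers classes are only there to make it runnable):

```python
# Hypothetical sketch of a name-based vision-tower dispatch, to illustrate
# why an unrecognized path string raises "Unknown vision tower".
from transformers import CLIPVisionModel, SiglipVisionModel

def build_vision_tower(name: str):
    lowered = name.lower()
    if "siglip" in lowered:
        return SiglipVisionModel.from_pretrained(name)
    if "clip" in lowered:
        return CLIPVisionModel.from_pretrained(name)
    raise ValueError(f"Unknown vision tower: {name}")

tower = build_vision_tower("google/siglip-so400m-patch14-384")
```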
-
Hi,
Thanks for your great work. I was trying to apply the SigLIP loss for training contrastive models. However, I find the loss scale is quite small, usually around 0.003 at the beginning. I wonder if an…
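A minimal sketch of the pairwise sigmoid loss, assuming L2-normalized image/text embeddings; one thing worth checking is the reduction, since averaging over all N×N pairs instead of dividing only by the batch size N (as in the paper) shrinks the reported value by a factor of N:

```python
# Illustrative sketch of the SigLIP pairwise sigmoid loss, not the reference code.
import torch
import torch.nn.functional as F

def siglip_loss(img_emb, txt_emb, log_temp, bias):
    # img_emb, txt_emb: [N, D], L2-normalized
    n = img_emb.shape[0]
    logits = img_emb @ txt_emb.t() * log_temp.exp() + bias
    labels = 2.0 * torch.eye(n, device=logits.device) - 1.0  # +1 on the diagonal, -1 elsewhere
    per_pair = -F.logsigmoid(labels * logits)                 # [N, N]
    return per_pair.sum() / n  # paper-style: sum over all pairs, divide by N

x = F.normalize(torch.randn(8, 16), dim=-1)
y = F.normalize(torch.randn(8, 16), dim=-1)
print(siglip_loss(x, y, torch.tensor(2.303), torch.tensor(-10.0)))  # t≈10, b=-10 init

# Note: per_pair.mean() would divide by N*N instead of N and report a value
# roughly N times smaller (e.g. ~0.003 at a large batch size).
```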
-
Hi,
Thank you for sharing the Mantis source code.
I trained your LLaMA3 model with SigLIP on my dataset. The model saves a checkpoint every 500 steps. I would like to merge the LoRA weights from…
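A minimal sketch of merging LoRA adapters from one of those checkpoints with PEFT; the paths are placeholders and `AutoModelForCausalLM` is an assumption (Mantis may require its own model class):

```python
# Minimal sketch of folding LoRA adapter weights into the base model with PEFT.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/base-model")
lora = PeftModel.from_pretrained(base, "path/to/checkpoint-500")  # LoRA adapter dir
merged = lora.merge_and_unload()   # merges the LoRA deltas into the base weights
merged.save_pretrained("path/to/merged-model")
```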
-
https://github.com/vllm-project/vllm/releases
* incorporate --num-scheduler-steps 8 (see the sketch after this list)
* test openai tool spec
* test multiple multi-modal input support
* test tensor parallel for ViT models in multi …
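A minimal sketch of multi-step scheduling through the offline Python API; the model name is a placeholder and `num_scheduler_steps` is assumed to be the engine argument behind the `--num-scheduler-steps` server flag:

```python
# Illustrative only: pass multi-step scheduling to the vLLM engine.
from vllm import LLM, SamplingParams

llm = LLM(model="Qwen/Qwen2-7B-Instruct", num_scheduler_steps=8)
out = llm.generate(["Hello"], SamplingParams(max_tokens=32))
print(out[0].outputs[0].text)
```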
-
Hello, where can I find the original pretrained vision towers, such as SigLIP and CLIP? The released ones seem to have been fine-tuned.
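The original, non-fine-tuned towers are generally available on the Hugging Face Hub under the upstream organizations. A minimal sketch, assuming the commonly released variants (the exact ones used here may differ):

```python
# Load the upstream (not fine-tuned) vision towers from the Hugging Face Hub.
from transformers import CLIPVisionModel, SiglipVisionModel

clip_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")
siglip_tower = SiglipVisionModel.from_pretrained("google/siglip-so400m-patch14-384")
```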
-
Hi, would love to see some comparison to SOTA ViT models such as InternViT and SigLIP, especially for the Chinese version.
-
### The model to consider.
https://huggingface.co/lmms-lab/llava-onevision-qwen2-7b-ov
There are a bunch of others using the same architecture.
### The closest model vllm already supports.
qwen2…