-
### 0.5B response is normal but 7B is wrong
The same image; the only change I make to the code is: pretrained = "/home/shihongyu/MMLM_models/lmms-lab/llava-onevision-qwen2-7b-ov"
model_name = "llava_qwen"
device = "…
-
### System Info
In the current implementation of VLMs, the `_supports_sdpa` attribute checks for and activates SDPA attention only for the language model. For example, in [Llava](https://github.com/huggingf…
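The gating pattern the issue describes can be illustrated with a simplified, pure-Python sketch. This is not the actual transformers code: `_supports_sdpa` is the real attribute name, but the class and method names below are made up for illustration.

```python
# Simplified sketch (NOT transformers internals) of how a class-level
# `_supports_sdpa` flag can gate which attention implementation is used.
class PreTrainedStub:
    _supports_sdpa = False  # conservative default: eager attention

    @classmethod
    def pick_attn_implementation(cls):
        # Classes that do not opt in fall back to eager attention.
        return "sdpa" if cls._supports_sdpa else "eager"


class LanguageModelStub(PreTrainedStub):
    _supports_sdpa = True   # the language tower opts in


class VisionTowerStub(PreTrainedStub):
    _supports_sdpa = False  # the vision tower never opts in -> always eager


print(LanguageModelStub.pick_attn_implementation())  # sdpa
print(VisionTowerStub.pick_attn_implementation())    # eager
```

The point of the issue is that only the language-model class flips the flag, so the vision tower ends up on the eager path even when SDPA would work for it.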
-
Hello again!
Would it be possible to modify the GMP fine-tuning script to train a LoRA with PEFT for the CLIP ViT-G model, and then merge the LoRA into the model to get a new CLIP-G model?
Chat-GPT se…
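For the merge step, PEFT's `merge_and_unload` folds each adapter into its target linear layer; numerically this is just `W_merged = W + (alpha / r) * B @ A`. A toy NumPy sketch of that arithmetic (all sizes and variable names here are illustrative, not ViT-G's actual dimensions):

```python
import numpy as np

# Toy illustration (not PEFT internals) of what "merge the LoRA into the
# model" means numerically: W_merged = W + (alpha / r) * B @ A.
rng = np.random.default_rng(0)
d_out, d_in, r, alpha = 16, 16, 4, 8  # hypothetical sizes

W = rng.standard_normal((d_out, d_in))  # frozen base projection weight
A = rng.standard_normal((r, d_in))      # LoRA down-projection (trained)
B = rng.standard_normal((d_out, r))     # LoRA up-projection (trained)

x = rng.standard_normal(d_in)

# Adapter-time forward: base path plus scaled low-rank path.
y_adapter = W @ x + (alpha / r) * (B @ (A @ x))

# Merged forward: fold the adapter into the weight, then drop A and B.
W_merged = W + (alpha / r) * (B @ A)
y_merged = W_merged @ x

print(np.allclose(y_adapter, y_merged))  # True
```

Because the merged weight has the same shape as the original, the result can be saved as an ordinary CLIP-G checkpoint with no adapter code needed at inference time.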
-
Hi,
I am trying to load the model `all-mpnet-base-v2` with `device_map="auto"`.
Following [this closed issue](https://github.com/UKPLab/sentence-transformers/issues/2435), I understand that it is possibl…
-
Hi, I noticed that you submitted a paper titled “Masked Attention as a Mechanism for Improving Interpretability of Vision Transformers” to Medical Imaging with Deep Learning 2024. Do you plan to integ…
-
- [Dilated Neighborhood Attention Transformer](https://arxiv.org/abs/2209.15001)
- [Neighborhood Attention Transformer](https://arxiv.org/abs/2204.07143)
- [Stand-Alone Self-Attention in Vision Mode…
-
### Model description
Dear Hugging Face team,
The FAIR team published an improved version of DINOv2, [Vision Transformers Need Registers](https://arxiv.org/abs/2309.16588). The models and checkpoi…
-
### Feature request
Currently, if fp16 is used with Grounding DINO via https://huggingface.co/docs/transformers/main/en/model_doc/grounding-dino, the following error occurs:
```
...
Fi…
```
-
### Model description
Hey there! I was looking to use nomic-ai/nomic-embed-vision-v1.5, since I'm already using the text version, so that I could support image/text queries in the same semantic space, but gettin…
-
- I am trying to run inference with Cambrian-1-34B.
- I have RTX 6000 GPUs with 48 GB each.
- I am following [this inference script](https://github.com/cambrian-mllm/cambrian/blob/main/inference.py).
The…