-
### 🚀 The feature
Add support for vision-language models like CLIP or LiT.
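For context, a CLIP-style model trains an image encoder and a text encoder jointly with a symmetric contrastive (InfoNCE) objective over matched image/text pairs. A minimal sketch of that loss (the `clip_contrastive_loss` helper and the embedding sizes are illustrative, not torchvision API):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss; row i of each tensor is a matched image/text pair."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.t() / temperature  # (N, N) similarity matrix
    targets = torch.arange(logits.size(0))           # matched pairs on the diagonal
    loss_i = F.cross_entropy(logits, targets)        # image -> text direction
    loss_t = F.cross_entropy(logits.t(), targets)    # text -> image direction
    return (loss_i + loss_t) / 2

# Stand-in embeddings; in practice these come from a vision backbone and a text encoder.
loss = clip_contrastive_loss(torch.randn(4, 64), torch.randn(4, 64))
```

In a real implementation the two encoders would be a vision backbone and a text transformer producing these embeddings; LiT additionally keeps the image tower frozen.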
### Motivation, pitch
Dear torchvision team,
I am sorry if I missed discussions about this or a specific reason why you h…
-
### Motivation
Hi friends,
I'm opening this issue as a place to discuss small vision-language models; please share your thoughts below!
There's recently been great success in research with sm…
-
Hi,
I'm trying to constrain the generation of my VLMs using this repo; however, I can't figure out how to customize the pipeline for handling inputs (query + image). Whereas it is documented as …
-
## Links
[paper](https://arxiv.org/abs/2405.02246)
[models](https://huggingface.co/HuggingFaceM4/idefics2-8b)…
-
### System Info
In the current implementation of VLMs, the `_supports_sdpa` attribute checks for and activates SDPA attention only for the language model. For example, in [Llava](https://github.com/huggingf…
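SDPA here refers to PyTorch's fused `torch.nn.functional.scaled_dot_product_attention`, which replaces the manual softmax-attention computation with a fused kernel. A minimal sketch showing it matches plain "eager" attention numerically (tensor shapes are illustrative):

```python
import torch
import torch.nn.functional as F

# (batch, heads, seq_len, head_dim), as attention layers typically lay tensors out
q = torch.randn(1, 4, 8, 16)
k = torch.randn(1, 4, 8, 16)
v = torch.randn(1, 4, 8, 16)

# Fused kernel path (the path the _supports_sdpa flag gates)
out_sdpa = F.scaled_dot_product_attention(q, k, v)

# Numerically equivalent "eager" attention, written out by hand
scores = (q @ k.transpose(-2, -1)) / (16 ** 0.5)
out_eager = torch.softmax(scores, dim=-1) @ v
```

The issue's point is that the vision tower would benefit from the same fused path, not just the language model.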
-
### Checklist
- [X] 1. I have searched related issues but cannot get the expected help.
- [X] 2. The bug has not been fixed in the latest version.
### Describe the bug
Hi folks, thanks for t…
-
### This issue is for a: (mark with an `x`)
```
- [ ] bug report -> please search issues before submitting
- [X] feature request
- [ ] documentation issue or request
- [ ] regression (a behavior …
-
TRL's SFTTrainer supports LLaVA (Large Language and Vision Assistant), as described in [Vision Language Models Explained](https://huggingface.co/blog/vlms).
Is there any plan to rele…
-
A significant achievement in aligning vision-language models!
While running `RLAIF-V/muffin/train/train_llava15.py`, I noticed that all model parameters are trainable. Due to hardware limi…
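A common workaround under tight hardware budgets is to freeze most of the model and train only a small module, such as the multimodal projector. A toy sketch of that pattern (the `TinyVLM` class and its submodule names are illustrative, not the RLAIF-V code):

```python
import torch.nn as nn

class TinyVLM(nn.Module):
    """Toy stand-in for a VLM: vision encoder -> projector -> language model."""
    def __init__(self):
        super().__init__()
        self.vision_tower = nn.Linear(32, 16)   # stand-in "vision encoder"
        self.projector = nn.Linear(16, 16)      # multimodal projector
        self.language_model = nn.Linear(16, 8)  # stand-in "LLM"

model = TinyVLM()

# Freeze everything, then unfreeze only the projector
for p in model.parameters():
    p.requires_grad = False
for p in model.projector.parameters():
    p.requires_grad = True

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
```

Only parameters with `requires_grad=True` receive gradients, so the optimizer updates just the projector while the frozen towers act as fixed feature extractors.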
-
### Feature request
The developments in the robotics community around RT-2 show a lot of potential for VLMs, but the hardware constraints faced by small developers make it difficult to deploy RT-2-level p…