-
Thanks for the great work. While trying to reproduce your code, I noticed that during pretraining, if you set `mm_vision_output_token_count = 576`, you get:
```
File "llava-token-compression/ll…
```
-
llama-stack installed from source: https://github.com/meta-llama/llama-stack/tree/cherrypick-working
### System Info
python -m "torch.utils.collect_env"
/home/kaiwu/miniconda3/envs/llama/lib/pytho…
-
- [ ] [DeepSeek-VL: Towards Real-World Vision-Language Understanding](https://arxiv.org/html/2403.05525v2)
# DeepSeek-VL: Towards Real-World Vision-Language Understanding
**Abstract**
We present De…
-
- [LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day](https://arxiv.org/abs/2306.00890)
- [MEDITRON-70B: Scaling Medical Pretraining for Large Language Models](http…
-
https://huggingface.co/blog/vision_language_pretraining
-
- [ ] [Title: "Yi Model Family: Powerful Multi-Dimensional Language and Multimodal Models"](https://arxiv.org/html/2403.04652v1)
# Title: "Yi Model Family: Powerful Multi-Dimensional Language and Mul…
-
Hi, the paper "Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks" is really interesting, and the results are astonishingly good, congratulations!
I am writing ju…
-
Pose a question about one of the following articles:
“[Online images amplify gender bias](https://www.nature.com/articles/s41586-024-07068-x),” 2024. Guilbeault, Douglas, Solène Delecourt, Tasker …
-
---
### [PDF] [Attention Prompting on Image for Large Vision-Language…
-
Hello! Thanks for the wonderful work and for sharing the pretrained weights.
In your ECCV work (`Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing`), there are seve…