-
Thanks for the great work. While trying to reproduce your code, I noticed that during pretraining, if you set `mm_vision_output_token_count = 576`, you get:
```
File "llava-token-compression/ll…
```
-
llama-stack installed from source: https://github.com/meta-llama/llama-stack/tree/cherrypick-working
### System Info
python -m "torch.utils.collect_env"
/home/kaiwu/miniconda3/envs/llama/lib/pytho…
-
- [ ] [DeepSeek-VL: Towards Real-World Vision-Language Understanding](https://arxiv.org/html/2403.05525v2)
# DeepSeek-VL: Towards Real-World Vision-Language Understanding
**Abstract**
We present De…
-
- [LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day](https://arxiv.org/abs/2306.00890)
- [MEDITRON-70B: Scaling Medical Pretraining for Large Language Models](http…
-
https://huggingface.co/blog/vision_language_pretraining
-
- [ ] [Title: "Yi Model Family: Powerful Multi-Dimensional Language and Multimodal Models"](https://arxiv.org/html/2403.04652v1)
# Title: "Yi Model Family: Powerful Multi-Dimensional Language and Mul…
-
Hi, the paper "Image as a Foreign Language: BEiT Pretraining for All Vision and Vision-Language Tasks" is really interesting, and the results are astonishingly good, congratulations!
I am writing ju…
-
Pose a question about one of the following articles:
“[Online images amplify gender bias](https://www.nature.com/articles/s41586-024-07068-x),” 2024. Guilbeault, Douglas, Solène Delecourt, Tasker …
-
---
### [PDF] [Attention Prompting on Image for Large Vision-Language…
-
Hello! Thanks for the wonderful work and for sharing the pretrained weights.
In your ECCV work (`Making the Most of Text Semantics to Improve Biomedical Vision–Language Processing`), there are seve…