-
# Interesting papers
## Meta's 'An Introduction to Vision-Language Modeling'
- https://ai.meta.com/research/publications/an-introduction-to-vision-language-modeling/
![image](https://github.c…
-
Could you add our CVPR 2024 paper on vision-language pretraining, "Iterated Learning Improves Compositionality in Large Vision-Language Models", to this repo?
Paper link: https://arxiv.org/abs/…
-
- While the LaTeX environment is not fully set up, we'll write our thoughts here for now
----
- `ViperGPT` is a framework that leverages pre-trained vision-language models (`GLIP` for image object ground…
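For context, a minimal sketch of the ViperGPT-style pattern, in which an LLM emits a short Python program that composes pre-trained vision modules. The `find`/`exists` stubs and the hard-coded "generated" program below are hypothetical stand-ins for illustration, not ViperGPT's actual API (in the real system, a detector like `GLIP` would back `find`):

```python
def find(image, object_name):
    """Stub for an open-vocabulary detector (GLIP would play this role).
    Here the 'image' is just a dict mapping object names to boxes."""
    return image.get(object_name, [])

def exists(image, object_name):
    """Stub visual predicate built on top of find()."""
    return len(find(image, object_name)) > 0

# A program the LLM might generate for the query
# "Is there a dog to the left of the cat?"
GENERATED_PROGRAM = """
def execute_command(image):
    dogs = find(image, "dog")
    cats = find(image, "cat")
    if not dogs or not cats:
        return "no"
    # Compare x-coordinates of the first detected boxes.
    return "yes" if dogs[0][0] < cats[0][0] else "no"
"""

def run(program, image):
    # Execute the generated program with the vision modules in scope,
    # then call the entry point it defines.
    scope = {"find": find, "exists": exists}
    exec(program, scope)
    return scope["execute_command"](image)

fake_image = {"dog": [(10, 40, 60, 90)], "cat": [(120, 30, 170, 85)]}
print(run(GENERATED_PROGRAM, fake_image))  # "yes": dog box starts left of cat box
```

The point of the design is that the program itself is interpretable intermediate reasoning: each visual sub-question becomes an explicit module call.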
-
I saw you used something like this:
```python
model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers = True, # False if not finetuning vision part
    finetune_language_lay…
```
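As a toy sketch of what flags like `finetune_vision_layers` typically gate (this is not Unsloth's actual implementation, and the parameter names are made up for illustration): such flags select which groups of modules receive trainable adapters, assuming module names identify the vision and language towers:

```python
# Illustrative sketch of flag-gated adapter selection: return the parameter
# names that would be made trainable, based on whether their name marks them
# as part of the vision or language stack. Names here are hypothetical.

def select_trainable(param_names, finetune_vision_layers, finetune_language_layers):
    selected = []
    for name in param_names:
        if "vision" in name and finetune_vision_layers:
            selected.append(name)
        elif "language" in name and finetune_language_layers:
            selected.append(name)
    return selected

params = [
    "vision_tower.blocks.0.attn.q_proj",
    "language_model.layers.0.self_attn.q_proj",
    "language_model.layers.0.mlp.up_proj",
]
print(select_trainable(params, finetune_vision_layers=True,
                       finetune_language_layers=False))
# ['vision_tower.blocks.0.attn.q_proj']
```

Freezing the language side this way keeps the LLM's weights intact while only the vision adapter parameters accumulate gradients, which is why the flag choice affects both behavior and memory use.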
-
Hi, thank you so much for collecting and organizing this series of papers — it has been a huge help! While reading the literature, I noticed that some papers labeled "2024-NeurIPS" in the repository are actually "2023-NeurIPS". Here is the list of affected papers I found, for reference:
2023-NeurIPS: [Enhancing Adversarial Contrastive Learning via Adversarial Invariant Regularizatio…
-
Dear authors,
@shuyansy @UnableToUseGit
I respectfully suggest that VoCo-LLaMA [1] should be discussed in the "Intro" section of your paper, at the very least.
As I find the citation and discussions related to …
-
Dear shikiw,
Thank you for your valuable effort in curating research on MLLM hallucination! This excellent repository is impressively comprehensive and provides researchers with a clear sense of th…
-
qwen2-vl has always been memory-hungry (compared to the other vision models), and even with Unsloth it still OOMs, whereas the larger Llama 3.2 11B works fine.
I'm using a dataset that has high resolution…
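Since qwen2-vl's visual token count (and therefore memory use) scales with input resolution, a common mitigation for high-resolution datasets is to downscale images to a pixel budget before training — the Qwen2-VL processor itself exposes resolution limits of this kind. A minimal sketch of the aspect-preserving computation, with an illustrative budget value:

```python
import math

# Compute a new (width, height) that preserves aspect ratio while keeping
# width * height under a pixel budget. Feeding smaller images to a
# resolution-dependent model like qwen2-vl yields fewer visual tokens,
# which reduces activation memory. The 512*512 budget is illustrative.

def fit_pixel_budget(width, height, max_pixels=512 * 512):
    if width * height <= max_pixels:
        return width, height  # already within budget
    scale = math.sqrt(max_pixels / (width * height))
    return max(1, int(width * scale)), max(1, int(height * scale))

print(fit_pixel_budget(4032, 3024))  # a 12 MP photo shrinks to ~0.26 MP
```

The resulting dimensions can then be used with any image library's resize call before the examples reach the processor.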
-
- Here's the summary of consulting an LLM specialist:
---
- We have an initial thought in #74 as follows:
![image](https://github.com/user-attachments/assets/265a3d7d-0454-4e7b-9c99-a0dd9f9ecf7c…
-
I have been doing a project of my own lately, and I need to get my hands on the Amharic vision large language model. In doing so, I came across your project. The problem is, however, it's a bit enigmati…