-
### Describe the issue as clearly as possible:
I tried the implementation from the docs:
[https://dottxt-ai.github.io/outlines/latest/reference/models/transformers_vision/#classifying-an-image]
I…
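The issue text is cut off above, but for reference the docs' classifying-an-image pattern looks roughly like the sketch below, written against the outlines 0.x API; the model checkpoint, prompt template, and label set are illustrative assumptions, not the exact docs code:

```python
# Sketch of image classification with outlines' transformers_vision
# (outlines 0.x API; checkpoint, prompt, and labels are assumptions).
from PIL import Image
from transformers import LlavaForConditionalGeneration
import outlines

model = outlines.models.transformers_vision(
    "llava-hf/llava-1.5-7b-hf",
    model_class=LlavaForConditionalGeneration,
)

# Constrain generation to a fixed set of labels.
classifier = outlines.generate.choice(model, ["cat", "dog", "other"])

image = Image.open("photo.jpg")
prompt = "USER: <image>\nWhat animal is in the picture? ASSISTANT:"
print(classifier(prompt, [image]))
```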
-
Currently, we use text embeddings. This works well for textual documents, but it presents obvious drawbacks for documents containing non-textual content (images, graphs, schemes, …).
An alternative is…
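The alternative is cut off above; one common choice for such documents (an assumption on my part, not necessarily the author's proposal) is a multimodal embedding model such as CLIP, which embeds images and text into a shared space:

```python
# Sketch: shared text/image embeddings with CLIP (the use of CLIP here
# is an illustrative assumption; file name and query are placeholders).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("figure_from_document.png")
inputs = processor(
    text=["a chart of quarterly results"],
    images=image,
    return_tensors="pt",
    padding=True,
)

with torch.no_grad():
    text_emb = model.get_text_features(
        input_ids=inputs["input_ids"], attention_mask=inputs["attention_mask"]
    )
    image_emb = model.get_image_features(pixel_values=inputs["pixel_values"])

# Cosine similarity between the text query and the document image.
similarity = torch.nn.functional.cosine_similarity(text_emb, image_emb)
```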
-
Could you add our CVPR 2024 paper on vision-language pre-training, "Iterated Learning Improves Compositionality in Large Vision-Language Models", to this repo?
Paper link: https://arxiv.org/abs/…
-
# Interesting papers
## Meta's 'An Introduction to Vision-Language Modeling'
- https://ai.meta.com/research/publications/an-introduction-to-vision-language-modeling/
![image](https://github.c…
-
- While the LaTeX environment is not fully set up, we'll write our thoughts here for now
----
- `ViperGPT` is a framework that leverages pre-trained vision-language models (`GLIP` for image object ground…
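To make that concrete: in ViperGPT, a code LLM writes short Python programs against an `ImagePatch` API whose methods dispatch to pre-trained models (`find` is backed by `GLIP` for grounding). The stub below paraphrases the paper's interface rather than any runnable package:

```python
# Minimal stub of ViperGPT's ImagePatch interface; method names follow
# the paper, and the real implementation dispatches to GLIP for grounding.
from dataclasses import dataclass
from typing import Any, List

@dataclass
class ImagePatch:
    image: Any  # the image region this patch covers

    def find(self, object_name: str) -> List["ImagePatch"]:
        raise NotImplementedError("backed by GLIP grounding in ViperGPT")

# The kind of program the code LLM might generate for
# "How many muffins are in the image?":
def execute_command(image: Any) -> int:
    image_patch = ImagePatch(image)
    return len(image_patch.find("muffin"))
```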
-
Dear Authors,
We'd like to add "GITA: Graph to Visual and Textual Integration for Vision-Language Graph Reasoning", which has been accepted at NeurIPS 2024, to this repository. [**Paper**](https:/…
-
### 📚 The doc issue
Is there any tutorial for integrating the vision model with the language model?
### Suggest a potential alternative/fix
_No response_
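For orientation while such a tutorial is missing: a common LLaVA-style integration trains a small projector that maps vision-encoder patch features into the language model's embedding space, then feeds the projected "visual tokens" to the LLM alongside the text tokens. The dimensions and shapes below are illustrative assumptions:

```python
# Sketch of a LLaVA-style vision-to-language projector
# (dimensions and token counts are illustrative assumptions).
import torch
import torch.nn as nn

VISION_DIM, LLM_DIM = 1024, 4096  # e.g. CLIP ViT-L/14 -> a 7B LLM

class VisionProjector(nn.Module):
    """Two-layer MLP mapping vision features into the LLM embedding space."""
    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, patch_features: torch.Tensor) -> torch.Tensor:
        return self.proj(patch_features)

projector = VisionProjector(VISION_DIM, LLM_DIM)
visual_tokens = projector(torch.randn(1, 576, VISION_DIM))  # 576 image patches
text_tokens = torch.randn(1, 32, LLM_DIM)                   # LLM token embeddings
llm_inputs = torch.cat([visual_tokens, text_tokens], dim=1) # one joint sequence
```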
-
Hey,
I am Zhiqiu Lin, a final-year PhD student at Carnegie Mellon University working with Prof. Deva Ramanan. Your work is very interesting, with great performance gains!
I wanted to share [Natu…
-
I saw you used something like this:
```python
from unsloth import FastVisionModel  # import needed for the call below

model = FastVisionModel.get_peft_model(
    model,
    finetune_vision_layers = True,      # False if not finetuning the vision part
    finetune_language_layers = True,    # False if not finetuning the language part
)
```
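`model` isn't defined in the excerpt; with unsloth it would typically come from `FastVisionModel.from_pretrained` before the `get_peft_model` call. The checkpoint name below is an illustrative assumption:

```python
from unsloth import FastVisionModel

# Load the base vision-language model first (checkpoint name is an
# illustrative assumption; 4-bit loading is optional).
model, tokenizer = FastVisionModel.from_pretrained(
    "unsloth/Llama-3.2-11B-Vision-Instruct",
    load_in_4bit = True,
)
```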
-
### Your current environment
from vllm import LLM  # import needed for the call below

# Initialize the LLaVA-1.5 model
llm = LLM(model="llava-hf/llava-1.5-7b-hf")
print(llm)
# embed_last_hook = Hook(model.language_model.model.norm)  # for saving the embeddings
…
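`Hook` is not defined in the snippet; assuming it is a standard PyTorch forward hook that captures a module's output, a minimal version might look like:

```python
import torch

class Hook:
    """Minimal forward-hook helper (an assumption about the undefined
    `Hook` above): stores the wrapped module's most recent output."""
    def __init__(self, module: torch.nn.Module):
        self.output = None
        self.handle = module.register_forward_hook(self._capture)

    def _capture(self, module, inputs, output):
        self.output = output.detach() if torch.is_tensor(output) else output

    def remove(self):
        self.handle.remove()
```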