-
### Model description
Please add support for HuggingFaceM4/Idefics3-8B-Llama3 in TGI:
_Idefics3 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces t…
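While TGI support is pending, the model can already be queried through plain `transformers`. A minimal sketch, assuming the `HuggingFaceM4/Idefics3-8B-Llama3` checkpoint and the chat-template usage shown on its model card (the image path and prompt are placeholders):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceM4/Idefics3-8B-Llama3"  # checkpoint named in this request
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")  # placeholder input image

# Interleaved image/text turns, as Idefics3 accepts arbitrary sequences of both.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```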
-
- [ ] [system-2-research/README.md at main · open-thought/system-2-research](https://github.com/open-thought/system-2-research/blob/main/README.md?plain=1)
# OpenThought - System 2 Research Links
He…
-
- [ ] [LLM-Agents-Papers/README.md at main · AGI-Edgerunners/LLM-Agents-Papers](https://github.com/AGI-Edgerunners/LLM-Agents-Papers/blob/main/README.md?plain=1)
# LLM-Agents-Papers
## :writing_hand…
-
Hi, congratulations on the results.
My questions concern the correct use of the exit features for the retrieval task and the fine-tuning phase.
1) In the colab notebook, in the section 'Feature …
-
### Model description
Kosmos-2 is a grounded multimodal large language model, which integrates grounding and referring capabilities compared with Kosmos-1. The model can accept image regions select…
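A minimal sketch of the grounded-captioning flow, assuming the public `microsoft/kosmos-2-patch14-224` checkpoint in `transformers`; the `<grounding>` prompt token and the `post_process_generation` helper follow its model card, not this issue:

```python
from PIL import Image
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

ckpt = "microsoft/kosmos-2-patch14-224"
processor = AutoProcessor.from_pretrained(ckpt)
model = Kosmos2ForConditionalGeneration.from_pretrained(ckpt)

image = Image.open("example.jpg")  # placeholder input image

# The `<grounding>` prefix asks the model to emit bounding-box tokens
# alongside the generated text.
inputs = processor(text="<grounding>An image of", images=image, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=64)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Split the raw output into plain text plus (entity, span, boxes) tuples.
caption, entities = processor.post_process_generation(generated_text)
print(caption)
print(entities)
```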
-
Hi! Thanks for releasing such impressive work! We found an interesting extension to this great work: combining a **SoTA** zero-shot detector with Segment-Anything, which can **generate high-quality box…
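A minimal sketch of the box-to-mask handoff this combination relies on, assuming detector boxes in XYXY format and the `segment_anything` package; the detector call itself is elided, and the checkpoint path and box values are placeholders:

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load SAM once; the checkpoint filename is a placeholder.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single XYXY box as a zero-shot detector (e.g. Grounding DINO) would
# produce for a text query; values are placeholders.
box = np.array([100, 150, 400, 500])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print(masks.shape, scores)  # (1, H, W) boolean mask and its confidence
```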
-
### Feature request
How can we take advantage of https://github.com/haotian-liu/LLaVA?
https://llava-vl.github.io/
### Motivation
> LLaVA represents a novel end-to-end trained large multimodal …
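A minimal sketch of running LLaVA locally, assuming the community `llava-hf/llava-1.5-7b-hf` conversion in `transformers` (checkpoint name and prompt format follow that model card, not the original repo):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")  # placeholder input image

# `<image>` marks where the image features are spliced into the prompt.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```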
-
# BLIP
* [paper](https://arxiv.org/abs/2201.12086)
* [code](https://github.com/salesforce/BLIP)
* [blog](https://blog.salesforceairesearch.com/blip-bootstrapping-language-image-pretraining/)
* i…
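A minimal captioning sketch, assuming the `Salesforce/blip-image-captioning-base` release of the paper's model in `transformers`; the image path is a placeholder:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

ckpt = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(ckpt)
model = BlipForConditionalGeneration.from_pretrained(ckpt)

image = Image.open("example.jpg").convert("RGB")  # placeholder input image

# Unconditional captioning; passing `text=` as well would prefix-condition
# the generated caption.
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```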
-
## Problem statement
1. Despite the impressive capabilities of large-scale language models, their potential in modalities other than text has not been fully demonstrated.
2. Aligning parameters of vi…
-
This issue is for notifications about papers that will be added to this repo in the future.