-
### Model description
Please add support for HuggingFaceM4/Idefics3-8B-Llama3 in TGI:
_Idefics3 is an open multimodal model that accepts arbitrary sequences of image and text inputs and produces t…
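While TGI support is pending, the model can already be queried through plain `transformers`. A minimal sketch, assuming the `HuggingFaceM4/Idefics3-8B-Llama3` checkpoint and the chat-template usage shown on its model card (the image path and prompt are placeholders):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForVision2Seq

model_id = "HuggingFaceM4/Idefics3-8B-Llama3"  # checkpoint named in this request
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForVision2Seq.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("example.jpg")  # placeholder input image

# Interleaved image/text turns, as Idefics3 accepts arbitrary sequences of both.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "Describe this image."},
    ]},
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=prompt, images=[image], return_tensors="pt").to(model.device)

generated_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(generated_ids, skip_special_tokens=True)[0])
```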
-
- [ ] [system-2-research/README.md at main · open-thought/system-2-research](https://github.com/open-thought/system-2-research/blob/main/README.md?plain=1)
# OpenThought - System 2 Research Links
He…
-
- [ ] [LLM-Agents-Papers/README.md at main · AGI-Edgerunners/LLM-Agents-Papers](https://github.com/AGI-Edgerunners/LLM-Agents-Papers/blob/main/README.md?plain=1)
# LLM-Agents-Papers
## :writing_hand…
-
Hi, congratulations on the results.
My questions concern the correct use of the exit features for the retrieval task and the fine-tuning phase.
1) In the colab notebook, in the section 'Feature …
-
### Model description
Kosmos-2 is a grounded multimodal large language model, which integrates grounding and referring capabilities compared with Kosmos-1. The model can accept image regions select…
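A minimal sketch of the grounded-captioning flow, assuming the public `microsoft/kosmos-2-patch14-224` checkpoint in `transformers`; the `<grounding>` prompt token and the `post_process_generation` helper follow its model card, not this issue:

```python
from PIL import Image
from transformers import AutoProcessor, Kosmos2ForConditionalGeneration

ckpt = "microsoft/kosmos-2-patch14-224"
processor = AutoProcessor.from_pretrained(ckpt)
model = Kosmos2ForConditionalGeneration.from_pretrained(ckpt)

image = Image.open("example.jpg")  # placeholder input image

# The `<grounding>` prefix asks the model to emit bounding-box tokens
# alongside the generated text.
inputs = processor(text="<grounding>An image of", images=image, return_tensors="pt")
generated_ids = model.generate(**inputs, max_new_tokens=64)
generated_text = processor.batch_decode(generated_ids, skip_special_tokens=True)[0]

# Split the raw output into plain text plus (entity, span, boxes) tuples.
caption, entities = processor.post_process_generation(generated_text)
print(caption)
print(entities)
```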
-
Hi! Thanks for releasing such impressive work! We found an interesting extension to this great work: combining a **SoTA** zero-shot detector with Segment-Anything, which can **generate high-quality box…
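A minimal sketch of the box-to-mask handoff this combination relies on, assuming detector boxes in XYXY format and the `segment_anything` package; the detector call itself is elided, and the checkpoint path and box values are placeholders:

```python
import numpy as np
import cv2
from segment_anything import sam_model_registry, SamPredictor

# Load SAM once; the checkpoint filename is a placeholder.
sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")
predictor = SamPredictor(sam)

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single XYXY box as a zero-shot detector (e.g. Grounding DINO) would
# produce for a text query; values are placeholders.
box = np.array([100, 150, 400, 500])
masks, scores, _ = predictor.predict(box=box, multimask_output=False)
print(masks.shape, scores)  # (1, H, W) boolean mask and its confidence
```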
-
### Feature request
How can we take advantage of https://github.com/haotian-liu/LLaVA?
https://llava-vl.github.io/
### Motivation
> LLaVA represents a novel end-to-end trained large multimodal …
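A minimal sketch of running LLaVA locally, assuming the community `llava-hf/llava-1.5-7b-hf` conversion in `transformers` (checkpoint name and prompt format follow that model card, not the original repo):

```python
import torch
from PIL import Image
from transformers import AutoProcessor, LlavaForConditionalGeneration

model_id = "llava-hf/llava-1.5-7b-hf"
processor = AutoProcessor.from_pretrained(model_id)
model = LlavaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")  # placeholder input image

# `<image>` marks where the image features are spliced into the prompt.
prompt = "USER: <image>\nWhat is shown in this image? ASSISTANT:"
inputs = processor(text=prompt, images=image, return_tensors="pt").to(
    model.device, torch.float16
)
output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```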
-
# BLIP
* [paper](https://arxiv.org/abs/2201.12086)
* [code](https://github.com/salesforce/BLIP)
* [blog](https://blog.salesforceairesearch.com/blip-bootstrapping-language-image-pretraining/)
* i…
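A minimal captioning sketch, assuming the `Salesforce/blip-image-captioning-base` release of the paper's model in `transformers`; the image path is a placeholder:

```python
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

ckpt = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(ckpt)
model = BlipForConditionalGeneration.from_pretrained(ckpt)

image = Image.open("example.jpg").convert("RGB")  # placeholder input image

# Unconditional captioning; passing `text=` as well would prefix-condition
# the generated caption.
inputs = processor(images=image, return_tensors="pt")
out = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(out[0], skip_special_tokens=True))
```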
-
## Problem statement
1. Despite the impressive capabilities of large-scale language models, their potential in modalities other than text has not been fully demonstrated.
2. Aligning parameters of vi…
-
This issue is for notifications about papers that will be added to this repo in the future.