-
# 📜 [A Survey of Transformers](https://arxiv.org/pdf/2106.04554.pdf)
### ⚡ One-line summary
A survey paper on Transformer architectures, compiled as of June 2021.
### 🏷️ Abstract
> Transformers have achieved great success in …
-
As titled. Thank you!
-
First, thanks for your great work!
We're now trying to replace the vision encoder in LLaVA, i.e., clip-l-336, with RADIO. Under the default LLaVA 1.5 settings, we pretrain a multimodal projection MLP a…
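For context, the LLaVA 1.5 projector this refers to is a two-layer MLP with a GELU in between, mapping vision-encoder features into the LM's embedding space. A minimal numpy sketch, where the dimensions and class name are illustrative placeholders (not RADIO's actual sizes):

```python
import numpy as np

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

class MLPProjector:
    """Two-layer MLP projecting vision features into the LM embedding space."""
    def __init__(self, vision_dim, lm_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.standard_normal((vision_dim, lm_dim)) * 0.02
        self.b1 = np.zeros(lm_dim)
        self.w2 = rng.standard_normal((lm_dim, lm_dim)) * 0.02
        self.b2 = np.zeros(lm_dim)

    def __call__(self, x):
        return gelu(x @ self.w1 + self.b1) @ self.w2 + self.b2

# Placeholder sizes: 576 patch tokens of dim 1024 -> a 4096-dim LM space.
proj = MLPProjector(vision_dim=1024, lm_dim=4096)
tokens = np.zeros((576, 1024))
print(proj(tokens).shape)  # (576, 4096)
```

Swapping in a different vision encoder mainly changes `vision_dim` here; the projector itself is retrained in the pretraining stage.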
-
### 🚀 The feature
Support for multiple languages (i.e., VOCABS["multilingual"]) in pretrained models.
### Motivation, pitch
It would be great to use models which support multiple languages b…
-
## ❓Question
Hi there,
I recently created a [basic implementation](https://github.com/lishicheng1996/coremltools/tree/paddle_frontend) for converting a [PaddlePaddle](https://github.com/PaddlePaddle/…
-
Will start with:
1. FILIP https://arxiv.org/abs/2111.07783
2. CLOOB https://arxiv.org/abs/2110.11316
3. https://arxiv.org/abs/2110.05208
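All three works build on an image–text contrastive objective in the CLIP family (FILIP refines it to token-level matching, CLOOB replaces it with an InfoLOOB variant). A minimal numpy sketch of the shared symmetric InfoNCE baseline, with an illustrative temperature value:

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    # L2-normalize, then compute pairwise cosine-similarity logits.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    labels = np.arange(len(logits))  # the i-th image matches the i-th text

    def xent(l):
        # Cross-entropy with the matched pairs on the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the image->text and text->image directions.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
emb = rng.standard_normal((4, 8))
loss = clip_loss(emb, emb)  # identical pairs -> a small, nonnegative loss
```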
-
# Summary
Training a VLM from scratch while keeping NLP performance at LLM level is very difficult. Research has therefore progressed toward investigating how to train a VLM starting from a frozen pretrained language model.
### Prior research directions
1. Shallow alignmen…
-
### Description
Extremely narrow layout produced in a normal body
### (Optional:) Please add any files, screenshots, or other information here.
_No response_
### (Required) What is this issue most clo…
-
I am trying to apply SmoothQuant during W8A8 quantization of `meta-llama/Llama-3.2-11B-Vision-Instruct`, where I ignore all of the modules except for language_model. However, I find that it crashes when…
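For reference, the core SmoothQuant idea is to migrate activation outliers into the weights before quantization, via per-input-channel scales s_j = max|X_j|^α / max|W_j|^(1−α). A minimal numpy sketch of that scaling step (shapes and the α value are illustrative, not the failing setup above):

```python
import numpy as np

def smoothquant_scales(act, weight, alpha=0.5, eps=1e-8):
    """Per-input-channel scales: s_j = max|X_j|^alpha / max|W_j|^(1-alpha)."""
    act_max = np.abs(act).max(axis=0) + eps    # per-channel activation range
    w_max = np.abs(weight).max(axis=1) + eps   # per-input-channel weight range
    return act_max**alpha / w_max**(1 - alpha)

rng = np.random.default_rng(0)
X = rng.standard_normal((16, 4)); X[:, 2] *= 50   # channel 2 is an outlier
W = rng.standard_normal((4, 8))                   # linear layer: y = X @ W
s = smoothquant_scales(X, W)

# (X / s) @ (s * W) is mathematically identical to X @ W, but the smoothed
# activations X / s have a much flatter per-channel range, which makes
# 8-bit activation quantization far less lossy.
y_ref = X @ W
y_smooth = (X / s) @ (W * s[:, None])
```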
-
Thanks for sharing this amazing work. I am a novice to VLN, but am still motivated by your ideas. I notice that it is inevitable for an agent to make mistakes, which mainly come from the mismatch of sub-in…