-
Vision-language pre-training has significantly elevated performance across a wide range of image-language applications. Yet, the pre-training process for video-related tasks demands exceptionally larg…
-
Thanks for the awesome Grounding-DINO! I'd like to share our recent work 🦖OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion.
* OV-DINO is a novel unified open vocabulary detecti…
-
- https://arxiv.org/abs/2109.12178
- 2021
Vision-and-language pre-training (VLP) improves model performance on downstream tasks that take image and text inputs.
Current VLP approaches differ in:
(i) model architecture (particularly the image embedder),
(ii) loss functions, and
(iii) masking policies.
For the image embedder, ResNet…
e4exp updated
2 years ago
-
Thanks for the awesome GLIP! I'd like to share our recent work 🦖OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion.
* OV-DINO is a novel unified open vocabulary detection approac…
-
Thanks for the awesome YOLO-World! I'd like to share our recent work 🦖OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion.
* OV-DINO is a novel unified open vocabulary detection …
-
- https://arxiv.org/abs/2103.06561
- 2021
In recent years, multimodal pre-trained models that aim to bridge vision and language have been studied extensively.
However, most of these models explicitly model cross-modal interactions between image-text pairs by assuming a strong correlation between the text and the image.
This strong assumption does not hold in real-world…
e4exp updated
3 years ago
-
[https://arxiv.org/pdf/2404.06512.pdf](https://arxiv.org/pdf/2404.06512.pdf)
[https://github.com/InternLM/InternLM-XComposer](https://github.com/InternLM/InternLM-XComposer)
### preview
- Health checkup …
-
Thanks for your awesome work on model merging! I'm excited about the improvements you achieved compared to other merging methods. However, I saw that the individually fine-tuned models still outperform WEM…
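The comparison above, between individually fine-tuned models and a merged model, can be sketched as plain weight-space averaging. This is only a hedged illustration: the repository's actual merging method is more involved, and the state-dict layout below is hypothetical.

```python
def average_merge(state_dicts):
    """Uniformly average parameters across equally-shaped fine-tuned checkpoints.

    Each checkpoint is a dict mapping parameter names to flat lists of floats
    (a stand-in for real tensors; the naming is illustrative only).
    """
    return {
        key: [sum(vals) / len(state_dicts)
              for vals in zip(*(sd[key] for sd in state_dicts))]
        for key in state_dicts[0]
    }
```

Uniform averaging typically underperforms the individual experts on their own tasks, which is the gap the question points at; weighted or routed merging schemes try to close it.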
-
As mentioned in the paper, you use 20% of the training data (around 16M × 0.2 = 3.2M) to train the model. I have some questions about this.
Previously, the baseline model ABINet consisted of three stages: vision …
-
### Request
Could support for conference (CCF conference) lookup be added?
### Version info
zotero version: 6.0.27
zotero-updateifsE version: 0.13.0
### Existing materials
* CCF Recommended International Academic Publications Catalog: https://www.ccf.org.cn/Academic_Evaluation/By_category/
* easySchol…
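The requested conference lookup could be sketched as a small abbreviation-to-rank mapping. This is a hypothetical illustration only: the real plugin would load the full catalog from the CCF materials linked above, and the entries and function name here are assumptions, not the plugin's API.

```python
# Illustrative CCF rank entries (the full catalog comes from the CCF website).
CCF_RANK = {
    "CVPR": "A",
    "NeurIPS": "A",
    "ECCV": "B",
}

def lookup_ccf_rank(venue):
    """Case-insensitive rank lookup; returns None for venues not in the table."""
    for name, rank in CCF_RANK.items():
        if name.lower() == venue.lower():
            return rank
    return None
```

A real implementation would also need to normalize venue strings from Zotero metadata (full names vs. abbreviations) before matching.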