-
Vision-language pre-training has significantly elevated performance across a wide range of image-language applications. Yet, the pre-training process for video-related tasks demands exceptionally larg…
-
Thanks for the awesome Grounding-DINO! I'd like to share our recent work 🦖OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion.
* OV-DINO is a novel unified open vocabulary detecti…
-
- https://arxiv.org/abs/2109.12178
- 2021
Vision-and-language pre-training (VLP) improves model performance on downstream tasks that take image and text inputs.
Current VLP approaches differ in:
(i) model architecture (particularly the image embedder),
(ii) loss functions, and
(iii) masking policies.
For the image embedder, ResNet…
e4exp updated
2 years ago
-
Thanks for the awesome GLIP! I'd like to share our recent work 🦖OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion.
* OV-DINO is a novel unified open vocabulary detection approac…
-
Thanks for the awesome YOLO-World! I'd like to share our recent work 🦖OV-DINO: Unified Open-Vocabulary Detection with Language-Aware Selective Fusion.
* OV-DINO is a novel unified open vocabulary detection …
-
- https://arxiv.org/abs/2103.06561
- 2021
In recent years, multimodal pre-trained models that aim to bridge vision and language have been studied extensively.
However, most of these models explicitly model cross-modal interactions between image-text pairs by assuming a strong correlation between the text and the image.
This strong assumption does not hold in real-world…
e4exp updated
3 years ago
-
[https://arxiv.org/pdf/2404.06512.pdf](https://arxiv.org/pdf/2404.06512.pdf)
[https://github.com/InternLM/InternLM-XComposer](https://github.com/InternLM/InternLM-XComposer)
### preview
- Health checkup …
-
Thanks for your awesome work on model merging! I'm excited about the improvements you achieved compared to other merging methods. However, I saw that the individually fine-tuned models still outperform WEM…
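The comparison above, between individually fine-tuned models and a merged model, can be sketched as plain weight-space averaging. This is only a hedged illustration: the repository's actual merging method is more involved, and the state-dict layout below is hypothetical.

```python
def average_merge(state_dicts):
    """Uniformly average parameters across equally-shaped fine-tuned checkpoints.

    Each checkpoint is a dict mapping parameter names to flat lists of floats
    (a stand-in for real tensors; the naming is illustrative only).
    """
    return {
        key: [sum(vals) / len(state_dicts)
              for vals in zip(*(sd[key] for sd in state_dicts))]
        for key in state_dicts[0]
    }
```

Uniform averaging typically underperforms the individual experts on their own tasks, which is the gap the question points at; weighted or routed merging schemes try to close it.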
-
As mentioned in the paper, you use 20% of the training data (around 16M × 0.2 = 3.2M) to train the model. I have some questions about this.
Previously, the baseline model ABINet consisted of three stages: vision …
-
### Request
Could support for conference (CCF conference) lookup be added?
### Version info
zotero version: 6.0.27
zotero-updateifsE version: 0.13.0
### Existing materials
* CCF Recommended International Academic Publications Catalog: https://www.ccf.org.cn/Academic_Evaluation/By_category/
* easySchol…
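The requested conference lookup could be sketched as a small abbreviation-to-rank mapping. This is a hypothetical illustration only: the real plugin would load the full catalog from the CCF materials linked above, and the entries and function name here are assumptions, not the plugin's API.

```python
# Illustrative CCF rank entries (the full catalog comes from the CCF website).
CCF_RANK = {
    "CVPR": "A",
    "NeurIPS": "A",
    "ECCV": "B",
}

def lookup_ccf_rank(venue):
    """Case-insensitive rank lookup; returns None for venues not in the table."""
    for name, rank in CCF_RANK.items():
        if name.lower() == venue.lower():
            return rank
    return None
```

A real implementation would also need to normalize venue strings from Zotero metadata (full names vs. abbreviations) before matching.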