vl-bert Search Results - Githubissues

147 results
for vl-bert


emanjavacas/bntl #15

Title descriptions: our proposition

This is our proposition for a new item-notation for more uniformity, I hope this is (a little bit) clear to you! The items in the list are displayed as their N2-form in the RIS-code (= like in the …

evadecooman updated 1 month ago
3
e4exp/paper_manager_abstract #505

Incorporating Visual Layout Structures for Scientific Text C…

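- https://arxiv.org/abs/2106.00676 - 2021 Classifying the title, author names, body text, and other elements of a scientific paper is an important first step toward automatically understanding scientific documents. Prior work has shown that basic layout information, such as the two-dimensional position of each token on the page, enables more accurate classification. In this work, for language models, VIsual LAyout…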

e4exp updated 3 years ago
2
e4exp/paper_manager_abstract #287

Perspectives and Prospects on Transformer Architecture for C…

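- https://arxiv.org/abs/2103.04037 - 2021 The Transformer architecture has fundamentally changed computational linguistics, a field long dominated by recurrent neural networks. Its success has also dramatically changed cross-modal tasks spanning language and vision, and many researchers are already working on this problem. In this paper, the field's most important mile…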

e4exp updated 3 years ago
7
chenwei746/EEVG #3

Hello, a question about the swin-transformer part of the code

If I want to change swin-b to swin-s, is it enough to change the dict part of arg = dict(pretrain_img_size=384, window_size=12, embed_dim=128, out_indices=[2, 1], depths=[2, 2, 18, 2], num_heads=[4, 8, 16, 32]) to dict(pre…

Mr-Bigworth updated 2 months ago
2
OpenGVLab/InternVideo #129

[Help requested] Inference InternVideo2_clip model.

Hello InternVideo team, You guys have done a great job with this project! In your paper, you use the Stage 2 model for the task of temporal grounding on QVHighlight [Lei et al., 2021] and Charad…

gracikk-ds updated 4 days ago
36
e4exp/paper_manager_abstract #562

Probing Inter-modality: Visual Parsing with Self-Attention f…

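- https://arxiv.org/abs/2106.13488 - 2021 Vision-language pre-training (VLP) aims to learn multimodal representations from image-text pairs and to fine-tune them for downstream vision-language tasks. Typical VLP models adopt a CNN-Transformer architecture, embedding images with a CNN and aligning images and text with a Transformer…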

e4exp updated 3 years ago
2
e4exp/paper_manager_abstract #343

Seeing Out of tHe bOx: End-to-End Pre-training for Vision-La…

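- https://arxiv.org/abs/2104.03135 - CVPR 2021 This work studies vision-language pre-training (VLPT), which aims to learn cross-modal alignment from millions of image-text pairs by jointly training a convolutional neural network (CNN) and a Transformer. Prior methods extract salient regions of an image and…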

e4exp updated 3 years ago
6
artidoro/qlora #145

Loading Lora Adapter weights into 4bit model to continue fin…

In [this colab](https://colab.research.google.com/drive/17XEqL1JcmVWjHkT-WczdYkJlNINacwG7?usp=sharing#scrollTo=2QK51MtdsMLu) you show how to load adapter and merge it with initial model. Notice it loa…

simsim314 updated 1 month ago
5
microsoft/unilm #1333

[Kosmos-2] Unable to start the demo

First of all, thank you for sharing the awesome code. After setting everything up, when I tried to launch the demo, I encountered the following error. Please help me. ``` (kosmos-2) wendell@:~/…

wendellgithub0206 updated 10 months ago
7
chenwei746/EEVG #2

Replicating results

Hello and thank you for your work. I am interested in replicating your results on the RIS task, with SwinB backbone in particular. I noticed that you only report the command for evaluating with ViTD…

ClaudiaCuttano updated 2 months ago
1
