-
How about adding Visual Question Answering to hezar?
I saw a few days ago that we have a visual question answering benchmark for Persian, and I thought it would be nice to have VQA in hezar.
I would also like …
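For context, here is a minimal sketch of running VQA through the Hugging Face transformers pipeline, which a hezar task like this could wrap; the checkpoint and image path below are only examples, not a proposed hezar API:

```python
from transformers import pipeline
from PIL import Image

# Visual question answering pipeline from transformers;
# the ViLT checkpoint is just one example model.
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

image = Image.open("street_scene.jpg")  # example image path
answers = vqa(image=image, question="How many people are in the picture?", top_k=3)
for a in answers:
    print(f"{a['answer']}: {a['score']:.3f}")
```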
-
**Name of the feature**
*In general, the feature you want added should be supported by HuggingFace's [transformers](https://github.com/huggingface/transformers) library:*
- *If requesting a **model…
-
**System Information (please complete the following information):**
Windows OS: Windows-11-Enterprise-24H2
ML.Net Model Builder 2022: 17.19.0.2455701 (Main Build)
Microsoft Visual Studio Enterprise: 2…
-
# In a nutshell
A paper proposing the VQA task ([here](https://visualqa.org/index.html)) and introducing baseline models.
# Paper link
http://arxiv.org/abs/1505.00468.pdf
# Authors / Affiliations
Aishwarya Agrawal, Jiasen Lu, Stanislaw Antol, Margaret Mitchel…
-
@NielsRogge Is batch inference possible in the LayoutLMv2 VQA task?
Currently, I have observed that on a Colab GPU, inference on a single question takes around 0.2-0.3 seconds. In the below ste…
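For reference, a rough sketch of how batched inference could be tried with `LayoutLMv2Processor` and `LayoutLMv2ForQuestionAnswering`, assuming a DocVQA fine-tuned checkpoint; the checkpoint names, file names, and questions below are placeholders:

```python
import torch
from PIL import Image
from transformers import LayoutLMv2Processor, LayoutLMv2ForQuestionAnswering

# Placeholder checkpoint: swap in the DocVQA fine-tuned model you are using.
# LayoutLMv2 needs detectron2 for the visual backbone and pytesseract for OCR.
checkpoint = "microsoft/layoutlmv2-base-uncased"
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = LayoutLMv2Processor.from_pretrained(checkpoint)
model = LayoutLMv2ForQuestionAnswering.from_pretrained(checkpoint).to(device)

# Example inputs: lists of images and questions are padded into one batch.
images = [Image.open("invoice.png").convert("RGB"),
          Image.open("receipt.png").convert("RGB")]
questions = ["What is the invoice number?", "What is the total amount?"]

encoding = processor(images, questions, padding="max_length", max_length=512,
                     truncation=True, return_tensors="pt").to(device)

with torch.no_grad():
    outputs = model(**encoding)

# Pick the highest-scoring start/end token per example and decode the span.
start = outputs.start_logits.argmax(-1)
end = outputs.end_logits.argmax(-1)
for i, q in enumerate(questions):
    ids = encoding.input_ids[i][start[i] : end[i] + 1]
    print(q, "->", processor.tokenizer.decode(ids))
```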
-
### Metadata
- Authors: Kushal Kafle and Christopher Kanan
- Organization: Chester F. Carlson Center for Imaging Science, Rochester Institute of Technology
- Paper: https://arxiv.org/pdf/1610.01465.…
-
Enjoying the recent Gradio notebook stuff!
Was curious about when/if there will be support for an additional Hugging Face task option of ["visual question answering"](https://huggingface.co/models?pipeline_tag=…
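In the meantime, a quick sketch of wiring a transformers VQA pipeline into a Gradio interface by hand; the checkpoint is just an example:

```python
import gradio as gr
from transformers import pipeline

# Example VQA model; any visual-question-answering checkpoint should work.
vqa = pipeline("visual-question-answering", model="dandelin/vilt-b32-finetuned-vqa")

def answer(image, question):
    # Return only the top answer string for the given image/question pair.
    preds = vqa(image=image, question=question, top_k=1)
    return preds[0]["answer"]

demo = gr.Interface(
    fn=answer,
    inputs=[gr.Image(type="pil"), gr.Textbox(label="Question")],
    outputs=gr.Textbox(label="Answer"),
)

if __name__ == "__main__":
    demo.launch()
```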
-
#### Is your feature request related to a problem? Please describe.
When creating a text question, can admins have the choice between `` or ``?
Usually the `` field is great since it's a visual …
-
## In a nutshell
A method that applies graph convolution to the task of answering questions about an image. Each node is the concatenation of a detected object's image features and the question features, and nodes are connected based on the spatial positions of those image features. The aim is to capture the positional relationships between objects in the context of the question. Achieves SOTA on VQA-v2.
![image](https://user-images.github…
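To make the graph construction concrete, here is a small illustrative sketch (not the paper's exact implementation) of building the nodes and adjacency as described above:

```python
import torch

def build_graph(obj_feats, boxes, q_feat, dist_threshold=0.3):
    # obj_feats: (N, Dv) visual features of N detected objects
    # boxes:     (N, 4) normalized [x1, y1, x2, y2] boxes
    # q_feat:    (Dq,) question feature shared by every node
    n = obj_feats.size(0)
    # Node = concat(object feature, question feature)
    nodes = torch.cat([obj_feats, q_feat.expand(n, -1)], dim=-1)      # (N, Dv + Dq)

    # Connect objects whose box centers are spatially close.
    centers = torch.stack([(boxes[:, 0] + boxes[:, 2]) / 2,
                           (boxes[:, 1] + boxes[:, 3]) / 2], dim=-1)  # (N, 2)
    dist = torch.cdist(centers, centers)
    adj = (dist < dist_threshold).float()
    adj = adj / adj.sum(dim=-1, keepdim=True)                         # row-normalize
    return nodes, adj

class GraphConvLayer(torch.nn.Module):
    # One plain graph convolution step: aggregate neighbors, then project.
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.proj = torch.nn.Linear(in_dim, out_dim)

    def forward(self, nodes, adj):
        return torch.relu(self.proj(adj @ nodes))
```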