-
# 🌟 New model addition
We recently proposed OFA, a unified model for multimodal pretraining, which achieves multiple SoTAs on downstream tasks, including image captioning, text-to-image generation, r…
-
[paper](https://arxiv.org/pdf/2205.01917.pdf)
## TL;DR
**Problem:** Building a good vision backbone. Image pretraining on classification labels, dual-encoder models trained with a contrastive loss on image-text pairs, an image encoder and …
-
# Reference
- 2021-01 A Survey on Visual Transformer [[Paper](https://arxiv.org/pdf/2012.12556.pdf)]
- [Paper notes - 0809zheng](https://0809zheng.github.io/2021/02/10/visual-transformer.html)
# Brief
- …
-
The unified-io isi saga-cluster demo does more than baseline captioning; it also does object detection.
The task here would be to write a script (or otherwise implement a feature) to be included in t…
-
Hello! I have carefully read your code and paper (FULL TRANSFORMER NETWORK FOR IMAGE CAPTIONING), and I have found some inconsistencies that confuse me. I would appreciate it if you could give me some…
-
**Motivation**
Improve the benchmark performance of all algorithms based on the TextOCR dataset released by the Facebook AI Research team.
**Related resources**
https://textvqa.org/textocr
**Overvi…
-
# Summary
Training a VLM from scratch while keeping NLP performance at the level of an LLM is very difficult. Research has therefore focused on investigating how to train a VLM starting from a frozen pretrained language model.
### Existing research directions
1. Shallow alignmen…
-
### Question Validation
- [X] I have searched both the documentation and discord for an answer.
### Question
Hi there!
I am curious about how to handle PowerPoints that contain images and gr…
-
Traceback (most recent call last):
File "test.py", line 77, in <module>
scores = predict_captions(model, dict_dataloader_test, text_field)
File "test.py", line 26, in predict_captions
out, _ =…
-