visual-captioning Search Results

473 results
for visual-captioning

Best match

huggingface/transformers #15813

Add OFA to transformers

# 🌟 New model addition We recently proposed OFA, a unified model for multimodal pretraining, which achieves multiple SoTAs on downstream tasks, including image captioning, text-to-image generation, r…

JustinLin610 updated 1 year ago
4
long8v/PTIR #35

[30] CoCa: Contrastive Captioners are Image-Text Foundation …

[paper](https://arxiv.org/pdf/2205.01917.pdf) ## TL;DR **problem:** Building a good vision backbone. Image pretraining on classification labels, a dual-encoder model trained with a contrastive loss on image-text pairs, an image encoder and …

long8v updated 8 months ago
2
junxnone/tio #926

A Survey on Visual Transformer

# Reference - 2021-01 A Survey on Visual Transformer [[Paper](https://arxiv.org/pdf/2012.12556.pdf)] - [Paper notes - 0809zheng](https://0809zheng.github.io/2021/02/10/visual-transformer.html) # Brief - …

junxnone updated 3 years ago
1
isi-vista/unified-io-inference #7

Run baseline captioning against one of the datasets identifi…

The unified-io isi saga-cluster demo does more than baseline captioning; it also does object detection. The task here would be to write a script (or otherwise implement a feature) to be included in t…

danielnapierski updated 1 year ago
2
milkymap/transformer-image-captioning #5

Some inconsistencies between the paper and the code

Hello! I have carefully read your code and paper (FULL TRANSFORMER NETWORK FOR IMAGE CAPTIONING), and I have found some inconsistencies that confuse me. I would appreciate it if you could give me some…

Chenliu-svg updated 6 months ago
1
open-mmlab/mmocr #259

Performance on TextOCR Dataset

**Motivation** Improve the benchmark performance of all algorithms based on TextOCR dataset released by Facebook AI research team **Related resources** https://textvqa.org/textocr **Overvi…

jkcg-learning updated 3 years ago
6
YoojLee/paper_review #75

CogVLM: Visual Expert for Pretrained Language Models (2024)

# Summary Training a VLM from scratch while keeping NLP performance at the level of an LLM is very difficult. Research has therefore focused on investigating how to train a VLM starting from a frozen pretrained language model. ### Prior research directions 1. Shallow alignmen…

YoojLee updated 10 months ago
1
run-llama/llama_index #16693

[Question]: Ingesting Powerpoints with graphs and Images

### Question Validation - [X] I have searched both the documentation and discord for an answer. ### Question Hi there! I am curious about how to handle PowerPoints that contain images and gr…

martinb-ai updated 1 month ago
3
aimagelab/meshed-memory-transformer #39

variance dtype issue

Traceback (most recent call last): File "test.py", line 77, in scores = predict_captions(model, dict_dataloader_test, text_field) File "test.py", line 26, in predict_captions out, _ =…

Bueno1887 updated 8 months ago
7
yangxuntu/SGAE #5

Can we evaluate on RAW images outside the coco dataset.(eval…

ravissj4 updated 3 years ago
3
