
[DeCLIP] Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm #17


bigshanedogg commented 2 years ago

Problem statement

  1. Train CLIP, which requires 400M image-text pairs for pre-training, in a more data-efficient way (see the loss sketch below for what is being made cheaper).
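For context, CLIP's pre-training objective is a symmetric InfoNCE loss over in-batch image-text pairs, so its data appetite comes from needing a huge number of pairs to supply diverse in-batch negatives. A minimal PyTorch sketch of that objective (my own reconstruction, assuming L2-normalized embeddings; not the authors' code):

```python
import torch
import torch.nn.functional as F

def clip_contrastive_loss(image_emb: torch.Tensor,
                          text_emb: torch.Tensor,
                          temperature: float = 0.07) -> torch.Tensor:
    # Cosine-similarity logits between every image and every text in the batch.
    logits = image_emb @ text_emb.t() / temperature
    # Matched (image, text) pairs lie on the diagonal.
    targets = torch.arange(logits.size(0), device=logits.device)
    # Symmetric cross-entropy: image-to-text plus text-to-image.
    loss_i2t = F.cross_entropy(logits, targets)
    loss_t2i = F.cross_entropy(logits.t(), targets)
    return (loss_i2t + loss_t2i) / 2
```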

Baseline

CLIP (Radford et al., 2021), pre-trained on 400M web-crawled image-text pairs.

Data details

| name | abbr | type | format | size | description | related tasks |
| --- | --- | --- | --- | --- | --- | --- |
| Conceptual Captions | CC3M | image | (image, caption) | 3M | | image-text pretraining |
| Conceptual 12M | CC12M | image | (image, caption) | 12M | | image-text pretraining |
| YFCC15M | | image | | 15M | | image-text pretraining |
| DeCLIP web-crawled data | | image | | 59M | | image-text pretraining |
| ImageNet | | image | (image, class) | | | classification, captioning |
| Pets | | image | | | downstream transferability | classification |
| CIFAR10 | | image | | | downstream transferability | classification |
| CIFAR100 | | image | | | downstream transferability | classification |
| SUN | | image | | | downstream transferability | classification |
| Food101 | | image | | | downstream transferability | classification |
| Flowers | | image | | | downstream transferability | classification |
| Caltech | | image | | | downstream transferability | classification |
| Aircraft | | image | | | downstream transferability | classification |
| DTD | | image | | | downstream transferability | classification |

Approach

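DeCLIP keeps CLIP's image-text contrastive loss and adds three extra supervision signals: (1) self-supervision (SS) within each modality (SimSiam-style learning on two image views, masked language modeling on text), (2) multi-view supervision (MVS), contrasting augmented image views against augmented text views across modalities, and (3) nearest-neighbor supervision (NNS), which retrieves a semantically similar text embedding from a FIFO queue and uses it as an additional positive. Below is a minimal PyTorch sketch of the MVS and NNS terms; the function names, uniform averaging, and argmax queue lookup are my assumptions rather than the paper's exact formulation, and the SS terms are omitted:

```python
import torch
import torch.nn.functional as F

def info_nce(a: torch.Tensor, b: torch.Tensor, t: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE between two batches of L2-normalized embeddings."""
    logits = a @ b.t() / t
    targets = torch.arange(a.size(0), device=a.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

def mvs_and_nns_losses(img_v1, img_v2, txt_v1, txt_v2, text_queue, t=0.07):
    # Multi-view supervision: contrast every (image view, text view)
    # combination, giving 2x2 InfoNCE terms instead of CLIP's single one.
    mvs = sum(info_nce(i, s, t) for i in (img_v1, img_v2) for s in (txt_v1, txt_v2)) / 4
    # Nearest-neighbor supervision: retrieve each text's nearest neighbor
    # from a FIFO queue of past text embeddings and treat it as an extra
    # positive for both image views.
    nn_txt = text_queue[(txt_v1 @ text_queue.t()).argmax(dim=1)]
    nns = (info_nce(img_v1, nn_txt, t) + info_nce(img_v2, nn_txt, t)) / 2
    return mvs, nns
```

These terms are added to the base CLIP loss, which is how the same number of pairs yields far more supervision per batch.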

Evaluation

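Headline result from the paper's abstract: DeCLIP-ResNet50 reaches 62.5% zero-shot top-1 accuracy on ImageNet, 0.8% above CLIP-ResNet50, while using 7.1x fewer data; transfer is measured on the downstream datasets listed above. Zero-shot classification encodes a prompt per class (e.g. "a photo of a {label}") with the text encoder and picks the class whose embedding is most similar to the image embedding; a small sketch (my own, assuming pre-computed L2-normalized embeddings):

```python
import torch

@torch.no_grad()
def zero_shot_predict(image_emb: torch.Tensor, class_emb: torch.Tensor) -> torch.Tensor:
    # image_emb: [batch, dim], class_emb: [num_classes, dim], both L2-normalized.
    # Cosine similarity against every class prompt embedding; argmax is the prediction.
    return (image_emb @ class_emb.t()).argmax(dim=1)
```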

Limitations

bigshanedogg commented 2 years ago

Paper: [2110.05208.pdf](https://arxiv.org/abs/2110.05208)