NorbertZheng / read-papers

My paper reading notes.

Sik-Ho Tang | Review -- CaiT: Going Deeper with Image Transformers. #138

Closed: NorbertZheng closed this issue 11 months ago

NorbertZheng commented 11 months ago

Sik-Ho Tang. Review — CaiT: Going Deeper with Image Transformers.

NorbertZheng commented 11 months ago

Overview

Going Deeper with Image Transformers, CaiT, by Facebook AI and Sorbonne University, 2021 ICCV, Over 100 Citations, Image Classification, Transformer, Vision Transformer, ViT

NorbertZheng commented 11 months ago

Deeper Image Transformers with LayerScale

Figure: From (a) ViT to (d) ViT using the proposed LayerScale.

NorbertZheng commented 11 months ago

LayerScale multiplies the output of each residual branch (self-attention or FFN) by a learnable per-channel weight vector, i.e. a diagonal matrix initialized close to zero. This offers more diversity in the optimization than just adjusting the whole layer by a single learnable scalar.
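
A minimal PyTorch sketch of this idea, assuming a pre-norm residual block; the class and parameter names (`LayerScale`, `gamma`, `init_value`) are my own, and the initialization constant is only indicative (the paper uses smaller values for deeper models).

```python
import torch
import torch.nn as nn


class LayerScale(nn.Module):
    """Learnable per-channel scaling of a residual branch output."""

    def __init__(self, dim: int, init_value: float = 1e-5):
        super().__init__()
        # One learnable scale per channel, initialized to a small constant.
        self.gamma = nn.Parameter(init_value * torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, tokens, dim); broadcasting scales the last dimension.
        return self.gamma * x


class ResidualBlockWithLayerScale(nn.Module):
    """Pre-norm residual block: x + LayerScale(branch(norm(x)))."""

    def __init__(self, dim: int, branch: nn.Module, init_value: float = 1e-5):
        super().__init__()
        self.norm = nn.LayerNorm(dim)
        self.branch = branch  # e.g. a self-attention or FFN sub-layer
        self.ls = LayerScale(dim, init_value)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.ls(self.branch(self.norm(x)))


# Example: scale an FFN branch inside a 192-dim block.
ffn = nn.Sequential(nn.Linear(192, 768), nn.GELU(), nn.Linear(768, 192))
block = ResidualBlockWithLayerScale(dim=192, branch=ffn)
out = block(torch.randn(2, 197, 192))  # (batch, tokens, dim)
```

In the paper, the same scaling is applied to both the self-attention and the FFN branches of every block, with the initial value chosen smaller as the network gets deeper.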

NorbertZheng commented 11 months ago

Specializing Layers for Class Attention

Figure: CLS token placement and interactions.
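
In the class-attention stage, the CLS token is inserted only after the patch tokens have gone through the self-attention layers; a class-attention layer then computes queries from the CLS token alone, while keys and values come from the CLS token concatenated with the (unchanged) patch tokens, so only the CLS token is updated. Below is a minimal PyTorch sketch of such a layer, with names of my own choosing and details such as LayerScale and talking-heads attention omitted.

```python
import torch
import torch.nn as nn


class ClassAttention(nn.Module):
    """Class-attention: only the CLS token attends to the patch tokens."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.num_heads = num_heads
        self.scale = (dim // num_heads) ** -0.5
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x_cls: torch.Tensor, x_patches: torch.Tensor) -> torch.Tensor:
        # x_cls: (B, 1, dim), x_patches: (B, N, dim)
        B, N, dim = x_patches.shape
        h = self.num_heads
        z = torch.cat([x_cls, x_patches], dim=1)  # (B, 1 + N, dim)

        # Query from the CLS token only; keys/values from CLS + patches.
        q = self.q(x_cls).reshape(B, 1, h, dim // h).transpose(1, 2)      # (B, h, 1, d_h)
        k = self.k(z).reshape(B, 1 + N, h, dim // h).transpose(1, 2)      # (B, h, 1+N, d_h)
        v = self.v(z).reshape(B, 1 + N, h, dim // h).transpose(1, 2)

        attn = (q @ k.transpose(-2, -1)) * self.scale  # (B, h, 1, 1+N)
        attn = attn.softmax(dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(B, 1, dim)  # only CLS is updated
        return self.proj(out)


# Example: one CLS token attending over 196 patch tokens.
ca = ClassAttention(dim=192, num_heads=4)
cls_out = ca(torch.randn(2, 1, 192), torch.randn(2, 196, 192))  # -> (2, 1, 192)
```

During this stage the patch embeddings are left untouched; the class-attention layers only compile the patch information into the CLS token used for classification.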

NorbertZheng commented 11 months ago

Experimental Results

LayerScale

Figure: Improving convergence at depth on ImageNet-1k.

LayerScale outperforms other weighting variants and baselines.

NorbertZheng commented 11 months ago

Class-Attention Stage

Figure: Variations on CLS with DeiT-Small (no LayerScale).

Inserting the CLS token late obtains better results.

With the class-attention stage, a further improvement is observed.
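
A rough, self-contained sketch of this two-stage design (not the official implementation): the patch tokens first pass through standard self-attention layers without any CLS token, then a learnable CLS token is inserted late and updated by a few class-attention layers while the patch tokens stay frozen. The layer counts, the 1000-class head, and the use of `nn.TransformerEncoderLayer` / `nn.MultiheadAttention` are illustrative placeholders, not the paper's configuration.

```python
import torch
import torch.nn as nn


class TwoStageCaiTSketch(nn.Module):
    """Self-attention stage over patch tokens only, then late CLS insertion
    followed by class-attention layers (illustrative sketch only)."""

    def __init__(self, dim: int = 192, sa_depth: int = 4, ca_depth: int = 2, num_heads: int = 4):
        super().__init__()
        # Stage 1: standard transformer layers over patch tokens (no CLS token).
        self.sa_stage = nn.ModuleList([
            nn.TransformerEncoderLayer(dim, num_heads, dim * 4, batch_first=True)
            for _ in range(sa_depth)
        ])
        # Stage 2: class-attention layers, where only the CLS token is updated.
        self.cls_token = nn.Parameter(torch.randn(1, 1, dim) * 0.02)
        self.ca_stage = nn.ModuleList([
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(ca_depth)
        ])
        self.head = nn.Linear(dim, 1000)

    def forward(self, patches: torch.Tensor) -> torch.Tensor:
        # patches: (B, N, dim) patch embeddings
        for layer in self.sa_stage:
            patches = layer(patches)                    # patch tokens only
        cls = self.cls_token.expand(patches.size(0), -1, -1)  # late CLS insertion
        for attn in self.ca_stage:
            z = torch.cat([cls, patches], dim=1)        # patch tokens stay frozen here
            delta, _ = attn(query=cls, key=z, value=z)  # only CLS attends
            cls = cls + delta                           # residual update of CLS
        return self.head(cls.squeeze(1))                # (B, num_classes)


# Example: 196 patch tokens of dimension 192 (e.g. 14x14 patches).
logits = TwoStageCaiTSketch()(torch.randn(2, 196, 192))  # -> (2, 1000)
```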

NorbertZheng commented 11 months ago

CaiT Model Variants

Figure: CaiT model variants.

CaiT model variants are constructed from XXS-24 to M-36.

NorbertZheng commented 11 months ago

SOTA Comparison

Figure: SOTA comparison.

CaiT can go deeper with better performance.

CaiT obtains higher accuracy compared with others.

Figure: Results in transfer learning.

CaiT obtains better performance after being fine-tuned on downstream tasks.

Figure: Ablation path from DeiT-S to the CaiT models.

Besides the CaiT-specific techniques, techniques from other papers, such as the distillation introduced in DeiT, are also used along the ablation path.

Figure: Regions of focus of a CaiT-XXS model, according to the response of the first class-attention layer (only some examples are shown here).

NorbertZheng commented 11 months ago

Reference