chaos-moon / paper_daily

One paper a day, keep laziness away.

DINO Series - MetaAI #13

Open · zc12345 opened this issue 1 year ago

zc12345 commented 1 year ago

DINOv1: Emerging Properties in Self-Supervised Vision Transformers

DINO's biggest selling point is the discovery that a ViT trained with self-supervision exhibits some very interesting properties: with no labels at all, simply visualizing the self-attention maps already captures object outlines remarkably well, approaching the quality of an actual segmentation. The paper studies how self-supervised pretraining affects ViT features.
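
For reference, a minimal sketch of that attention-map visualization. It assumes the `dino_vits16` torch.hub entry point and the `get_last_selfattention()` helper exposed by the official facebookresearch/dino repo; `input.jpg` is a placeholder path.

```python
# Sketch: visualize the [CLS] self-attention of a DINO ViT-S/16.
# Assumes torch.hub entry 'dino_vits16' and get_last_selfattention()
# from facebookresearch/dino; 'input.jpg' is a placeholder image path.
import torch
import torchvision.transforms as T
from PIL import Image

model = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')
model.eval()

patch_size = 16
transform = T.Compose([
    T.Resize((480, 480)),                 # divisible by the patch size
    T.ToTensor(),
    T.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
img = transform(Image.open('input.jpg').convert('RGB')).unsqueeze(0)  # 1x3xHxW

with torch.no_grad():
    attn = model.get_last_selfattention(img)   # 1 x heads x (1+N) x (1+N)

h_feat = img.shape[-2] // patch_size
w_feat = img.shape[-1] // patch_size
cls_attn = attn[0, :, 0, 1:]                    # heads x N: [CLS] -> patch tokens
cls_attn = cls_attn.reshape(-1, h_feat, w_feat) # per-head h x w attention maps
mean_attn = cls_attn.mean(0)                    # average over heads and plot
```

Plotting `mean_attn` (or each head separately) gives the object-outline maps the paper shows.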

method

[Figure] DINO: self-distillation with no labels

[Figure] BYOL architecture

tricks

# Algorithm 1: DINO PyTorch pseudocode w/o multi-crop.
# gs, gt: student and teacher networks 
# C: center (K) 
# tps, tpt: student and teacher temperatures 
# l, m: network and center momentum rates 
gt.params = gs.params 
for x in loader: # load a minibatch x with n samples 
    x1, x2 = augment(x), augment(x) # random views 

    s1, s2 = gs(x1), gs(x2) # student output n-by-K 
    t1, t2 = gt(x1), gt(x2) # teacher output n-by-K 

    loss = H(t1, s2)/2 + H(t2, s1)/2 
    loss.backward() # back-propagate 

    # student, teacher and center updates 
    update(gs) # SGD 
    gt.params = l*gt.params + (1-l)*gs.params 
    C = m*C + (1-m)*cat([t1, t2]).mean(dim=0) 

def H(t, s): 
    t = t.detach() # stop gradient 
    s = softmax(s / tps, dim=1) 
    t = softmax((t - C) / tpt, dim=1) # center + sharpen 
    return - (t * log(s)).sum(dim=1).mean()
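
Below is a minimal runnable sketch of the same step with dummy networks and random tensors; the dimensions, temperatures, and momentum rates are illustrative rather than the paper's schedules, and multi-crop is still omitted.

```python
# Runnable sketch of one DINO step (dummy networks/data, no multi-crop,
# illustrative hyperparameters, no schedules).
import torch
import torch.nn as nn
import torch.nn.functional as F

K, tps, tpt = 32, 0.1, 0.04        # output dim, student/teacher temperatures
l, m = 0.996, 0.9                  # network / center momentum rates

student = nn.Sequential(nn.Linear(8, 64), nn.GELU(), nn.Linear(64, K))
teacher = nn.Sequential(nn.Linear(8, 64), nn.GELU(), nn.Linear(64, K))
teacher.load_state_dict(student.state_dict())      # gt.params = gs.params
for p in teacher.parameters():
    p.requires_grad = False                         # teacher gets no gradients
C = torch.zeros(K)                                  # center
opt = torch.optim.SGD(student.parameters(), lr=0.1)

def H(t, s):
    # log_softmax is the numerically stable form of softmax + log
    s = F.log_softmax(s / tps, dim=1)               # student: sharpen only
    t = F.softmax((t - C) / tpt, dim=1)             # teacher: center + sharpen
    return -(t * s).sum(dim=1).mean()

x1, x2 = torch.randn(16, 8), torch.randn(16, 8)     # two "views" of a batch
s1, s2 = student(x1), student(x2)
with torch.no_grad():                                # replaces t.detach()
    t1, t2 = teacher(x1), teacher(x2)

loss = H(t1, s2) / 2 + H(t2, s1) / 2
opt.zero_grad()
loss.backward()
opt.step()                                           # student update (SGD)

with torch.no_grad():                                # teacher and center EMA
    for pt, ps in zip(teacher.parameters(), student.parameters()):
        pt.mul_(l).add_((1 - l) * ps)
    C = m * C + (1 - m) * torch.cat([t1, t2]).mean(dim=0)
```

The asymmetry is the whole trick: the teacher is an EMA of the student, its outputs are centered (avoids collapse to one dimension) and sharpened with a lower temperature (avoids collapse to the uniform distribution), and gradients flow only through the student.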

Thoughts

zc12345 commented 1 year ago

DINOv2: Learning Robust Visual Features without Supervision

overview

TL;DR

Compared to DINO

method

related work

contribution

Questions

Thoughts