Open zc12345 opened 1 year ago
blog
arxiv
video
code
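DINO's main selling point is the finding that ViTs trained with self-supervision show some interesting properties: without any labels, directly visualizing the self-attention maps shows that they capture object contours well, even rivaling direct segmentation. The paper investigates how self-supervised pretraining affects ViT features.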
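As a rough illustration of what visualizing a self-attention map involves, the sketch below computes the attention that a single `[CLS]` query assigns to each patch key for one head and reshapes it into a patch grid. It is pure Python with made-up toy vectors. The function name `cls_attention_map` and the 2x2 grid are assumptions for illustration only, and the sketch simplifies by leaving the `[CLS]` key itself out of the softmax. In a real ViT the query and keys would come from an attention layer of the trained model.

```python
import math

def cls_attention_map(cls_query, patch_keys, grid_h, grid_w):
    """Softmax attention of one [CLS] query over patch keys, as a 2D grid."""
    if len(patch_keys) != grid_h * grid_w:
        raise ValueError("number of patch keys must equal grid_h * grid_w")
    dim = len(cls_query)
    # Scaled dot-product scores between the [CLS] query and every patch key.
    scores = [
        sum(q * k for q, k in zip(cls_query, key)) / math.sqrt(dim)
        for key in patch_keys
    ]
    # Numerically stable softmax over the patch scores.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Row-major reshape so each weight lines up with its image patch.
    return [weights[r * grid_w:(r + 1) * grid_w] for r in range(grid_h)]

# Toy example (hypothetical values): 4 patches on a 2x2 grid, 2-dim vectors.
query = [1.0, 0.0]
keys = [[4.0, 0.0], [0.0, 1.0], [0.5, 0.0], [-1.0, 0.0]]
attn = cls_attention_map(query, keys, 2, 2)
for row in attn:
    print(["%.3f" % w for w in row])
```

Upsampling such a grid to the image resolution and overlaying it on the input gives the kind of picture the note refers to. Patches whose keys align with the `[CLS]` query receive most of the weight.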
DINO
BYOL
## Algorithm 1 DINO PyTorch pseudocode w/o multi-crop.

    # gs, gt: student and teacher networks
    # C: center (K)
    # tps, tpt: student and teacher temperatures
    # l, m: network and center momentum rates
    gt.params = gs.params
    for x in loader: # load a minibatch x with n samples
        x1, x2 = augment(x), augment(x) # random views
        s1, s2 = gs(x1), gs(x2) # student output n-by-K
        t1, t2 = gt(x1), gt(x2) # teacher output n-by-K
        loss = H(t1, s2)/2 + H(t2, s1)/2
        loss.backward() # back-propagate
        # student, teacher and center updates
        update(gs) # SGD
        gt.params = l*gt.params + (1-l)*gs.params
        C = m*C + (1-m)*cat([t1, t2]).mean(dim=0)

    def H(t, s):
        t = t.detach() # stop gradient
        s = softmax(s / tps, dim=1)
        t = softmax((t - C) / tpt, dim=1) # center + sharpen
        return - (t * log(s)).sum(dim=1).mean()
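To make the centering and sharpening steps in `H` concrete, here is a minimal pure-Python sketch that works on plain lists, with no batching over views and no autograd. The helper names, the default temperatures (`tps=0.1`, `tpt=0.04`), the momentum values, and the toy logits are all illustrative assumptions, not values taken from this note.

```python
import math

def softmax(logits, temperature):
    """Temperature-scaled softmax over a list of floats."""
    scaled = [v / temperature for v in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def dino_cross_entropy(t, s, center, tps=0.1, tpt=0.04):
    """H(t, s) for one sample: teacher logits t, student logits s, K entries each."""
    p_s = softmax(s, tps)
    # Center the teacher logits, then sharpen them with a low temperature.
    p_t = softmax([ti - ci for ti, ci in zip(t, center)], tpt)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_t, p_s))

def update_center(center, teacher_outputs, m=0.9):
    """C = m*C + (1-m)*mean over the rows of teacher_outputs."""
    n = len(teacher_outputs)
    batch_mean = [sum(col) / n for col in zip(*teacher_outputs)]
    return [m * c + (1 - m) * b for c, b in zip(center, batch_mean)]

def ema_update(teacher_params, student_params, momentum=0.996):
    """gt.params = l*gt.params + (1-l)*gs.params, applied element-wise."""
    return [momentum * t + (1 - momentum) * s
            for t, s in zip(teacher_params, student_params)]

# Toy logits (hypothetical) for two views of one image, K = 3.
t1, t2 = [2.0, 0.5, -1.0], [1.8, 0.7, -0.9]
s1, s2 = [1.5, 0.2, -0.5], [1.6, 0.1, -0.4]
center = [0.0, 0.0, 0.0]

# Symmetric loss: each teacher view supervises the other student view.
loss = dino_cross_entropy(t1, s2, center) / 2 + dino_cross_entropy(t2, s1, center) / 2
center = update_center(center, [t1, t2])
print("loss:", loss)
print("center:", center)
```

Because the teacher temperature is lower than the student temperature, the teacher distribution is much peakier than the student's, which is the sharpening effect. Subtracting `center` counteracts any single output dimension dominating across the batch.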
demo
BibTeX
DINOv1: Emerging Properties in Self-Supervised Vision Transformers
method
DINO
BYOL
tricks
Thoughts