facebookresearch / dino

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Apache License 2.0

A question about DINOHead #244

Open wenhaoli-xmu opened 1 year ago

wenhaoli-xmu commented 1 year ago

Background

The paper mentions that as the number of MLP layers in DINOHead increases, model performance improves.

l2 normalization is used to keep training stable as the number of MLP layers grows.

Questions

  1. Question 1: What is the linear projection that follows the l2 normalization for?

  2. Question 2: Why does this linear projection use weight normalization?

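For context, here is a condensed sketch of the DINOHead being asked about, paraphrased from `vision_transformer.py` in this repo (options such as `use_bn`, the single-layer branch, and weight initialization are omitted, so details may differ from the exact repo code):

```python
import torch.nn as nn


class DINOHead(nn.Module):
    """Projection head: deep MLP -> l2 normalize -> weight-normed linear to K dims."""

    def __init__(self, in_dim, out_dim, nlayers=3,
                 hidden_dim=2048, bottleneck_dim=256, norm_last_layer=True):
        super().__init__()
        # The MLP whose depth the paper studies (performance improves with nlayers).
        layers = [nn.Linear(in_dim, hidden_dim), nn.GELU()]
        for _ in range(nlayers - 2):
            layers += [nn.Linear(hidden_dim, hidden_dim), nn.GELU()]
        layers += [nn.Linear(hidden_dim, bottleneck_dim)]
        self.mlp = nn.Sequential(*layers)
        # Question 2: the final projection is wrapped in weight normalization.
        self.last_layer = nn.utils.weight_norm(
            nn.Linear(bottleneck_dim, out_dim, bias=False))
        self.last_layer.weight_g.data.fill_(1)
        if norm_last_layer:
            # Freezing the magnitude g keeps each output row a unit vector.
            self.last_layer.weight_g.requires_grad = False

    def forward(self, x):
        x = self.mlp(x)                              # deep MLP
        x = nn.functional.normalize(x, dim=-1, p=2)  # l2 normalization
        return self.last_layer(x)                    # Question 1: project to K = out_dim
```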

ttkxyy commented 1 year ago

Have you figured it out? I have the same question.

H-Jamieu commented 2 months ago

I may be wrong, but I will try to answer the questions. Q1: It makes the output K-dimensional. A K-dimensional embedding is used for computing the loss in the original work.

Q2: I am not sure, but according to reference no. 61 of the paper, it may simply be to make the network train faster.
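To expand on that slightly: weight normalization reparameterizes each weight vector as w = g · v/‖v‖, decoupling the magnitude g from the direction v, which its authors report speeds up optimization. A minimal sketch of how this looks in PyTorch (the variable names and the `65536` output dimension are my own illustration; I believe 65536 is the repo's default `out_dim`, but check `main_dino.py`):

```python
import torch.nn as nn

# Toy example: inspect the weight-norm reparameterization of the last layer.
linear = nn.utils.weight_norm(nn.Linear(256, 65536, bias=False))

# weight_norm splits `weight` into a magnitude `weight_g` and a direction
# `weight_v`; the effective weight is weight_g * weight_v / ||weight_v||.
print(linear.weight_g.shape)  # torch.Size([65536, 1]) -- one magnitude per output row
print(linear.weight_v.shape)  # torch.Size([65536, 256])

# DINOHead fills every magnitude with 1 and (when norm_last_layer=True)
# freezes it, so each of the K output rows stays a unit vector and only
# its direction is learned.
linear.weight_g.data.fill_(1)
linear.weight_g.requires_grad = False
```

One observation that may connect the two questions: since the bottleneck output is l2-normalized and, with the magnitudes frozen at 1, each output row is also a unit vector, every logit of the last layer is effectively a cosine similarity. That bounds the logits regardless of MLP depth, which plausibly contributes to the stability the paper mentions, though the paper does not spell this out.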