SwinTransformer / Feature-Distillation

MIT License

Was Whitening implemented incorrectly? #14

Open jsrdcht opened 1 year ago

jsrdcht commented 1 year ago

"whitening operation, which is implemented by a non-parametric layer normalization operator without scaling and bias"

You mention that the whitening operation is non-parametric. However, the code appears to implement it with the teacher model's own norm layer (CLIP's ln_post in the snippet below), which carries learned scale and bias and is therefore not non-parametric.

if self.feat_after_norm:
    if 'CLIP' in self.pred_feat:
        x_tgt = self.feature_model.visual.ln_post(x_tgt)
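
For reference, the distinction being asked about can be sketched in plain Python (no framework dependency). This is only an illustration, not the repository's code: `layer_norm`, `gamma`, and `beta` are hypothetical names, and the parameter values are made up. In PyTorch, the non-parametric form would correspond to `torch.nn.LayerNorm(dim, elementwise_affine=False)`.

```python
import math


def layer_norm(x, weight=None, bias=None, eps=1e-5):
    """Layer-normalize a single feature vector given as a list of floats.

    With weight=None and bias=None this is the non-parametric form
    (standardization only). Passing weight/bias adds the learned
    per-channel scale and shift of a regular LayerNorm.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # biased variance
    out = [(v - mean) / math.sqrt(var + eps) for v in x]
    if weight is not None:
        out = [o * w for o, w in zip(out, weight)]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


feat = [0.5, -1.0, 2.0, 3.5]   # toy feature vector
gamma = [1.5, 0.5, 2.0, 1.0]   # hypothetical learned scale
beta = [0.1, -0.2, 0.0, 0.3]   # hypothetical learned bias

whitened = layer_norm(feat)             # non-parametric: zero mean, unit variance
affine = layer_norm(feat, gamma, beta)  # parametric: statistics depend on gamma/beta
```

The non-parametric output always has (approximately) zero mean and unit variance across channels, while the parametric output does not, which is why the two are not interchangeable as a "whitening" step.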

yaoyuan10475 commented 1 year ago

I found that the teacher model's output first passes through a norm (self.feature_model.norm(x_tgt)) and then through self.ln_tgt(x_tgt). In effect, the teacher model's output goes through Layer Norm twice. I don't quite understand why.
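
To make the "Layer Norm twice" observation concrete, here is a small plain-Python sketch. It assumes that self.ln_tgt is a non-parametric layer norm (as the quoted description suggests) and that the teacher's own norm has learned scale and bias. The helper `layer_norm` and the values of `gamma` and `beta` are hypothetical. It does not explain why the repository is structured this way; it only shows what the composition does numerically.

```python
import math


def layer_norm(x, weight=None, bias=None, eps=1e-5):
    """Layer norm over one feature vector; weight/bias are optional."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # biased variance
    out = [(v - mean) / math.sqrt(var + eps) for v in x]
    if weight is not None:
        out = [o * w for o, w in zip(out, weight)]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


raw = [0.5, -1.0, 2.0, 3.5]    # toy teacher feature before any norm
gamma = [1.5, 0.5, 2.0, 1.0]   # hypothetical learned scale of the teacher's norm
beta = [0.1, -0.2, 0.0, 0.3]   # hypothetical learned bias of the teacher's norm

once = layer_norm(raw)                                         # plain norm only
twice_plain = layer_norm(layer_norm(raw))                      # plain norm applied twice
affine_then_plain = layer_norm(layer_norm(raw, gamma, beta))   # teacher norm, then plain norm

print(max_abs_diff(once, twice_plain))        # tiny: a second plain norm changes almost nothing
print(max_abs_diff(once, affine_then_plain))  # clearly non-zero: gamma/beta reshape the features first
```

Under these assumptions, the second (non-parametric) norm restores zero mean and unit variance, but the result still differs from whitening the raw features directly, because the first norm's scale and bias have already reweighted the channels.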