SwinTransformer / Feature-Distillation

MIT License

Where do the properties come from? #5

Open jsrdcht opened 2 years ago

jsrdcht commented 2 years ago

The core argument of your article is that, after feature distillation, the pre-trained model exhibits properties similar to those of MIM models. But where do these properties come from?

For instance, MIM models acquire locality through the masked modeling mechanism. In your method, however, the teacher and student receive the same augmented view, so it is unclear to me what mechanism would produce these properties.
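
To make the contrast concrete, here is a minimal sketch of how the inputs differ in the two setups as I understand them. It is my own simplification, not the repository's code: the image is a toy list of patch values, and `augment`, `mim_inputs`, and `fd_inputs` are hypothetical names. I am also assuming that feature distillation applies no masking to either network's input.

```python
import random


def augment(patches, rng):
    """Hypothetical stand-in augmentation: randomly reverse the patch order."""
    return patches[::-1] if rng.random() < 0.5 else list(patches)


def mim_inputs(patches, mask_ratio, rng):
    """MIM-style input: the encoder sees a copy with some patches hidden.

    Returns the visible sequence (None marks a masked patch) and the set of
    masked positions whose content the model would have to predict.
    """
    n = len(patches)
    masked_idx = set(rng.sample(range(n), int(n * mask_ratio)))
    visible = [None if i in masked_idx else p for i, p in enumerate(patches)]
    return visible, masked_idx


def fd_inputs(patches, rng):
    """Feature-distillation-style input (assumed): a single augmented view,
    passed unmasked to both the frozen teacher and the student."""
    view = augment(patches, rng)
    return view, view  # (teacher input, student input)
```

In the MIM case the model must infer hidden patches from their neighbours, which gives an obvious pressure toward local attention. In the distillation case, under the assumptions above, both networks see exactly the same complete input, which is why the source of the similar properties is not obvious to me.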