Muzammal-Naseer / IPViT

Official repository for "Intriguing Properties of Vision Transformers" (NeurIPS 2021, Spotlight)

Question about links to pretrained models #5

Closed: ZhouqyCH closed this issue 2 years ago

ZhouqyCH commented 2 years ago

Hi! First of all, thank you to the authors for the exciting work! I noticed that the checkpoint link for the pretrained 'deit_tiny_distilled_patch16_224' in vit_models/deit.py is different from the one for the shape-biased model DeiT-T-SIN (distilled) given in README.md. I thought deit_tiny_distilled_patch16_224 had the same definition as DeiT-T-SIN (distilled). Do they differ in model architecture or training procedure?

Muzammal-Naseer commented 2 years ago

Hi ZhouqyCH,

Thank you for your interest in our work. These models differ in their training mechanism as discussed in the paper. Their architecture definitions are the same.

Can you please provide more details on the exact issue?

ZhouqyCH commented 2 years ago

Thank you for the reply! I think I understand the difference between them now. The checkpoint link for the pretrained 'deit_tiny_distilled_patch16_224' is the one published in the DeiT repository, so that model distills knowledge from a teacher trained on ImageNet, while DeiT-T-SIN (distilled) distills knowledge (more precisely, shape knowledge) from a teacher trained on Stylized ImageNet (SIN). Right?
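
For later readers, here is a minimal sketch of the takeaway, assuming `timm` is installed. The ImageNet-distilled URL is the one published in the DeiT repository; the SIN-distilled URL below is a placeholder, so substitute the real link from this repo's README.md / vit_models/deit.py. The point is that both checkpoints load into the same architecture definition:

```python
import torch
import timm

# Same architecture for both models; only the pretrained weights differ.
# URL published in the original DeiT repository (ImageNet-distilled weights):
IMAGENET_DISTILLED_URL = (
    "https://dl.fbaipublicfiles.com/deit/deit_tiny_distilled_patch16_224-b40b3cf7.pth"
)
# Placeholder -- replace with the DeiT-T-SIN (distilled) link from README.md:
SIN_DISTILLED_URL = "https://example.com/deit_t_sin_distilled.pth"

def load_deit_tiny_distilled(checkpoint_url: str):
    # Identical architecture definition regardless of which checkpoint we load.
    model = timm.create_model("deit_tiny_distilled_patch16_224", pretrained=False)
    state = torch.hub.load_state_dict_from_url(checkpoint_url, map_location="cpu")
    # DeiT-style checkpoints usually nest the weights under a "model" key.
    model.load_state_dict(state.get("model", state))
    return model.eval()

# ImageNet-distilled DeiT-T:
model_imnet = load_deit_tiny_distilled(IMAGENET_DISTILLED_URL)
# Shape-distilled DeiT-T-SIN (uncomment once the real URL is filled in):
# model_sin = load_deit_tiny_distilled(SIN_DISTILLED_URL)
```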

Muzammal-Naseer commented 2 years ago

Yes, that's right.