Closed ZhouqyCH closed 2 years ago
Hi ZhouqyCH,
Thank you for your interest in our work. These models differ in their training mechanism, as discussed in the paper. Their architecture definitions are the same.
Can you please provide more details on the exact issue?
Thank you for the reply! I think I've understood the differences between them. The checkpoint link for the pretrained 'deit_tiny_distilled_patch16_224' is the one provided by the DeiT repository, so that model distills knowledge from ImageNet, while DeiT-T-SIN (distilled) distills knowledge, more precisely shape knowledge, from SIN. Right?
Yes, that's right.
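To summarize the exchange in code: both names refer to the same architecture definition and differ only in which checkpoint is loaded. The sketch below illustrates this mapping; the URL placeholders and helper name are hypothetical, not taken from the repo.

```python
# Illustrative sketch only: the two model names share one architecture
# definition and map to different pretrained checkpoints.
# The URLs below are placeholders, not the real checkpoint links.
CHECKPOINTS = {
    # ImageNet-distilled weights, linked from the original DeiT repo
    "deit_tiny_distilled_patch16_224": "<deit-repo-checkpoint-url>",
    # Shape-distilled weights, trained on Stylized ImageNet (SIN)
    "deit_tiny_sin_distilled": "<sin-checkpoint-url>",
}

def resolve_model(name):
    """Return (architecture, checkpoint_url) for a model name.

    The architecture is identical for both entries; only the
    training procedure (and hence the checkpoint) differs.
    """
    arch = "deit_tiny_distilled_patch16_224"  # shared definition
    return arch, CHECKPOINTS[name]
```

Calling `resolve_model` with either name returns the same architecture string, which is the point the maintainer makes above.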
Hi! First of all, thanks to the authors for the exciting work! I noticed that the checkpoint link for the pretrained 'deit_tiny_distilled_patch16_224' in vit_models/deit.py is different from that of the shape-biased model DeiT-T-SIN (distilled) given in README.md. I thought deit_tiny_distilled_patch16_224 had the same definition as DeiT-T-SIN (distilled). Do they differ in model architecture or in training procedure?