Hi authors, thanks for sharing this amazing repo. I hope to train CaiT with distillation; however, this part of the code seems to be missing. I have referred to the paper and gone through `cait_models.py`, and it seems the current CaiT implementation doesn't have a distillation head. I'm curious how CaiT was trained with DeiT's distillation scheme, and I'm wondering whether pre-trained CaiT models without distillation could be provided. Many thanks in advance.
Hi @peihaowang, thanks for your message. In the CaiT paper we use hard distillation without a distillation head in order to simplify the distillation process, so it is sufficient to apply the hard distillation loss directly to the output of the model. At this time, we do not have plans to provide weights without distillation.

Best,
Hugo
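For readers wondering what "hard distillation directly on the output" looks like in practice, here is a minimal PyTorch sketch of the idea: both loss terms are applied to the student's single classification output, with no separate distillation head. The function name `hard_distillation_loss`, the `alpha` weighting, and the toy tensors are illustrative assumptions, not code from this repo:

```python
import torch
import torch.nn.functional as F

def hard_distillation_loss(student_logits, teacher_logits, labels, alpha=0.5):
    """Hard distillation with no separate distillation head: both loss
    terms use the student's single classification output."""
    # Standard cross-entropy against the ground-truth labels.
    ce_true = F.cross_entropy(student_logits, labels)
    # Cross-entropy against the teacher's hard (argmax) predictions.
    teacher_labels = teacher_logits.argmax(dim=1)
    ce_teacher = F.cross_entropy(student_logits, teacher_labels)
    # alpha=0.5 mirrors the equal weighting in DeiT's hard distillation.
    return (1.0 - alpha) * ce_true + alpha * ce_teacher

# Toy check with random logits (batch of 8, 1000 classes); in training,
# teacher logits would come from a frozen teacher under torch.no_grad().
student_logits = torch.randn(8, 1000)
teacher_logits = torch.randn(8, 1000)
labels = torch.randint(0, 1000, (8,))
print(hard_distillation_loss(student_logits, teacher_logits, labels))
```

Since the teacher's soft probabilities are reduced to hard argmax labels, this loss plugs into a standard training loop with no architectural change to the model, which is presumably what "simplify the distillation process" refers to.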