facebookresearch / deit

Official DeiT repository

How to train CaiT with distillation? #119

Closed. peihaowang closed this issue 3 years ago.

peihaowang commented 3 years ago

Hi authors, thanks for sharing this amazing repo. I would like to train CaiT with distillation; however, this part of the code seems to be missing.

I have referred to the paper and gone through cait_models.py, and it seems the current CaiT implementation does not have a distillation head. I'm curious how CaiT was trained with DeiT's distillation scheme, and I'm wondering whether pre-trained CaiT models without distillation could be provided.

Many thanks in advance.

TouvronHugo commented 3 years ago

Hi @peihaowang,

Thanks for your message. In the CaiT paper we use hard distillation without a distillation head in order to simplify the distillation process, so it is sufficient to apply the hard distillation loss directly to the output of the model. At this time, we do not have plans to provide weights without distillation.

Best,
Hugo
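For readers wondering what this looks like concretely: below is a minimal PyTorch sketch of hard distillation applied directly to the model's single output, following the hard-distillation objective described in the DeiT paper (cross-entropy against the ground-truth labels plus cross-entropy against the teacher's argmax predictions). The function name and the `alpha` weighting parameter are illustrative assumptions for this sketch, not the repository's actual training code.

```python
import torch.nn.functional as F

def hard_distillation_loss(student_logits, teacher_logits, targets, alpha=0.5):
    """Hard distillation with no distillation head (illustrative sketch).

    Both loss terms are applied to the same student output, so the
    student needs no extra distillation token or head.
    """
    # Standard cross-entropy against the ground-truth labels.
    ce_loss = F.cross_entropy(student_logits, targets)
    # Hard distillation: use the teacher's argmax predictions as hard labels.
    teacher_labels = teacher_logits.argmax(dim=1)
    distill_loss = F.cross_entropy(student_logits, teacher_labels)
    # Weighted combination; the DeiT paper uses an equal 1/2-1/2 split.
    return (1.0 - alpha) * ce_loss + alpha * distill_loss
```

In use, `teacher_logits` would be computed under `torch.no_grad()` from a frozen teacher, and only the student's parameters would be optimized.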