wangpichao closed this issue 3 years ago
Hi @wangpichao,
For pre-training, you only need to use the DeiT training code and adapt the --drop-path rate as explained in the paper Going deeper with Image Transformers. For fine-tuning, you have to fine-tune with distillation using the following hparams:
Could you please share the training scripts for CaiT-M48 distilled 448? Thanks.