Closed · LiuDongyang6 closed this issue 1 year ago
To ensure a fair and accurate comparison with DeiT, we use an architecture identical to DeiT when fine-tuning on ImageNet. Specifically, the number of attention heads is set to 3 in models_vit.py, which is the model definition used during fine-tuning.
For pretraining with models_tinymim.py, there is more flexibility in the number of attention heads; setting it to either 3 or 6 results in only slight differences in performance.
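For reference, here is a minimal sketch of a DeiT-Tiny-compatible ViT-Tiny definition for fine-tuning, assuming a timm-style `VisionTransformer` constructor (the actual `models_vit.py` may differ in details such as global pooling or layer-wise LR decay hooks):

```python
# Minimal sketch (not the repository's exact code): a DeiT-Tiny-compatible
# ViT definition for fine-tuning, assuming the timm VisionTransformer API.
from functools import partial

import torch.nn as nn
from timm.models.vision_transformer import VisionTransformer


def vit_tiny_patch16(**kwargs):
    # DeiT-Tiny configuration: embed_dim=192, depth=12, num_heads=3.
    # Keeping num_heads=3 matches DeiT for a fair ImageNet comparison
    # (per-head dimension = 192 / 3 = 64).
    return VisionTransformer(
        patch_size=16, embed_dim=192, depth=12, num_heads=3,
        mlp_ratio=4, qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
```

Since the embedding dimension stays at 192, the parameter shapes are the same whether 3 or 6 heads are used; only the per-head split of the attention changes.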
Hi, thanks for this work!
I notice that the attention head number for ViT-Tiny is set to 6 in models_tinymim.py, whereas most existing works set it to 3, and it is also 3 in your models_vit.py. This means the head number differs between pretraining and fine-tuning. Why is this the case?