Closed · LiuDongyang6 closed this issue 1 year ago
To ensure a fair and accurate comparison with DeiT, we use an architecture identical to DeiT when fine-tuning on ImageNet. Specifically, the number of attention heads is set to 3 in models_vit.py, which is the model definition used during fine-tuning.
For pretraining with models_tinymim.py, there is more flexibility in the number of attention heads; setting it to either 3 or 6 results in only slight differences in performance.
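For reference, here is a minimal sketch of a DeiT-Tiny-compatible ViT-Tiny definition for fine-tuning, assuming a timm-style `VisionTransformer` constructor (the actual `models_vit.py` may differ in details such as global pooling or layer-wise LR decay hooks):

```python
# Minimal sketch (not the repository's exact code): a DeiT-Tiny-compatible
# ViT definition for fine-tuning, assuming the timm VisionTransformer API.
from functools import partial

import torch.nn as nn
from timm.models.vision_transformer import VisionTransformer


def vit_tiny_patch16(**kwargs):
    # DeiT-Tiny configuration: embed_dim=192, depth=12, num_heads=3.
    # Keeping num_heads=3 matches DeiT for a fair ImageNet comparison
    # (per-head dimension = 192 / 3 = 64).
    return VisionTransformer(
        patch_size=16, embed_dim=192, depth=12, num_heads=3,
        mlp_ratio=4, qkv_bias=True,
        norm_layer=partial(nn.LayerNorm, eps=1e-6), **kwargs)
```

Since the embedding dimension stays at 192, the parameter shapes are the same whether 3 or 6 heads are used; only the per-head split of the attention changes.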
Hi, thanks for this work!
I notice that the attention head number for ViT-Tiny is set to 6 in models_tinymim.py, whereas most existing works set it to 3, and it is also 3 in your models_vit.py. This means the head number differs between pretraining and fine-tuning. Why is this the case?