facebookresearch / dino

PyTorch code for Vision Transformers training with the Self-Supervised learning method DINO
Apache License 2.0

Fine-tuning to downstream tasks #144

Open RoyHirsch opened 3 years ago

RoyHirsch commented 3 years ago

Hi, thank you for the code release and the continuous support. In the main paper (Sec. 4.2.3) you mention fine-tuning (FT) experiments following the FT protocol of "Training data-efficient image transformers & distillation through attention". Could you please elaborate on, or release, the hyper-parameters used for the fine-tuning process? There is ambiguity regarding some of the HP, and I would appreciate your insights from fine-tuning DINO on downstream tasks.

Thank you

tangtaogo commented 3 years ago

Also looking forward to it. (●’◡’●)ノ

Sara-Ahmed commented 2 years ago

It would be really nice to provide these details. I tried to replicate the results on CIFAR-100 using the default parameters from "Training data-efficient image transformers & distillation through attention", but I got 87.79 instead of 90.5.

Thanks a lot,

mathildecaron31 commented 2 years ago

Hi

This issue is relevant: https://github.com/facebookresearch/dino/issues/81

cc @TouvronHugo for the HP on CIFAR-100?

Sara-Ahmed commented 2 years ago

Hello Mathilde, just a reminder about the question regarding the HP for CIFAR-100. Thanks a lot!

TouvronHugo commented 2 years ago

Hi @Sara-Ahmed, thanks for your question. The HP for fine-tuning on CIFAR-100 with ViT-B are the following:

You can find some examples of the HP used in DeiT for fine-tuning here: #45
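For readers following along, the fine-tuning setup discussed here can be sketched in PyTorch: load a DINO-pretrained backbone, attach a fresh classification head, and train with a standard supervised loss. This is a minimal, hedged sketch, not the repo's official recipe: the hyper-parameter values are illustrative placeholders (the actual CIFAR-100 HP are the ones referenced in this thread), and a tiny dummy backbone stands in for the real model so the snippet runs without downloading weights.

```python
import torch
import torch.nn as nn

# In practice the backbone would come from the repo's hub entry point, e.g.:
#   backbone = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')
# Here a dummy module with the same output dimension stands in for it.
embed_dim, num_classes = 384, 100  # ViT-S embedding dim; CIFAR-100 classes
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, embed_dim))

# Fresh classification head for the downstream task.
head = nn.Linear(embed_dim, num_classes)
model = nn.Sequential(backbone, head)

# Illustrative optimizer settings only -- NOT the HP from this thread.
optimizer = torch.optim.SGD(model.parameters(), lr=5e-4,
                            momentum=0.9, weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()

# One fine-tuning step on a fake batch (real code would iterate a DataLoader).
images = torch.randn(4, 3, 224, 224)
labels = torch.randint(0, num_classes, (4,))

logits = model(images)
loss = criterion(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

The structure (pretrained backbone + new linear head, supervised cross-entropy) is what the DeiT fine-tuning protocol referenced above follows; only the specific HP values differ per dataset.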

Best,

Hugo

Sara-Ahmed commented 2 years ago

Thanks a lot for your response. Is it the same for ViT-S? Due to limited resources, I can only run experiments with ViT-S. Much appreciated,

TouvronHugo commented 2 years ago

Yes, it's the same HP for ViT-S.