RoyHirsch opened this issue 3 years ago
Also looking forward to it. (●’◡’●)ノ
It would be really nice to provide these details. I tried to replicate the results on CIFAR-100 using the default parameters from "Training data-efficient image transformers & distillation through attention", but I got 87.79 instead of 90.5.
Thanks a lot,
Hi
This issue is relevant: https://github.com/facebookresearch/dino/issues/81
cc @TouvronHugo for the HP with CIFAR-100?
Hello Mathilde, just a reminder about the question regarding the HP for CIFAR-100. Thanks a lot.
Hi @Sara-Ahmed, thanks for your question. The HP for fine-tuning on CIFAR-100 with ViT-B are the following:
You can find some examples of the HP used in DeiT for fine-tuning here: #45
Best,
Hugo
Thanks a lot for your response. Is it the same for ViT-S? Due to limited resources, I can only perform experiments on ViT-S. Much appreciated,
Yes, it's the same HP for ViT-S
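For anyone arriving here without the full HP list above, here is a minimal sketch of what a CIFAR-100 fine-tuning setup along these lines could look like. This is not the authors' exact recipe: the use of timm, the checkpoint filename, and all hyper-parameter values (learning rate, schedule, batch size, epochs) are placeholder assumptions to illustrate the overall pipeline only.

```python
# Minimal CIFAR-100 fine-tuning sketch (hypothetical HP, not the official DINO/DeiT recipe).
import torch
import torch.nn as nn
import timm
from torchvision import datasets, transforms
from torch.utils.data import DataLoader

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# ViT-S/16 backbone with a 100-way head; in practice you would load the DINO pretrained
# weights here (the checkpoint path below is a placeholder assumption).
model = timm.create_model("vit_small_patch16_224", pretrained=False, num_classes=100)
# state_dict = torch.load("dino_deitsmall16_pretrain.pth", map_location="cpu")
# model.load_state_dict(state_dict, strict=False)
model.to(device)

# CIFAR-100 images resized to the ViT input resolution; normalization stats are ImageNet's.
transform = transforms.Compose([
    transforms.Resize(224),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.485, 0.456, 0.406), (0.229, 0.224, 0.225)),
])
train_set = datasets.CIFAR100(root="./data", train=True, download=True, transform=transform)
train_loader = DataLoader(train_set, batch_size=128, shuffle=True, num_workers=8)

# Placeholder optimizer/schedule values -- replace with the HP given in this thread / DeiT #45.
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
criterion = nn.CrossEntropyLoss()

model.train()
for epoch in range(100):  # epoch count is an assumption
    for images, targets in train_loader:
        images, targets = images.to(device), targets.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), targets)
        loss.backward()
        optimizer.step()
    scheduler.step()
```

The same skeleton applies to ViT-B by swapping the model name; only the hyper-parameters from the thread above would need to be plugged in.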
Hi, thank you for the code release and the continuous support. In the main paper (Sec 4.2.3) you mention fine-tuning (FT) experiments following the FT protocol in "Training data-efficient image transformers & distillation through attention". Can you please elaborate on or release more information about the hyper-parameters used for the FT process? There is ambiguity regarding some of the HP, and I would appreciate your insights from fine-tuning DINO for downstream tasks.
Thank you