facebookresearch/dino

PyTorch code for training Vision Transformers with the self-supervised learning method DINO
Apache License 2.0

DINO fine-tune hyper-parameters #171

Open srikar2097 opened 2 years ago

srikar2097 commented 2 years ago

Hi @mathildecaron31 and @TouvronHugo, thank you for sharing the code for your inspiring work. I am running fine-tuning experiments with DINO (teacher) on the following downstream datasets: CIFAR10, CIFAR100, INat18, INat19, Flowers, Cars, and ImageNet, and I am unable to replicate the reported results.

I have seen issue #81 and issue #144, both of which relate to CIFAR-10. What about the other datasets? I tried the recipe mentioned there for CIFAR-10 on INat18 and INat19 and was not able to replicate the results (INat18: got 71.02 vs. reported 72.6; INat19: got 76.1 vs. reported 78.6).


I trained this using the latest DINO code and PyTorch 1.10.0+cu113.

Could you please share the fine-tuning hyper-parameters for these downstream datasets?
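For reference, here is roughly how I attach a classification head to the pretrained backbone before fine-tuning (a minimal sketch; the torch.hub entry point is documented in the DINO README, while the linear head and the iNat18 class count are my own illustration):

```python
import torch
import torch.nn as nn

# Load the pretrained DINO ViT-S/16 backbone via torch.hub
# (entry point documented in the DINO README).
backbone = torch.hub.load('facebookresearch/dino:main', 'dino_vits16')

# Attach a randomly initialized linear head for the downstream task;
# 8142 is the iNat18 class count, swap in the right number per dataset.
model = nn.Sequential(backbone, nn.Linear(backbone.embed_dim, 8142))
```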

TouvronHugo commented 2 years ago

Hi @srikar2097,

Thanks for your question. It is normal that the fine-tuning configuration for CIFAR differs from the one used for iNaturalist. The hyper-parameters for iNaturalist are (this config is for 8 GPUs):

- `--batch-size 128` (per GPU)
- `--lr 5e-5` (lr before scaling)
- `--epochs 300`
- `--weight-decay 0.05`
- `--sched cosine`
- `--input-size 224`
- `--repeated-aug`
- `--smoothing 0.1`
- `--warmup-epochs 5`
- `--aa rand-m9-mstd0.5-inc1`
- `--mixup .8`
- `--cutmix 1.0`
- `--remode pixel`
- `--reprob 0.25`
- `--drop-path 0.1`
- `--opt adamw`
- `--warmup-lr 1e-6`

Keep in mind that the version of the library used can have a slight impact on performance, and that the standard deviation across runs is larger on iNat than on ImageNet.

Best,
Hugo
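Edit: since the lr above is given "before scaling", here is a quick sketch of the linear scaling rule as used by DeiT-style training scripts (an assumption on my part about how this config is consumed):

```python
# Linear lr scaling rule (assumption: a DeiT-style script that scales
# the base lr by the total batch size divided by 512).
base_lr = 5e-5            # --lr, before scaling
batch_size_per_gpu = 128  # --batch-size
num_gpus = 8              # this config is for 8 GPUs

total_batch_size = batch_size_per_gpu * num_gpus  # 1024
scaled_lr = base_lr * total_batch_size / 512.0    # 1e-4
print(f"effective lr after scaling: {scaled_lr:.1e}")
```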

srikar2097 commented 2 years ago

Hi @TouvronHugo, thank you for sharing the iNat recipe; I assume it is used for both INat18 and INat19. Two follow-up questions:

  1. When you say "the standard deviation across runs is larger on iNat than on ImageNet", do you mean that multiple runs evaluated on the val data give performance numbers with large variation?
  2. What fine-tuning recipes were used for the other datasets: CIFAR100, Flowers, Cars, and ImageNet? (We have covered CIFAR10, INat18, and INat19.)

Thank you again!

aquachieh commented 1 year ago

Hi @TouvronHugo and @mathildecaron31, thanks for your great work! I am trying to replicate the paper's results (Table 6) on CIFAR-10 with ViT-S/16, but my top-1 accuracy is only 88%. Do I need to initialize from a pretrained model (or the full checkpoint) when training the ViT on CIFAR-10? https://github.com/facebookresearch/dino#pretrained-models I also want to confirm the correct way to use it: should I rename the file to "checkpoint.pth" and put it in output_dir? Thank you!
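For context, this is how I currently extract the backbone weights from the full checkpoint before training on CIFAR-10 (a sketch based on my reading of the weight-loading code in this repo; the file name is taken from the README's pretrained-models table):

```python
import torch

# Full DINO checkpoint for ViT-S/16 (name as in the README table).
ckpt = torch.load("dino_deitsmall16_pretrain_full_checkpoint.pth",
                  map_location="cpu")

# Take the teacher weights and strip the wrapper prefixes added during
# DINO training, mirroring utils.load_pretrained_weights in this repo.
state_dict = ckpt["teacher"]
state_dict = {k.replace("module.", "").replace("backbone.", ""): v
              for k, v in state_dict.items()}

# The result can then be loaded into a plain ViT backbone with
# model.load_state_dict(state_dict, strict=False).
```

Is this the intended workflow, or should the full checkpoint really be renamed to "checkpoint.pth" and placed in output_dir?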