facebookresearch / deit

Official DeiT repository
Apache License 2.0

Question about implementing finetuning on iNat-18 dataset #141

Closed cokezrr closed 2 years ago

cokezrr commented 2 years ago

Hi, I ran the following command to fine-tune:

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --model deit_base_patch16_224 \
    --data-set INAT \
    --batch-size 96 \
    --lr 7.5e-5 \
    --opt AdamW \
    --weight-decay 0.05 \
    --epochs 360 \
    --repeated-aug \
    --reprob 0.1 \
    --drop-path 0.1 \
    --data-path /data/Dataset/inat2018_tar \
    --finetune ./output/deit_base_patch16_224-b5f2ef4d.pth \
    --output_dir ./output/finetune_inat18_deit

Other arguments are the same as the default values in main.py.

But I only got 71% accuracy within 300 epochs. Should I continue fine-tuning until 360 epochs?

TouvronHugo commented 2 years ago

Hi @cokezrr,
Thanks for your question. The lr is linearly scaled (see here), so it seems that the lr here is not correct. Waiting until the end of training should further improve performance (the standard deviation is much higher on iNaturalist than on ImageNet).
Best,
Hugo
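P.S. As a rough sanity check (assuming the usual total-batch-size / 512 scaling applied in main.py), with --batch-size 96 on 8 GPUs the lr actually handed to the optimizer works out to:

python -c "print(7.5e-5 * 96 * 8 / 512)"   # prints 0.0001125, i.e. ~1.1e-4 instead of the intended 7.5e-5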

cokezrr commented 2 years ago

Thanks for your comment!! But I set the total batch size to 96 (samples per GPU) × 8 (GPUs) = 768 and the lr to 7.5e-5 following #105. If that is not correct, how should I modify them?

TouvronHugo commented 2 years ago

With LR 7.5e-5 I meant the lr after scaling; you just have to set --lr 5e-5.
Best,
Hugo
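P.S. Assuming the same total-batch-size / 512 scaling as above, 5e-5 with 96 × 8 = 768 samples per step gives exactly the intended value:

python -c "print(5e-5 * 96 * 8 / 512)"   # prints 7.5e-05 after scaling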

cokezrr commented 2 years ago

Thanks a lot!! I'll try it!

cokezrr commented 2 years ago

Hi, I have tried it by running the following command:

python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --model deit_base_patch16_224 \
    --data-set INAT \
    --batch-size 96 \
    --lr 5e-5 \
    --opt AdamW \
    --weight-decay 0.05 \
    --epochs 360 \
    --repeated-aug \
    --reprob 0.1 \
    --drop-path 0.1 \
    --data-path /data/Dataset/inat2018_tar \
    --finetune ./output/deit_base_patch16_224-b5f2ef4d.pth \
    --output_dir ./output/finetune_inat18_deit

But it still only got 71.63% accuracy. I don't know what is wrong.

TouvronHugo commented 2 years ago

Hi @cokezrr,
Indeed it is odd. In the paper we tried fine-tuning with both SGD and Adam, and the results were similar. The 73.2 in the paper is obtained with the following SGD setting:

--nproc_per_node=8 --batch 128 --lr 0.01 --epochs 300 --weight-decay 1e-4
--sched cosine --input-size 224 --repeated-aug --smoothing 0.1
--warmup-epochs 5 --warmup-lr 0.0001 --nb-classes 1000
--aa rand-m9-mstd0.5-inc1 --mixup .8 --cutmix 1.0
--opt sgd --reprob 0.0

Let me know if this solves your problem. Differences can also be explained by, for example, the version of the libraries used. Generally there is a much larger std on iNaturalist than on ImageNet.
Best,
Hugo
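Putting these flags together with the launch command from your earlier messages (model, dataset, checkpoint and paths carried over; --batch written as --batch-size to match main.py, and --nb-classes left out since --data-set INAT should pick the class count by itself), the full run would look roughly like the sketch below; I have not re-run it, so treat it as a starting point rather than an exact reproduction command:

# sketch only: the SGD flags above merged with the earlier INAT command, not re-run
python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --model deit_base_patch16_224 \
    --data-set INAT \
    --batch-size 128 \
    --opt sgd \
    --lr 0.01 \
    --weight-decay 1e-4 \
    --epochs 300 \
    --sched cosine \
    --input-size 224 \
    --repeated-aug \
    --smoothing 0.1 \
    --warmup-epochs 5 \
    --warmup-lr 0.0001 \
    --aa rand-m9-mstd0.5-inc1 \
    --mixup .8 \
    --cutmix 1.0 \
    --reprob 0.0 \
    --data-path /data/Dataset/inat2018_tar \
    --finetune ./output/deit_base_patch16_224-b5f2ef4d.pth \
    --output_dir ./output/finetune_inat18_deit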

TouvronHugo commented 2 years ago

If it is useful to you, I don't have the original logs anymore, but I have the logs of this run, which must be very close (val accuracy during training):

epoch 0: 0.016, epoch 5: 5.224, epoch 10: 21.174, epoch 15: 38.999, epoch 20: 50.356, epoch 25: 56.366, epoch 30: 60.489, epoch 35: 62.917, epoch 40: 64.616, epoch 45: 66.163,
epoch 50: 67.027, epoch 55: 67.404, epoch 60: 68.296, epoch 65: 68.448, epoch 75: 69.446, epoch 80: 69.475, epoch 85: 69.52, epoch 90: 69.225, epoch 95: 69.778,
epoch 100: 70.417, epoch 105: 70.425, epoch 110: 70.151, epoch 115: 69.84, epoch 120: 70.388, epoch 125: 70.102, epoch 130: 70.396, epoch 135: 70.319, epoch 140: 70.364, epoch 145: 70.597,
epoch 150: 70.732, epoch 155: 70.838, epoch 160: 70.609, epoch 165: 71.133, epoch 170: 70.626, epoch 175: 71.166, epoch 180: 71.248, epoch 185: 71.964, epoch 190: 71.625, epoch 195: 71.379,
epoch 200: 71.702, epoch 205: 72.058, epoch 210: 71.837, epoch 215: 72.243, epoch 220: 71.948, epoch 225: 72.255, epoch 230: 72.218, epoch 235: 72.288, epoch 240: 72.517, epoch 245: 72.3,
epoch 250: 72.509, epoch 255: 72.84, epoch 260: 72.566, epoch 265: 72.857, epoch 270: 72.849, epoch 275: 73.029, epoch 280: 73.07, epoch 285: 73.061, epoch 290: 73.188, epoch 295: 73.082, epoch 300: 73.131