Closed cokezrr closed 2 years ago
Hi @cokezrr , Thanks for your question. The lr is linearly scaled (see here), so it seems the lr here is not correct. Waiting until the end of training should further improve performance (the standard deviation is much higher on iNaturalist than on ImageNet). Best, Hugo
Thanks for your comment!! But I set the total batch size to 96 (samples per GPU) × 8 (GPUs) = 768 and the lr to 7.5e-5, following #105. If that is not correct, how should I modify them?
With LR 7.5e-5 I meant the lr after scaling; you just have to set --lr 5e-5. Best, Hugo
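To make the scaling explicit, here is a minimal sketch of the linear lr scaling rule as applied in DeiT's main.py (the function name `scaled_lr` is mine; the repo computes the same quantity inline from the batch size and world size):

```python
def scaled_lr(base_lr, batch_size_per_gpu, num_gpus, base_batch=512.0):
    """Linearly scale the lr with the total batch size, relative to a base batch of 512."""
    return base_lr * batch_size_per_gpu * num_gpus / base_batch

# With --lr 5e-5 and 96 samples per GPU on 8 GPUs (total batch 768):
print(scaled_lr(5e-5, 96, 8))  # 7.5e-05 -- the lr actually used in training
```

So passing `--lr 5e-5` with a total batch of 768 yields the effective 7.5e-5 discussed above.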
Thanks a lot!! I'll try it!
Hi, I have tried it by running the following command:
```
python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --model deit_base_patch16_224 \
    --data-set INAT \
    --batch-size 96 \
    --lr 5e-5 \
    --opt AdamW \
    --weight-decay 0.05 \
    --epochs 360 \
    --repeated-aug \
    --reprob 0.1 \
    --drop-path 0.1 \
    --data-path /data/Dataset/inat2018_tar \
    --finetune ./output/deit_base_patch16_224-b5f2ef4d.pth \
    --output_dir ./output/finetune_inat18_deit
```
But it still only reached 71.63% accuracy. I don't know what is wrong.
Hi @cokezrr ,
Indeed, it is odd. In the paper we tried fine-tuning with both SGD and Adam, and the results were similar between the two. The 73.2 in the paper is obtained with the following SGD setting:
```
--nproc_per_node=8
--batch 128
--lr 0.01
--epochs 300
--weight-decay 1e-4
--sched cosine
--input-size 224
--repeated-aug
--smoothing 0.1
--warmup-epochs 5
--warmup-lr 0.0001
--nb-classes 1000
--aa rand-m9-mstd0.5-inc1
--mixup .8
--cutmix 1.0
--opt sgd
--reprob 0.0
```
Let me know if this solves your problem.
What can also explain some differences is, for example, the version of the library used. Generally there is a much larger standard deviation on iNaturalist than on ImageNet.
Best,
Hugo
In case it is useful to you: I don't have the original logs anymore, but I have the logs of this run, which must be very close (val accuracy during training):

```
epoch 0: 0.016, epoch 5: 5.224, epoch 10: 21.174, epoch 15: 38.999, epoch 20: 50.356,
epoch 25: 56.366, epoch 30: 60.489, epoch 35: 62.917, epoch 40: 64.616, epoch 45: 66.163,
epoch 50: 67.027, epoch 55: 67.404, epoch 60: 68.296, epoch 65: 68.448, epoch 75: 69.446,
epoch 80: 69.475, epoch 85: 69.52, epoch 90: 69.225, epoch 95: 69.778, epoch 100: 70.417,
epoch 105: 70.425, epoch 110: 70.151, epoch 115: 69.84, epoch 120: 70.388, epoch 125: 70.102,
epoch 130: 70.396, epoch 135: 70.319, epoch 140: 70.364, epoch 145: 70.597, epoch 150: 70.732,
epoch 155: 70.838, epoch 160: 70.609, epoch 165: 71.133, epoch 170: 70.626, epoch 175: 71.166,
epoch 180: 71.248, epoch 185: 71.964, epoch 190: 71.625, epoch 195: 71.379, epoch 200: 71.702,
epoch 205: 72.058, epoch 210: 71.837, epoch 215: 72.243, epoch 220: 71.948, epoch 225: 72.255,
epoch 230: 72.218, epoch 235: 72.288, epoch 240: 72.517, epoch 245: 72.3, epoch 250: 72.509,
epoch 255: 72.84, epoch 260: 72.566, epoch 265: 72.857, epoch 270: 72.849, epoch 275: 73.029,
epoch 280: 73.07, epoch 285: 73.061, epoch 290: 73.188, epoch 295: 73.082, epoch 300: 73.131
```
Hi, I ran the following command:
```
python -m torch.distributed.launch \
    --nproc_per_node=8 \
    --use_env main.py \
    --model deit_base_patch16_224 \
    --data-set INAT \
    --batch-size 96 \
    --lr 7.5e-5 \
    --opt AdamW \
    --weight-decay 0.05 \
    --epochs 360 \
    --repeated-aug \
    --reprob 0.1 \
    --drop-path 0.1 \
    --data-path /data/Dataset/inat2018_tar \
    --finetune ./output/deit_base_patch16_224-b5f2ef4d.pth \
    --output_dir ./output/finetune_inat18_deit
```
The other arguments are the same as the default values in main.py.
But I only got 71% accuracy within 300 epochs. Should I continue fine-tuning up to 360 epochs?