Alibaba-MIIL / TResNet

Official Pytorch Implementation of "TResNet: High-Performance GPU-Dedicated Architecture" (WACV 2021)
Apache License 2.0

CIFAR100 Finetuning result reproduce #52

Closed maulikmadhavi closed 1 year ago

maulikmadhavi commented 1 year ago

Hi Tal Ridnik, thanks for sharing this nice work. I am trying to reproduce the fine-tuning result using timm, but my accuracy is 89.25, which is lower than the 91.5 reported in the paper. This is the command I used:

python train.py \
    /cifar100_imgs/ \
    --train-split train \
    --val-split val \
    --model tresnet_xl \
    --output cifar100_tresnet_xl \
    --pretrained \
    --batch-size 64 \
    --lr 4e-5 \
    --epochs 150 \
    --weight-decay 0.0001 \
    --sched cosine \
    --smoothing 0.1 \
    --warmup-epochs 5 \
    --aa rand-m9-mstd0.5-inc1 \
    --mixup .8 \
    --cutmix 1.0 \
    --remode pixel \
    --reprob 0.25 \
    --opt adamw \
    --amp

Do you have any suggestion to follow? Thanks, Maulik!
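For what it's worth, here is a minimal sketch of how that setup maps onto timm's Python API (model name, class count, and optimizer settings are taken from the command above; the scheduler wiring is an assumption, not the exact train.py internals):

import timm
import torch

# ImageNet-pretrained TResNet-XL with the classifier head replaced by a
# 100-way layer for CIFAR-100 (matches --model tresnet_xl --pretrained).
model = timm.create_model('tresnet_xl', pretrained=True, num_classes=100)

# AdamW with the learning rate and weight decay from the command above.
optimizer = torch.optim.AdamW(model.parameters(), lr=4e-5, weight_decay=1e-4)

# Cosine decay over 150 epochs, matching --sched cosine --epochs 150
# (train.py also adds a 5-epoch warmup, which is omitted here).
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=150)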

mrT23 commented 1 year ago

Read this issue for better examples: https://github.com/Alibaba-MIIL/ImageNet21K/issues/40

Something like this should work better:

python -m torch.distributed.launch --nproc_per_node=8 --master_port 6016 train.py \
    /data/cifar-100-images/ \
    -b=64 \
    --img-size=224 \
    --epochs=50 \
    --color-jitter=0 \
    --amp \
    --lr=2e-4 \
    --sched='cosine' \
    --model-ema --model-ema-decay=0.995 --reprob=0.5 --smoothing=0.1 \
    --min-lr=1e-8 --warmup-epochs=3 --train-interpolation=bilinear --aa=v0 \
    --model=vit_base_patch16_224_miil_in21k \
    --pretrained \
    --num-classes=100 \
    --opt=adamw --weight-decay=1e-4 \
    --checkpoint-hist=1
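Note that --model-ema keeps an exponential moving average of the weights and reports validation accuracy with the averaged copy; a rough sketch of the update rule (plain PyTorch, not the actual timm ModelEmaV2 implementation):

import copy
import torch
import torch.nn as nn

# Toy model stands in for the real network; the EMA copy is kept frozen.
model = nn.Linear(10, 100)
ema_model = copy.deepcopy(model).eval()
for p in ema_model.parameters():
    p.requires_grad_(False)

def update_ema(ema_model, model, decay=0.995):
    # ema_param <- decay * ema_param + (1 - decay) * param,
    # applied after every optimizer step (matches --model-ema-decay=0.995).
    with torch.no_grad():
        for ema_p, p in zip(ema_model.parameters(), model.parameters()):
            ema_p.mul_(decay).add_(p, alpha=1.0 - decay)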

Play a bit with the learning rate vs. batch size trade-off.
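If it helps, one common way to do that (a linear-scaling rule of thumb, not something specific to this repo) is to keep lr / effective batch size roughly constant; with the command above the effective batch is 8 GPUs x 64 = 512 at lr 2e-4:

# Linear LR scaling heuristic: lr / effective_batch roughly constant.
BASE_LR = 2e-4      # from the suggested command
BASE_BATCH = 512    # 8 processes x 64 images per GPU

def scaled_lr(effective_batch):
    return BASE_LR * effective_batch / BASE_BATCH

print(scaled_lr(64))    # ~2.5e-05 for a single-GPU run with -b=64
print(scaled_lr(1024))  # ~4e-04 if the effective batch is doubled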

maulikmadhavi commented 1 year ago

Thanks! The above hyperparameters gave a validation accuracy of 87.89%, which is even worse:

*** Best metric: 87.89 (epoch 30)
