Sorry, the batch size should be 128 per GPU with 32 GPUs, or 64 per GPU with 64 GPUs. I will update this immediately.
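(Just to spell out the arithmetic: both settings above work out to the same effective global batch size. A trivial check, with the per-GPU sizes and GPU counts taken from the comment above:)

# Both configurations above give the same effective (global) batch size.
print(128 * 32)  # 4096
print(64 * 64)   # 4096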
I got a similar result, 84.72, with 64 per GPU on 64 GPUs. I checked the paper, and the parameters there (in the Appendix) are as follows:
PRETRAIN: (hyper-parameter table from the Appendix)
FINETUNE: (hyper-parameter table from the Appendix)
These are different from the parameters used in the README.md of this repo.
I tried the parameters from the Appendix and got NaN halfway through pretraining.
So I am a bit confused. Can you kindly clarify?
Is the MAE-Large from the official repo? What is the result of your finetuning with our released pretrained ckpt? Can you share all of your pretraining and finetuning commands?
Here is the list of commands I used and the results I got. Note that I used 4 nodes for finetuning instead of 1, but with the same total batch size of 1024.
Cf. README.md after your recent update (bs4096).
Pretrain:
python -m torch.distributed.launch \
--nnodes 8 --node_rank $noderank \
--nproc_per_node 8 --master_addr $ip --master_port $port \
main_pretrain.py \
--batch_size 64 \
--model tinymim_vit_base_patch16 \
--epochs 300 \
--warmup_epochs 15 \
--blr 1.5e-4 --weight_decay 0.05 \
--teacher_path /path/to/teacher_ckpt \
--teacher_model mae_vit_large \
--data_path /path/to/imagenet
Finetune:
python -m torch.distributed.launch \
--nnodes 4 --node_rank $noderank \
--nproc_per_node 8 --master_addr $ip --master_port $port \
main_finetune.py \
--batch_size 32 \
--model vit_base_patch16 \
--finetune ./output_dir/checkpoint-299.pth \
--epochs 100 \
--output_dir ./out_finetune/ \
--blr 5e-4 --layer_decay 0.6 \
--weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
--dist_eval --data_path /path/to/imagenet
Result: 84.72
Cf. README.md before your recent update (bs2048).
Pretrain:
python -m torch.distributed.launch \
--nnodes 4 --node_rank $noderank \
--nproc_per_node 8 --master_addr $ip --master_port $port \
main_pretrain.py \
--batch_size 64 \
--model tinymim_vit_base_patch16 \
--epochs 300 \
--warmup_epochs 15 \
--blr 1.5e-4 --weight_decay 0.05 \
--teacher_path /path/to/teacher_ckpt \
--teacher_model mae_vit_large \
--data_path /path/to/imagenet
Finetune:
python -m torch.distributed.launch \
--nnodes 4 --node_rank $noderank \
--nproc_per_node 8 --master_addr $ip --master_port $port \
main_finetune.py \
--batch_size 32 \
--model vit_base_patch16 \
--finetune ./output_dir/checkpoint-299.pth \
--epochs 100 \
--output_dir ./out_finetune/ \
--blr 5e-4 --layer_decay 0.6 \
--weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
--dist_eval --data_path /path/to/imagenet
Result: 84.70
Using the parameters from the Appendix (changing blr, min_lr, and beta2 in the Adam optimizer):
Pretrain:
python -m torch.distributed.launch \
--nnodes 8 --node_rank $noderank \
--nproc_per_node 8 --master_addr $ip --master_port $port \
main_pretrain.py \
--batch_size 64 \
--model tinymim_vit_base_patch16 \
--epochs 300 \
--warmup_epochs 15 \
--blr 2.4e-3 --min_lr 1e-5 \
--beta2 0.999 --weight_decay 0.05 \
--teacher_path /path/to/teacher_ckpt \
--teacher_model mae_vit_large \
--data_path /path/to/imagenet
Result: NaN after 1st epoch.
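(Side note for anyone hitting the same divergence: below is a generic sketch of a loss-finiteness guard, in the spirit of MAE-style training loops but not copied from this repo, that aborts as soon as the loss goes NaN/Inf so the offending step can be inspected.)

import math
import sys

def check_loss_finite(loss_value: float) -> None:
    # Abort training when the loss has diverged (NaN/Inf), so the bad step,
    # lr, and inputs can be inspected instead of silently training on garbage.
    if not math.isfinite(loss_value):
        print(f"Loss is {loss_value}, stopping training")
        sys.exit(1)

# Example: a diverged step triggers the guard.
check_loss_finite(float("nan"))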
I can share the finetuning log via email; if you need it, please email me.

In the paper, we report peak lr instead of blr.
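(Assuming these scripts follow MAE's linear lr scaling rule, peak lr = blr × effective batch size / 256, which the note above about peak lr vs. blr suggests, the numbers line up: the 2.4e-3 in the Appendix is the already-scaled peak value, and feeding it back in as blr would scale it a second time. A minimal sketch of that arithmetic:)

# Sketch only: assumes the MAE-style linear scaling rule lr = blr * eff_batch / 256.
def peak_lr(blr: float, eff_batch_size: int) -> float:
    return blr * eff_batch_size / 256

eff_batch = 8 * 8 * 64  # nnodes * nproc_per_node * batch_size = 4096

print(peak_lr(1.5e-4, eff_batch))  # 0.0024 -> matches the 2.4e-3 peak lr in the Appendix
print(peak_lr(2.4e-3, eff_batch))  # 0.0384 -> what 2.4e-3 becomes if passed as blr,
                                   #           which would plausibly explain the NaN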
OK, thanks for clarifying the peak lr vs. blr. I notice that beta2 (0.95 in the repo vs. 0.999 in the paper) and min_lr (0 in the repo vs. 1e-5 in the paper) are also different. What about these two parameters?
I will email you for the finetuning log.
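(For context on where those two values enter, here is a sketch of an MAE-style optimizer and schedule setup; it is not copied from this repo, and the model and epoch counts are placeholders. beta2 is the second AdamW moment coefficient, and min_lr is the floor of the half-cosine decay.)

import math
import torch

# Sketch of an MAE-style setup (placeholder model; not this repo's actual code).
model = torch.nn.Linear(768, 768)   # stand-in for the ViT
peak_lr, min_lr = 2.4e-3, 0.0       # repo default min_lr is 0; the Appendix lists 1e-5

# beta2 enters here: the repo uses 0.95, while the Appendix lists 0.999.
optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr, betas=(0.9, 0.95))

def lr_at_epoch(epoch: float, warmup_epochs: int = 15, total_epochs: int = 300) -> float:
    """Warmup followed by a half-cosine decay from peak_lr down to min_lr."""
    if epoch < warmup_epochs:
        return peak_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + (peak_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))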
Please follow all the hyper-parameters in this repo, and please use the following command to finetune. Thanks~
I followed the instructions here and got an 84.7 result for ViT-Base, which is quite a bit lower than the 85.0 reported in the paper.
Can you let me know what command I should use to reproduce the paper result? Thanks.