OliverRensu / TinyMIM

What parameters should I use to reproduce the 85.0 result for ViT-Base? #2

Closed yxchng closed 1 year ago

yxchng commented 1 year ago

I followed the instructions here:

[Screenshot: instructions from the README]

and got an 84.7 result for ViT-Base, which is quite a bit lower than the 85.0 reported in the paper.

Could you let me know which command I should use to reproduce the paper's result? Thanks.

OliverRensu commented 1 year ago

Sorry, the batch size should be 128 per GPU with 32 GPUs, or 64 per GPU with 64 GPUs. I will update this immediately.
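
For reference, a minimal sketch of the 32-GPU variant (this assumes 4 nodes x 8 GPUs and the same launcher and flags as the README pre-training command quoted later in this thread; only --nnodes and --batch_size change):

    # sketch: 4 nodes x 8 GPUs x 128 per GPU -> effective batch size 4096
    python -m torch.distributed.launch \
    --nnodes 4 --node_rank $noderank \
    --nproc_per_node 8 --master_addr $ip --master_port $port \
    main_pretrain.py \
    --batch_size 128 \
    --model tinymim_vit_base_patch16 \
    --epochs 300 \
    --warmup_epochs 15 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --teacher_path /path/to/teacher_ckpt \
    --teacher_model mae_vit_large \
    --data_path /path/to/imagenet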

yxchng commented 1 year ago

I got a similar result (84.72) with 64 per GPU on 64 GPUs. I checked the paper, and the parameters there (in the Appendix) are as follows:

PRETRAIN:

[Screenshot: pre-training hyper-parameters from the Appendix]

FINETUNE:

[Screenshot: fine-tuning hyper-parameters from the Appendix]

These are different from the parameters used in the README.md of this repo.

I tried the parameters from the Appendix and got NaN losses partway through pretraining.

So I am a bit confused. Can you kindly clarify?

OliverRensu commented 1 year ago

Is the MAE-Large from the official repo? What is the result of your finetuning with our released pretrained ckpt? Can you share all of your pretraining and finetuning commands?

yxchng commented 1 year ago

  1. MAE-Large is from the official repo.
  2. I will try to fine-tune from your pretrained checkpoint and report the result later.
  3. Should I use the commands in README.md to pretrain and fine-tune, or should I modify them to use the parameters in the Appendix (in the screenshots above)? When should I use the parameters in README.md, and when should I use the ones in the Appendix?
yxchng commented 1 year ago

Here is the list of commands I used and the results I got. Note that I used 4 nodes to fine-tune instead of 1, but kept the same total batch size of 1024 (see the quick check below).
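
As a quick check, the effective fine-tuning batch size just multiplies out the launch flags used below:

    # nnodes x nproc_per_node x per-GPU batch size
    echo $((4 * 8 * 32))   # -> 1024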

  1. cf. README.md after your recent update (batch size 4096). Pretrain:

    python -m torch.distributed.launch \
    --nnodes 8 --node_rank $noderank \
    --nproc_per_node 8 --master_addr $ip --master_port $port \
    main_pretrain.py \
    --batch_size 64 \
    --model tinymim_vit_base_patch16 \
    --epochs 300 \
    --warmup_epochs 15 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --teacher_path /path/to/teacher_ckpt \
    --teacher_model mae_vit_large \
    --data_path /path/to/imagenet 

    Fine-tune:

    python -m torch.distributed.launch \
    --nnodes 4 --node_rank $noderank \
    --nproc_per_node 8 --master_addr $ip --master_port $port \
    main_finetune.py \
    --batch_size 32 \
    --model vit_base_patch16 \
    --finetune ./output_dir/checkpoint-299.pth \
    --epochs 100 \
    --output_dir ./out_finetune/ \
    --blr 5e-4 --layer_decay 0.6 \
    --weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --dist_eval --data_path /path/to/imagenet

    Result: 84.72

  2. cf. README.md before your recent update (batch size 2048). Pretrain:

    python -m torch.distributed.launch \
    --nnodes 4 --node_rank $noderank \
    --nproc_per_node 8 --master_addr $ip --master_port $port \
    main_pretrain.py \
    --batch_size 64 \
    --model tinymim_vit_base_patch16 \
    --epochs 300 \
    --warmup_epochs 15 \
    --blr 1.5e-4 --weight_decay 0.05 \
    --teacher_path /path/to/teacher_ckpt \
    --teacher_model mae_vit_large \
    --data_path /path/to/imagenet 

    Fine-tune:

    python -m torch.distributed.launch \
    --nnodes 4 --node_rank $noderank \
    --nproc_per_node 8 --master_addr $ip --master_port $port \
    main_finetune.py \
    --batch_size 32 \
    --model vit_base_patch16 \
    --finetune ./output_dir/checkpoint-299.pth \
    --epochs 100 \
    --output_dir ./out_finetune/ \
    --blr 5e-4 --layer_decay 0.6 \
    --weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --dist_eval --data_path /path/to/imagenet

    Result: 84.70

  3. Using the parameters in the Appendix (changing blr, min_lr, and beta2 of the Adam optimizer). Pretrain:

    python -m torch.distributed.launch \
    --nnodes 8 --node_rank $noderank \
    --nproc_per_node 8 --master_addr $ip --master_port $port \
    main_pretrain.py \
    --batch_size 64 \
    --model tinymim_vit_base_patch16 \
    --epochs 300 \
    --warmup_epochs 15 \
    --blr 2.4e-3 --min_lr=1e-5 \
    --beta2=0.999 --weight_decay 0.05 \
    --teacher_path /path/to/teacher_ckpt \
    --teacher_model mae_vit_large \
    --data_path /path/to/imagenet

    Result: NaN after the 1st epoch.

OliverRensu commented 1 year ago

I can share the fine-tuning log via email; if you need it, please email me. In the paper, we report the peak lr instead of the blr.
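
To make the conversion concrete, here is a rough check, assuming the MAE-style scaling rule lr = blr * effective_batch_size / 256 that the launch scripts above are presumably using:

    # effective pre-training batch size from the bs4096 command above: 8 nodes x 8 GPUs x 64 per GPU
    eff_bs=$((8 * 8 * 64))                        # 4096
    # peak lr = blr * eff_bs / 256
    python -c "print(1.5e-4 * $eff_bs / 256)"     # 0.0024, i.e. the 2.4e-3 listed in the Appendix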

yxchng commented 1 year ago

OK, thanks for clarifying the peak lr vs. blr. I notice that beta2 (0.95 in the repo vs. 0.999 in the paper) and min_lr (0 in the repo vs. 1e-5 in the paper) are also different. What about these two parameters?

I will email you for the fine-tuning log.

OliverRensu commented 1 year ago

Please follow all the hyper-parameters in this repo, and use the following command to fine-tune: [screenshot of the fine-tuning command]. Thanks~