Sorry, the batch size should be 128 per GPU with 32 GPUs, or 64 per GPU with 64 GPUs. I will update this immediately.
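(Just to spell out the arithmetic: both settings above work out to the same effective global batch size. A trivial check, with the per-GPU sizes and GPU counts taken from the comment above:)

# Both configurations above give the same effective (global) batch size.
print(128 * 32)  # 4096
print(64 * 64)   # 4096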
I got a similar result, 84.72, with 64 per GPU on 64 GPUs. I checked the paper, and the parameters there (in the Appendix) are as follows:
PRETRAIN: (hyper-parameter table from the Appendix)
FINETUNE: (hyper-parameter table from the Appendix)
These are different from the parameters used in the README.md of this repo.
I tried the parameters from the Appendix and got NaN halfway through pretraining.
So I am a bit confused. Can you kindly clarify?
Is the MAE-Large from the official repo? What is the result of your finetuning with our released pretrained ckpt? Can you share all of your pretraining and finetuning commands?
Here is the list of commands I used and the results I got. Note that I used 4 nodes for finetuning instead of 1, but with the same total batch size of 1024.
Cf. README.md after your recent update (bs4096).
Pretrain:
python -m torch.distributed.launch \
--nnodes 8 --node_rank $noderank \
--nproc_per_node 8 --master_addr $ip --master_port $port \
main_pretrain.py \
--batch_size 64 \
--model tinymim_vit_base_patch16 \
--epochs 300 \
--warmup_epochs 15 \
--blr 1.5e-4 --weight_decay 0.05 \
--teacher_path /path/to/teacher_ckpt \
--teacher_model mae_vit_large \
--data_path /path/to/imagenet
Finetune:
python -m torch.distributed.launch \
--nnodes 4 --node_rank $noderank \
--nproc_per_node 8 --master_addr $ip --master_port $port \
main_finetune.py \
--batch_size 32 \
--model vit_base_patch16 \
--finetune ./output_dir/checkpoint-299.pth \
--epochs 100 \
--output_dir ./out_finetune/ \
--blr 5e-4 --layer_decay 0.6 \
--weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
--dist_eval --data_path /path/to/imagenet
Result: 84.72
Cf. README.md before your recent update (bs2048).
Pretrain:
python -m torch.distributed.launch \
--nnodes 4 --node_rank $noderank \
--nproc_per_node 8 --master_addr $ip --master_port $port \
main_pretrain.py \
--batch_size 64 \
--model tinymim_vit_base_patch16 \
--epochs 300 \
--warmup_epochs 15 \
--blr 1.5e-4 --weight_decay 0.05 \
--teacher_path /path/to/teacher_ckpt \
--teacher_model mae_vit_large \
--data_path /path/to/imagenet
Finetune:
python -m torch.distributed.launch \
--nnodes 4 --node_rank $noderank \
--nproc_per_node 8 --master_addr $ip --master_port $port \
main_finetune.py \
--batch_size 32 \
--model vit_base_patch16 \
--finetune ./output_dir/checkpoint-299.pth \
--epochs 100 \
--output_dir ./out_finetune/ \
--blr 5e-4 --layer_decay 0.6 \
--weight_decay 0.05 --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
--dist_eval --data_path /path/to/imagenet
Result: 84.70
Using the parameters from the Appendix (changing blr, min_lr, and beta2 in the Adam optimizer):
Pretrain:
python -m torch.distributed.launch \
--nnodes 8 --node_rank $noderank \
--nproc_per_node 8 --master_addr $ip --master_port $port \
main_pretrain.py \
--batch_size 64 \
--model tinymim_vit_base_patch16 \
--epochs 300 \
--warmup_epochs 15 \
--blr 2.4e-3 --min_lr 1e-5 \
--beta2 0.999 --weight_decay 0.05 \
--teacher_path /path/to/teacher_ckpt \
--teacher_model mae_vit_large \
--data_path /path/to/imagenet
Result: NaN after 1st epoch.
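(Side note for anyone hitting the same divergence: below is a generic sketch of a loss-finiteness guard, in the spirit of MAE-style training loops but not copied from this repo, that aborts as soon as the loss goes NaN/Inf so the offending step can be inspected.)

import math
import sys

def check_loss_finite(loss_value: float) -> None:
    # Abort training when the loss has diverged (NaN/Inf), so the bad step,
    # lr, and inputs can be inspected instead of silently training on garbage.
    if not math.isfinite(loss_value):
        print(f"Loss is {loss_value}, stopping training")
        sys.exit(1)

# Example: a diverged step triggers the guard.
check_loss_finite(float("nan"))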
I can share the finetuning log via email; if you need it, please email me.

In the paper, we report peak lr instead of blr.
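(Assuming these scripts follow MAE's linear lr scaling rule, peak lr = blr × effective batch size / 256, which the note above about peak lr vs. blr suggests, the numbers line up: the 2.4e-3 in the Appendix is the already-scaled peak value, and feeding it back in as blr would scale it a second time. A minimal sketch of that arithmetic:)

# Sketch only: assumes the MAE-style linear scaling rule lr = blr * eff_batch / 256.
def peak_lr(blr: float, eff_batch_size: int) -> float:
    return blr * eff_batch_size / 256

eff_batch = 8 * 8 * 64  # nnodes * nproc_per_node * batch_size = 4096

print(peak_lr(1.5e-4, eff_batch))  # 0.0024 -> matches the 2.4e-3 peak lr in the Appendix
print(peak_lr(2.4e-3, eff_batch))  # 0.0384 -> what 2.4e-3 becomes if passed as blr,
                                   #           which would plausibly explain the NaN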
OK, thanks for clarifying the peak lr vs. blr. I notice that beta2 (0.95 in the repo vs. 0.999 in the paper) and min_lr (0 in the repo vs. 1e-5 in the paper) are also different. What about these two parameters?
I will email you for the finetuning log.
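(For context on where those two values enter, here is a sketch of an MAE-style optimizer and schedule setup; it is not copied from this repo, and the model and epoch counts are placeholders. beta2 is the second AdamW moment coefficient, and min_lr is the floor of the half-cosine decay.)

import math
import torch

# Sketch of an MAE-style setup (placeholder model; not this repo's actual code).
model = torch.nn.Linear(768, 768)   # stand-in for the ViT
peak_lr, min_lr = 2.4e-3, 0.0       # repo default min_lr is 0; the Appendix lists 1e-5

# beta2 enters here: the repo uses 0.95, while the Appendix lists 0.999.
optimizer = torch.optim.AdamW(model.parameters(), lr=peak_lr, betas=(0.9, 0.95))

def lr_at_epoch(epoch: float, warmup_epochs: int = 15, total_epochs: int = 300) -> float:
    """Warmup followed by a half-cosine decay from peak_lr down to min_lr."""
    if epoch < warmup_epochs:
        return peak_lr * epoch / warmup_epochs
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return min_lr + (peak_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))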
Please follow all the hyper-parameters in this repo, and please use the following command to finetune. Thanks~
I followed the instructions here and got an 84.7 result for ViT-Base, which is quite a bit lower than the 85.0 reported in the paper.
Can you let me know what command I should use to reproduce the paper result? Thanks.