Haochen-Wang409 / DropPos

[NeurIPS'23] DropPos: Pre-Training Vision Transformers by Reconstructing Dropped Positions
Apache License 2.0

pre-trained and fine-tuned models #6

Open heyleadro opened 1 month ago

heyleadro commented 1 month ago

Hi, I would like to download the models you uploaded recently. Would you consider hosting them somewhere like Google Drive or Dropbox? Or is there a way to download them via the link you provided without registering for and installing Baidu? Thanks

Haochen-Wang409 commented 1 month ago

Due to limited storage, we could only upload the ViT-B checkpoints to Google Drive. Here is the link: https://drive.google.com/drive/folders/1gDsfYoCllMHa7sYXEsrdOKk74WoT-XQR?usp=drive_link
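
(Side note for readers who cannot use a browser on the training machine: a shared Google Drive folder like this can usually be fetched with the gdown package. This is only a suggestion, not part of the official instructions; the output directory name below is arbitrary.)

    # Minimal sketch: download the shared checkpoint folder without a browser.
    # Assumes `pip install gdown`; the URL is the one posted above.
    import gdown

    url = "https://drive.google.com/drive/folders/1gDsfYoCllMHa7sYXEsrdOKk74WoT-XQR?usp=drive_link"
    gdown.download_folder(url=url, output="DropPos_checkpoints", quiet=False)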

heyleadro commented 1 month ago

Hi, thanks for releasing them. Is DropPos_pretrain_vit_base_patch16.pth trained for 800 epochs or 200 epochs?

Haochen-Wang409 commented 1 month ago

It is trained for 800 epochs.

heyleadro commented 1 month ago

Are you also fine-tuning the 800-epoch model for 100 epochs? I can only reach ~83.2% with this setup:

--batch_size 1024 \
    --accum_iter 1 \
    --model vit_base_patch16 \
    --finetune DropPos_pretrain_vit_base_patch16.pth \
    \
    --epochs 100 \
    --warmup_epochs 5 \
    --blr 1e-3 --layer_decay 0.8 --weight_decay 0.05 \
    --drop_path 0.1 --reprob 0.25 --mixup 0.8 --cutmix 1.0 \
    --dist_eval \
    --data_path /path \
    --nb_classes 1000 \

But if I pre-train DropPos_mae_vit_base_patch16_dec512d2b for 200 epochs and fine-tune for 100, I get 82.91% with the same setup, which is really close to the paper. What could be the problem?
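
(Side note for reproduction: if the DropPos fine-tuning script inherits MAE's convention, the absolute learning rate is derived from --blr and the effective batch size, so the per-GPU batch size, --accum_iter, and the number of GPUs all matter. Below is a minimal sketch of that scaling rule, assuming the MAE-style convention applies; the values are taken from the command above except num_gpus, which is a hypothetical single-node setup.)

    # Sketch of the MAE-style LR scaling rule (assumption: DropPos fine-tuning follows it).
    # lr = blr * eff_batch_size / 256
    # eff_batch_size = batch_size_per_gpu * accum_iter * num_gpus
    blr = 1e-3                 # --blr from the command above
    batch_size = 1024          # --batch_size from the command above (check whether it is per GPU or total)
    accum_iter = 1             # --accum_iter from the command above
    num_gpus = 8               # hypothetical single-node setup

    eff_batch_size = batch_size * accum_iter * num_gpus
    lr = blr * eff_batch_size / 256
    print(f"effective batch size = {eff_batch_size}, absolute lr = {lr:.2e}")

If the effective batch size differs from the one used for the paper's fine-tuning runs, the absolute learning rate differs as well, which is one thing worth checking when results disagree by a few tenths of a point.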