This is a PyTorch implementation of the paper [Mobile-Former: Bridging MobileNet and Transformer](https://arxiv.org/abs/2108.05895):
```bibtex
@Article{MobileFormer2021,
  author  = {Chen, Yinpeng and Dai, Xiyang and Chen, Dongdong and Liu, Mengchen and Dong, Xiaoyi and Yuan, Lu and Liu, Zicheng},
  journal = {arXiv:2108.05895},
  title   = {Mobile-Former: Bridging MobileNet and Transformer},
  year    = {2021},
}
```
This implementation requires `timm==0.3.4`.
| Model | Input size | Params | FLOPs | Top-1 (%) | Pretrained |
|---|---|---|---|---|---|
| mobile-former-508m | 224 | 14.0M | 508M | 79.3 | download |
| mobile-former-294m | 224 | 11.4M | 294M | 77.9 | download |
| mobile-former-214m | 224 | 9.4M | 214M | 76.7 | download |
| mobile-former-151m | 224 | 7.6M | 151M | 75.2 | download |
| mobile-former-96m | 224 | 4.6M | 96M | 72.8 | download |
| mobile-former-52m | 224 | 3.5M | 52M | 68.7 | download |
| mobile-former-26m | 224 | 3.2M | 26M | 64.0 | download |
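The variants above can be instantiated through `timm` once the model definitions from this repository are importable. Below is a minimal sketch, assuming the models are registered with `timm` under the names shown in the table (the exact registration names may differ in this codebase); it builds one variant, checks the parameter count against the Params column, and runs a dummy forward pass at the 224x224 input resolution:

```python
import timm
import torch

# Assumption: importing this repo's model files registers the Mobile-Former
# variants with timm under the names used in the table above.
model = timm.create_model('mobile-former-294m', num_classes=1000)
model.eval()

# The parameter count should roughly match the "Params" column (11.4M here).
n_params = sum(p.numel() for p in model.parameters())
print(f'params: {n_params / 1e6:.1f}M')

# Dummy forward pass at the 224x224 input resolution listed in the table.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # expected: torch.Size([1, 1000])
```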
To train `mobile-former-508m`, run the following on a single node with 8 GPUs:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-508m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.001 \
    --weight-decay 0.20 \
    --drop 0.3 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.2 \
    --color-jitter 0. \
    --log-interval 200
```
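With `--nproc_per_node=8` and `--batch-size 128` per process, the effective global batch size is 1024. Note that recent PyTorch releases deprecate `torch.distributed.launch` in favor of `torchrun`, which accepts the same `--nproc_per_node` argument. The commands for the smaller variants below follow the same template and differ only in the model name, learning rate, weight decay, dropout, and augmentation settings.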
To train `mobile-former-294m`, run:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-294m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.001 \
    --weight-decay 0.20 \
    --drop 0.3 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.2 \
    --color-jitter 0. \
    --log-interval 200
```
To train `mobile-former-214m`, run:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-214m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0009 \
    --weight-decay 0.15 \
    --drop 0.2 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.2 \
    --color-jitter 0. \
    --log-interval 200
```
To train `mobile-former-151m`, run:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-151m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0009 \
    --weight-decay 0.10 \
    --drop 0.2 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.2 \
    --color-jitter 0. \
    --log-interval 200
```
To train `mobile-former-96m`, run:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-96m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0008 \
    --weight-decay 0.10 \
    --drop 0.2 \
    --drop-path 0.0 \
    --mixup 0.0 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.0 \
    --color-jitter 0. \
    --log-interval 200
```
To train `mobile-former-52m`, run:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-52m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0008 \
    --weight-decay 0.10 \
    --drop 0.2 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --remode pixel \
    --reprob 0.0 \
    --color-jitter 0. \
    --log-interval 200
```
To train `mobile-former-26m`, run:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-26m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0008 \
    --weight-decay 0.08 \
    --drop 0.1 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.0 \
    --color-jitter 0. \
    --log-interval 200
```
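To check a trained checkpoint against the Top-1 numbers in the table, it can be evaluated with a short PyTorch script. The sketch below is illustrative only: the checkpoint and ImageNet paths are placeholders, the weights are assumed to be stored under a timm-style `state_dict` key, and the model name must match one registered by this repository.

```python
import timm
import torch
from torchvision import datasets, transforms

# Hypothetical paths; point these at your own checkpoint and ImageNet val split.
ckpt = torch.load('output/mobile-former-294m/model_best.pth.tar', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)  # assumes a timm-style checkpoint layout

model = timm.create_model('mobile-former-294m', num_classes=1000)
model.load_state_dict(state_dict)
model.eval()

# Standard ImageNet evaluation preprocessing at the 224x224 resolution used above.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
val_set = datasets.ImageFolder('/path/to/imagenet/val', transform=preprocess)
loader = torch.utils.data.DataLoader(val_set, batch_size=128, num_workers=8)

correct = total = 0
with torch.no_grad():
    for images, targets in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == targets).sum().item()
        total += targets.numel()
print(f'top-1: {100.0 * correct / total:.1f}%')
```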