This is a PyTorch implementation of the paper [Mobile-Former: Bridging MobileNet and Transformer](https://arxiv.org/abs/2108.05895):
```bibtex
@Article{MobileFormer2021,
  author  = {Chen, Yinpeng and Dai, Xiyang and Chen, Dongdong and Liu, Mengchen and Dong, Xiaoyi and Yuan, Lu and Liu, Zicheng},
  journal = {arXiv:2108.05895},
  title   = {Mobile-Former: Bridging MobileNet and Transformer},
  year    = {2021},
}
```
This implementation requires `timm==0.3.4`.
| Model | Input size | Params | FLOPs | Top-1 (%) | Pretrained |
|---|---|---|---|---|---|
| mobile-former-508m | 224 | 14.0M | 508M | 79.3 | download |
| mobile-former-294m | 224 | 11.4M | 294M | 77.9 | download |
| mobile-former-214m | 224 | 9.4M | 214M | 76.7 | download |
| mobile-former-151m | 224 | 7.6M | 151M | 75.2 | download |
| mobile-former-96m | 224 | 4.6M | 96M | 72.8 | download |
| mobile-former-52m | 224 | 3.5M | 52M | 68.7 | download |
| mobile-former-26m | 224 | 3.2M | 26M | 64.0 | download |
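The variants above can be instantiated through `timm` once the model definitions from this repository are importable. Below is a minimal sketch, assuming the models are registered with `timm` under the names shown in the table (the exact registration names may differ in this codebase); it builds one variant, checks the parameter count against the Params column, and runs a dummy forward pass at the 224x224 input resolution:

```python
import timm
import torch

# Assumption: importing this repo's model files registers the Mobile-Former
# variants with timm under the names used in the table above.
model = timm.create_model('mobile-former-294m', num_classes=1000)
model.eval()

# The parameter count should roughly match the "Params" column (11.4M here).
n_params = sum(p.numel() for p in model.parameters())
print(f'params: {n_params / 1e6:.1f}M')

# Dummy forward pass at the 224x224 input resolution listed in the table.
with torch.no_grad():
    logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)  # expected: torch.Size([1, 1000])
```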
To train `mobile-former-508m`, run the following on a single node with 8 GPUs:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-508m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.001 \
    --weight-decay 0.20 \
    --drop 0.3 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.2 \
    --color-jitter 0. \
    --log-interval 200
```
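With `--nproc_per_node=8` and `--batch-size 128` per process, the effective global batch size is 1024. Note that recent PyTorch releases deprecate `torch.distributed.launch` in favor of `torchrun`, which accepts the same `--nproc_per_node` argument. The commands for the smaller variants below follow the same template and differ only in the model name, learning rate, weight decay, dropout, and augmentation settings.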
To train `mobile-former-294m`, run:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-294m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.001 \
    --weight-decay 0.20 \
    --drop 0.3 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.2 \
    --color-jitter 0. \
    --log-interval 200
```
To train `mobile-former-214m`, run:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-214m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0009 \
    --weight-decay 0.15 \
    --drop 0.2 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.2 \
    --color-jitter 0. \
    --log-interval 200
```
To train `mobile-former-151m`, run:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-151m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0009 \
    --weight-decay 0.10 \
    --drop 0.2 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.2 \
    --color-jitter 0. \
    --log-interval 200
```
To train `mobile-former-96m`, run:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-96m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0008 \
    --weight-decay 0.10 \
    --drop 0.2 \
    --drop-path 0.0 \
    --mixup 0.0 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.0 \
    --color-jitter 0. \
    --log-interval 200
```
To train `mobile-former-52m`, run:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-52m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0008 \
    --weight-decay 0.10 \
    --drop 0.2 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --remode pixel \
    --reprob 0.0 \
    --color-jitter 0. \
    --log-interval 200
```
To train `mobile-former-26m`, run:

```bash
python3 -m torch.distributed.launch --nproc_per_node=8 train.py $DATA_PATH \
    --output $OUTPUT_PATH1 \
    --model mobile-former-26m \
    -j 8 \
    --batch-size 128 \
    --epochs 450 \
    --opt adamw \
    --sched cosine \
    --lr 0.0008 \
    --weight-decay 0.08 \
    --drop 0.1 \
    --drop-path 0.0 \
    --mixup 0.2 \
    --aa rand-m9-mstd0.5 \
    --remode pixel \
    --reprob 0.0 \
    --color-jitter 0. \
    --log-interval 200
```
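To check a trained checkpoint against the Top-1 numbers in the table, it can be evaluated with a short PyTorch script. The sketch below is illustrative only: the checkpoint and ImageNet paths are placeholders, the weights are assumed to be stored under a timm-style `state_dict` key, and the model name must match one registered by this repository.

```python
import timm
import torch
from torchvision import datasets, transforms

# Hypothetical paths; point these at your own checkpoint and ImageNet val split.
ckpt = torch.load('output/mobile-former-294m/model_best.pth.tar', map_location='cpu')
state_dict = ckpt.get('state_dict', ckpt)  # assumes a timm-style checkpoint layout

model = timm.create_model('mobile-former-294m', num_classes=1000)
model.load_state_dict(state_dict)
model.eval()

# Standard ImageNet evaluation preprocessing at the 224x224 resolution used above.
preprocess = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
val_set = datasets.ImageFolder('/path/to/imagenet/val', transform=preprocess)
loader = torch.utils.data.DataLoader(val_set, batch_size=128, num_workers=8)

correct = total = 0
with torch.no_grad():
    for images, targets in loader:
        preds = model(images).argmax(dim=1)
        correct += (preds == targets).sum().item()
        total += targets.numel()
print(f'top-1: {100.0 * correct / total:.1f}%')
```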