This is a model zoo project built on PyTorch. In this repo I will implement some basic classification models that perform well on ImageNet. Then I will train them as fairly as possible and try my best to get a SOTA model on ImageNet. In this repo I'll only consider FP16.
If you run into an IO bottleneck, please use the LMDB-format dataset. A good approach is to try both formats and find out which is faster.
I provide a conversion script here.
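To compare the two dataset formats, one option is to time a fixed number of batches from each loader. The helper below is a minimal, framework-agnostic sketch; `measure_throughput` is a hypothetical name and is not part of this repo.

```python
import time
from itertools import islice


def measure_throughput(loader, max_batches=100, batch_size_fn=len):
    """Return samples per second over the first `max_batches` batches.

    `loader` is any iterable of batches (for example a DataLoader).
    `batch_size_fn` maps one batch to the number of samples it holds.
    """
    n_samples = 0
    start = time.perf_counter()
    for batch in islice(loader, max_batches):
        n_samples += batch_size_fn(batch)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very coarse timers.
    return n_samples / elapsed if elapsed > 0 else float("inf")
```

With loaders that yield `(images, labels)` pairs, pass `batch_size_fn=lambda b: len(b[0])`. Run it once per format with the same batch size and worker count, and skip the first few batches if worker startup time dominates.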
python distribute_train_script.py --params
Here is an example:
python distribute_train_script.py --data-path /s4/piston/ImageNet --batch-size 256 --dtype float16 \
-j 48 --epochs 360 --lr 2.6 --warmup-epochs 5 --label-smoothing \
--no-wd --wd 0.00003 --model GhostNet --log-interval 150 --model-info \
--dist-url tcp://127.0.0.1:26548 --world-size 1 --rank 0
model | epochs | dtype | batch size* | gpus | lr | tricks | Params(M)/FLOPs | top1/top5 | params/logs |
---|---|---|---|---|---|---|---|---|---|
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | - | 25.6/4.1G | 77.36/- | Google Drive |
resnet101 | 120 | FP16 | 128 | 8 | 0.4 | - | 44.7/7.8G | 79.13/94.38 | Google Drive |
resnet50v2 | 120 | FP16 | 128 | 8 | 0.4 | - | 25.6/4.1G | 77.06/93.44 | Google Drive |
resnet101v2 | 120 | FP16 | 128 | 8 | 0.4 | - | 44.6/7.8G | 78.90/94.39 | Google Drive |
ResNext50_32x4d | 120 | FP16 | 128 | 8 | 0.4 | - | 25.1/4.2G | 79.00/94.39 | |
RegNetX4_0GF | 120 | FP16 | 128 | 8 | 0.4 | - | 22.2/4.0G | 78.40/94.04 | |
RegNetY4_0GF | 120 | FP16 | 128 | 8 | 0.4 | - | 22.1/4.0G | 79.22/94.57 | |
RegNetY6_4GF | 120 | FP16 | 128 | 8 | 0.4 | - | 31.2/6.4G | 79.69/94.82 | |
ResNeST50 | 120 | FP16 | 128 | 8 | 0.4 | - | 27.5/4.1G | 78.62/94.28 | |
mobilenetv1 | 150 | FP16 | 256 | 8 | 0.4 | - | 4.3/572.2M | 72.17/90.70 | Google Drive |
mobilenetv2 | 150 | FP16 | 256 | 8 | 0.4 | - | 3.5/305.3M | 71.94/90.59 | Google Drive |
mobilenetv3 Large | 360 | FP16 | 256 | 8 | 2.6 | Label smoothing, No decay bias, Dropout | 5.5/219M | 75.64/92.61 | Google Drive |
mobilenetv3 Small | 360 | FP16 | 256 | 8 | 2.6 | Label smoothing, No decay bias, Dropout | 3.0/57.8M | 67.83/87.78 | |
GhostNet1.3 | 360 | FP16 | 400 | 8 | 2.6 | Label smoothing, No decay bias, Dropout | 7.4/230.4M | 75.78/92.77 | Google Drive |
Many tricks for improving accuracy have appeared in recent years. (If you have another idea, please open an issue.) I want to verify them in a fair way.
Tricks: RandomRotation, OctConv[14], Dropout, Label Smoothing[4], Sync BN, SwitchNorm[6], Mixup[17], no decay bias[7],
Cutout[5], ReLU6[18], Swish activation[10], Stochastic Depth[9], Lookahead Optimizer[11], Pre-activation (ResNetV2)[12],
DCNv2[13], LIP[16].
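As an illustration of one trick from the list, label smoothing can be sketched in plain Python. This assumes the common convention where the target is a mix of the one-hot label and a uniform distribution over all classes; whether the training script uses exactly this variant is an assumption, and both function names are hypothetical.

```python
import math


def smoothed_targets(label, num_classes, smoothing=0.1):
    """Mix the one-hot label with a uniform distribution over all classes."""
    uniform = smoothing / num_classes
    targets = [uniform] * num_classes
    targets[label] += 1.0 - smoothing
    return targets


def cross_entropy(logits, targets):
    """Cross entropy between softmax(logits) and a soft target distribution."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for x, t in zip(logits, targets))
```

With `smoothing=0.1` and 1000 classes, the true class gets a target of 0.9001 and every other class gets 0.0001, so the model is no longer pushed toward infinitely confident logits.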
Special: Zero-initialize the last BN of each residual block, called 'Zero γ' here; only for post-activation models.
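A minimal sketch of Zero γ on a post-activation block follows. `TinyResidualBlock` is a hypothetical toy block written for illustration, not one of the models in this repo. In a pre-activation block the residual branch ends in a convolution, so there is no final BN to zero.

```python
import torch
import torch.nn as nn


class TinyResidualBlock(nn.Module):
    """Hypothetical post-activation block: conv-BN-ReLU-conv-BN, add, ReLU."""

    def __init__(self, channels, zero_gamma=True):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)
        if zero_gamma:
            # Zero gamma: the last BN's scale starts at 0, so the residual
            # branch outputs 0 and the block starts out as ReLU(identity).
            nn.init.zeros_(self.bn2.weight)

    def forward(self, x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + x)
```

At initialization the block passes non-negative inputs through unchanged, which is the intended effect: every residual block starts as an identity mapping and learns its branch gradually.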
I'll only use 120 epochs and a batch size of 128*8 to train them. I know some tricks may need longer training or a larger batch size, but that would not be fair to the others. Think of the results as performance under the current setting.
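For a sense of scale, the training budget implied by this setting can be worked out as below. This is a rough sketch that assumes the batch size column is per GPU, that the ILSVRC-2012 training set (1,281,167 images) is used, and that incomplete batches are dropped; `schedule_size` is a hypothetical helper.

```python
import math

IMAGENET_TRAIN_IMAGES = 1_281_167  # ILSVRC-2012 training set size


def schedule_size(per_gpu_batch=128, gpus=8, epochs=120, drop_last=True):
    """Return (global batch size, steps per epoch, total optimizer steps)."""
    global_batch = per_gpu_batch * gpus
    if drop_last:
        steps_per_epoch = IMAGENET_TRAIN_IMAGES // global_batch
    else:
        steps_per_epoch = math.ceil(IMAGENET_TRAIN_IMAGES / global_batch)
    return global_batch, steps_per_epoch, steps_per_epoch * epochs


print(schedule_size())  # (1024, 1251, 150120)
```

Every trick in the table is therefore compared at roughly 150k optimizer steps with a global batch of 1024.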
model | epochs | dtype | batch size* | gpus | lr | tricks | degree | top1/top5 | improve | params/logs |
---|---|---|---|---|---|---|---|---|---|---|
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | - | - | 77.36/- | baseline | Google Drive |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | Label smoothing | smoothing=0.1 | 77.78/93.80 | +0.42 | Google Drive |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | No decay bias | - | 77.28/93.61 | -0.08 | Google Drive |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | Sync BN | - | 77.31/93.49 | -0.05 | Google Drive |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | Mixup | alpha=0.2 | 77.49/93.73 | +0.13 | missing |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | RandomRotation | degree=15 | 76.64/93.28 | -1.15 | Google Drive |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | Cutout | read code | 77.44/93.62 | +0.08 | Google Drive |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | Dropout | rate=0.3 | 77.11/93.58 | -0.25 | Google Drive |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | Lookahead-SGD | - | 77.23/93.39 | -0.13 | Google Drive |
resnet50v2 | 120 | FP16 | 128 | 8 | 0.4 | pre-active | - | 77.06/93.44 | -0.30 | Google Drive |
oct_resnet50 | 120 | FP16 | 128 | 8 | 0.4 | OctConv | alpha=0.125 | - | - | |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | Relu6 | - | 77.28/93.5 | -0.08 | Google Drive |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | - | - | 77.00/- | DDP baseline | |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | Gradient Centralization | Conv only | 77.40/93.57 | +0.40 | |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | Zero γ | - | 77.24/- | +0.24 | |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | No decay bias | - | 77.74/93.77 | +0.74 | |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | RandAugment | n=2,m=9 | 76.44/93.18 | -0.96 | |
resnet50 | 120 | FP16 | 128 | 8 | 0.4 | AutoAugment | - | 76.50/93.23 | -0.50 | |
Mixup, Cutout, and Dropout may get better results.

@misc{ModelZoo.pytorch,
title = {Basic deep conv neural network reproduce and explore},
author = {X.Yang},
URL = {https://github.com/PistonY/ModelZoo.pytorch},
year = {2019}
}