EMA-attention-module
Results
Training on CIFAR-100 with ResNet for 200 epochs.
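| Name | Resolution | #Params | Top-1 Acc. | Top-5 Acc. | BaiduDrive(models) |
|---|---|---|---|---|---|
| ResNet101 | 32 | 42.70M | 77.78 | 94.39 | - |
| + CA | 32 | 46.22M | 80.01 | 94.78 | - |
| + EMA | 32 | 42.96M | 80.86 | 95.75 | - |
| + SSA-32 | 32 | 51.37M | 80.61 | 95.26 | - |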
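To put the parameter counts in the table above into perspective, the short sketch below computes each variant's parameter overhead and accuracy gain relative to the ResNet101 baseline. The numbers are copied from the table, assuming the third column is the parameter count and the fourth is Top-1 accuracy (%); the dictionary layout and the helper name `relative_to_baseline` are illustrative only.

```python
# Values copied from the CIFAR-100 table: (parameters in millions, Top-1 accuracy in %).
# Assumption: column 3 is #Params and column 4 is Top-1 accuracy.
RESULTS = {
    "ResNet101": (42.70, 77.78),
    "+ CA": (46.22, 80.01),
    "+ EMA": (42.96, 80.86),
    "+ SSA-32": (51.37, 80.61),
}


def relative_to_baseline(results, baseline="ResNet101"):
    """Return {name: (extra params in %, Top-1 gain in points)} for each non-baseline row."""
    base_params, base_top1 = results[baseline]
    summary = {}
    for name, (params, top1) in results.items():
        if name == baseline:
            continue
        extra_pct = 100.0 * (params - base_params) / base_params
        summary[name] = (round(extra_pct, 2), round(top1 - base_top1, 2))
    return summary


if __name__ == "__main__":
    for name, (extra_pct, gain) in relative_to_baseline(RESULTS).items():
        print(f"{name}: +{extra_pct}% params, +{gain} Top-1")
```

Read this way, + EMA adds roughly 0.6% parameters for a gain of about 3 points, while + CA and + SSA-32 add roughly 8% and 20% parameters respectively for smaller gains.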
Training on ImageNet-1k with MobileNetv2 for 400 epochs.
- Train
./distributed_train.sh 2 ./ILSVRC2012/ --model mobilenetv2_100 -b 256 --sched cosine --epochs 400 --decay-epochs 2.4 --decay-rate .97 --opt-eps .001 -j 16 --weight-decay 1e-5 --drop 0.2 --drop-path 0.2 --model-ema --model-ema-decay 0.9999 --aa rand-m9-mstd0.5 --remode pixel --amp --lr 0.4 --warmup-epochs 5
- Val
python validate.py ./ILSVRC2012/ --model mobilenetv2_100 --checkpoint model_best.pth.tar --use-ema
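Note that `--model-ema` and `--use-ema` in the commands above refer to an exponential moving average of the model weights kept during training, which is unrelated to the EMA attention module despite the shared abbreviation. A minimal sketch of that update rule, assuming the standard form `ema = decay * ema + (1 - decay) * weight` and using plain Python floats in place of tensors (the function name `update_ema` is hypothetical and not part of the training script):

```python
def update_ema(ema_weights, model_weights, decay=0.9999):
    """Blend the current weights into the running average.

    Plain lists of floats stand in for tensors here (illustrative only).
    """
    return [decay * e + (1.0 - decay) * w for e, w in zip(ema_weights, model_weights)]


if __name__ == "__main__":
    ema = [0.0]
    # A small decay is used here only so the effect is visible after a few steps.
    for _ in range(3):
        ema = update_ema(ema, [1.0], decay=0.9)
    print(ema)
```

With a decay of 0.9999, as in the training command, the average moves very slowly and smooths over many thousands of updates; `--use-ema` at validation time selects those averaged weights from the checkpoint.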
| Name | Resolution | #Params | MFLOPs | Top-1 Acc. | Top-5 Acc. | BaiduDrive(models) |
|---|---|---|---|---|---|---|
| MobileNetv2 | 224 | 3.50M | 300 | 72.3 | 91.02 | |
| + SE | 224 | 3.89M | 300 | 73.5 | - | - |
| + CBAM | 224 | 3.89M | 300 | 73.6 | - | - |
| + CA | 224 | 3.95M | 310 | 74.3 | - | - |
| + EMA | 224 | 3.55M | 306 | 74.32 | 91.82 | ema |
Training on ImageNet-1k with MobileNetv2 for 200 epochs.
| Name | Resolution | #Params | MFLOPs | Top-1 Acc. | Top-5 Acc. | BaiduDrive(models) |
|---|---|---|---|---|---|---|
| MobileNetv2 | 224 | 3.504M | 300.79 | 72.192 | 90.534 | - |
| + EMA | 224 | - | 302 | 72.55 | 90.89 | ema |
Training on COCO 2017 with YOLOv5s for 300 epochs.
| Name | Resolution | #Params | GFLOPs | mAP@.5 | mAP@.5:.95 | BaiduDrive(models) |
|---|---|---|---|---|---|---|
| YOLOv5s | 640 | 7.23M | 16.5 | 56.0 | 37.2 | yolov5s(v6.0) |
| + CBAM | 640 | 7.27M | 16.6 | 57.1 | 37.7 | cbam |
| + SA | 640 | 7.23M | 16.5 | 56.8 | 37.4 | sa |
| + ECA | 640 | 7.23M | 16.5 | 57.1 | 37.6 | eca |
| + CA | 640 | 7.26M | 16.50 | 57.5 | 38.1 | ca |
| + EMA | 640 | 7.24M | 16.53 | 57.8 | 38.4 | ema |
| + SSA-32 | 640 | 7.27M | 0 | 58.7 | 38.4 | |
| + SSA-16 | 640 | 7.31M | 0 | 58.1 | 38.5 | |
| + SSA-2 | 640 | 8.55M | 0 | 58.3 | 38.8 | |
| + SSA-1 | 640 | 11.50M | 0 | 58.8 | 39.1 | |
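For readers unfamiliar with the two detection columns: mAP@.5 counts a prediction as correct when its IoU with a ground-truth box is at least 0.5, while mAP@.5:.95 averages AP over ten IoU thresholds from 0.50 to 0.95 in steps of 0.05. The sketch below only illustrates the IoU test and the threshold grid, not a full AP computation; the box format `(x1, y1, x2, y2)` and the function names are assumptions for illustration and are not taken from the evaluation scripts.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


# The ten thresholds behind mAP@.5:.95: 0.50, 0.55, ..., 0.95.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matched_thresholds(pred, truth):
    """List the IoU thresholds at which `pred` would count as a match for `truth`."""
    value = iou(pred, truth)
    return [t for t in IOU_THRESHOLDS if value >= t]


if __name__ == "__main__":
    truth = (0, 0, 100, 100)
    pred = (0, 0, 100, 80)  # hypothetical prediction covering 80% of the ground truth
    print(iou(pred, truth), matched_thresholds(pred, truth))
```

A box that overlaps well but not tightly passes the loose thresholds and fails the strict ones, which is why mAP@.5:.95 is always the lower of the two numbers in the table.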
Training on VisDrone 2019 with YOLOv5x.
- Train
python train.py --data VisDrone.yaml --weights yolov5x.pt --cfg models/accModels/yolov5xP2CBAM.yaml --epochs 300 --batch-size 6 --img 640 --device 0
- Val
python val.py --data VisDrone.yaml --img 640 --weights best.pt
| Name | Resolution | #Params | GFLOPs | mAP@.5 | mAP@.5:.95 | BaiduDrive(models) |
|---|---|---|---|---|---|---|
| YOLOv5x (v6.0) | 640 | 90.96M | 314.2 | 49.29 | 30.0 | - |
| + CBAM | 640 | 91.31M | 315.1 | 49.40 | 30.1 | - |
| + CA | 640 | 91.28M | 315.2 | 49.30 | 30.1 | - |
| + EMA | 640 | 91.18M | 315.0 | 49.70 | 30.4 | ema |
| + SSA-32 | 640 | 91.18M | 315.8 | 49.80 | 30.7 | |
References
@INPROCEEDINGS{10096516,
author={Ouyang, Daliang and He, Su and Zhang, Guozhong and Luo, Mingzhu and Guo, Huaiyong and Zhan, Jian and Huang, Zhijie},
booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title={Efficient Multi-Scale Attention Module with Cross-Spatial Learning},
year={2023},
pages={1-5},
doi={10.1109/ICASSP49357.2023.10096516}}