By Songtao Liu, Di Huang, Yunhong Wang
In this work, we propose a novel, data-driven strategy for pyramidal feature fusion, referred to as adaptively spatial feature fusion (ASFF). It learns how to spatially filter conflicting information to suppress inconsistency across feature levels, thus improving the scale-invariance of features, and it introduces nearly free inference overhead. For more details, please refer to our arXiv paper.
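If you want a quick feel for the mechanism before reading the paper, below is a minimal PyTorch sketch of the fusion idea (illustrative only, not the exact module in this repo; the class name `SimpleASFF` and the `compress_c` width are placeholders): features from the three pyramid levels, already resized to a common shape, get per-pixel weight maps predicted by 1×1 convolutions, normalized with a softmax across levels, and summed.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleASFF(nn.Module):
    """Toy sketch of adaptive spatial feature fusion for one output level.

    Assumes the three input maps already share the same spatial size and
    channel count; the real modules also handle the inter-level resizing.
    """
    def __init__(self, channels, compress_c=16):
        super().__init__()
        # 1x1 convs compress each level before predicting its weight map
        self.weight_level_0 = nn.Conv2d(channels, compress_c, 1)
        self.weight_level_1 = nn.Conv2d(channels, compress_c, 1)
        self.weight_level_2 = nn.Conv2d(channels, compress_c, 1)
        self.weight_levels = nn.Conv2d(compress_c * 3, 3, 1)

    def forward(self, x0, x1, x2):
        # Predict one weight logit per level at every spatial position
        w = torch.cat([self.weight_level_0(x0),
                       self.weight_level_1(x1),
                       self.weight_level_2(x2)], dim=1)
        w = F.softmax(self.weight_levels(w), dim=1)  # weights sum to 1 per pixel
        # Spatially adaptive weighted sum of the three levels
        return x0 * w[:, 0:1] + x1 * w[:, 1:2] + x2 * w[:, 2:3]
```

The learned weights are shared across channels at each position, which is what lets the network suppress a level only where its features conflict with the others.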
YOLOX is here! Come and use the stronger YOLO!
Add MobileNet V2!
Add a demo.py file
Faster NMS (adopt official implementation)
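We read "official implementation" above as the compiled NMS kernel shipped with torchvision; under that assumption, a minimal usage sketch looks like this:

```python
import torch
from torchvision.ops import nms

# boxes in (x1, y1, x2, y2) format, one confidence score per box
boxes = torch.tensor([[0., 0., 10., 10.],
                      [1., 1., 11., 11.],
                      [50., 50., 60., 60.]])
scores = torch.tensor([0.9, 0.8, 0.7])

# Indices of the boxes kept after suppressing overlaps above the IoU threshold
keep = nms(boxes, scores, iou_threshold=0.5)
print(keep)  # tensor([0, 2]); the overlapping lower-score box is dropped
```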
System | test-dev mAP | Time (V100) | Time (2080 Ti) |
---|---|---|---|
YOLOv3 608 | 33.0 | 20 ms | 26 ms |
YOLOv3 608 + BoFs | 37.0 | 20 ms | 26 ms |
YOLOv3 608 (our baseline) | 38.8 | 20 ms | 26 ms |
YOLOv3 608 + ASFF | 40.6 | 22 ms | 30 ms |
YOLOv3 608 + ASFF* | 42.4 | 22 ms | 30 ms |
YOLOv3 800 + ASFF* | 43.9 | 34 ms | 38 ms |
YOLOv3 MobileNetV1 416 + BoFs | 28.6 | - | 22 ms |
YOLOv3 MobileNetV2 416 (our baseline) | 29.0 | - | 22 ms |
YOLOv3 MobileNetV2 416 + ASFF | 30.6 | - | 24 ms |
Please cite our paper in your publications if it helps your research:
@article{liu2019asff,
  title   = {Learning Spatial Fusion for Single-Shot Object Detection},
  author  = {Liu, Songtao and Huang, Di and Wang, Yunhong},
  journal = {arXiv preprint arXiv:1911.09516},
  year    = {2019}
}
./make.sh
We also use apex, numpy, opencv, tqdm, pyyaml, matplotlib, scikit-image...
We also support tensorboard if you have installed it.
python demo.py -i /path/to/your/image \
--cfg config/yolov3_baseline.cfg -d COCO \
--checkpoint /path/to/your/weights --half --asff --rfb -s 608
Note: We currently only support COCO and VOC.
To make things easy, we provide simple COCO and VOC dataset loaders that inherit from torch.utils.data.Dataset, making them fully compatible with the torchvision.datasets API.
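As a rough illustration of what that interface looks like (the class and field names below are placeholders, not the actual loaders in this repo), a Dataset-compatible COCO loader can be sketched as:

```python
from PIL import Image
from pycocotools.coco import COCO  # COCO annotation API
import torch.utils.data as data

class ToyCOCODataset(data.Dataset):
    """Minimal sketch of a torch.utils.data.Dataset-style COCO loader."""

    def __init__(self, root, annotation_file, transform=None):
        self.root = root
        self.coco = COCO(annotation_file)
        self.ids = list(self.coco.imgs.keys())
        self.transform = transform

    def __len__(self):
        return len(self.ids)

    def __getitem__(self, index):
        img_id = self.ids[index]
        file_name = self.coco.loadImgs(img_id)[0]['file_name']
        img = Image.open(f"{self.root}/{file_name}").convert('RGB')
        anns = self.coco.loadAnns(self.coco.getAnnIds(imgIds=img_id))
        target = [(a['bbox'], a['category_id']) for a in anns]
        if self.transform is not None:
            img = self.transform(img)
        return img, target
```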
Moreover, we also implement the mix-up strategy from BoFs and distributed random resizing in YOLOv3.
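For readers unfamiliar with mix-up, the image-level blend can be sketched as below; the Beta(1.5, 1.5) parameters follow common detection mix-up practice and are an assumption here, not necessarily the values used in this repo's configs.

```python
import numpy as np

def mixup_images(img_a, img_b, alpha=1.5, beta=1.5):
    """Blend two same-shaped HxWxC images with a Beta-sampled ratio.

    The ratio is returned as well, so the box labels of both source images
    can be kept and weighted accordingly in the loss.
    """
    lam = np.random.beta(alpha, beta)
    mixed = lam * img_a.astype(np.float32) + (1.0 - lam) * img_b.astype(np.float32)
    return mixed, lam
```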
Install the MS COCO dataset at /path/to/coco from the official website. The default path is ./data/COCO, and a soft link is recommended:
ln -s /path/to/coco ./data/COCO
It should have this basic structure:
$COCO/
$COCO/annotations/
$COCO/images/
$COCO/images/test2017/
$COCO/images/train2017/
$COCO/images/val2017/
The COCO dataset now provides the train2017 and val2017 splits; by default, we train our model on train2017 and evaluate on val2017.
Install the VOC dataset as ./data/VOC. We also recommend a soft-link:
ln -s /path/to/VOCdevkit ./data/VOC
First, download the mix-up pretrained Darknet-53 PyTorch base network weights from https://drive.google.com/open?id=1phqyYhV1K9KZLQZH1kENTAPprLBmymfP or from our BaiduYun drive.
For MobileNetV2, we use the official PyTorch weights (with the key names changed to fit our code), or you can download them from our BaiduYun drive.
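If you prefer to convert the torchvision checkpoint yourself, the mechanics are just a state-dict key rename; the prefix mapping below is purely hypothetical and only illustrates the idea (the actual key names depend on the backbone definition in this repo).

```python
import torch
from torchvision.models import mobilenet_v2

# Official torchvision MobileNetV2 weights (pretrained=True downloads them)
official_state = mobilenet_v2(pretrained=True).state_dict()

# Hypothetical renaming: adjust key prefixes so they match this repo's
# backbone module names (check the model definition for the real mapping).
converted = {k.replace('features.', 'backbone.'): v
             for k, v in official_state.items()}

torch.save(converted, 'weights/mobilenetv2_converted.pth')
```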
By default, we assume you have downloaded the weight files into the ASFF/weights directory.
Since random resizing consumes much more GPU memory, we implement FP16 training with an old version of apex.
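For reference, that (older) apex amp API is wired up roughly as in the generic pattern below; this is not a copy of main.py, and the tiny stand-in model exists only to keep the snippet self-contained.

```python
import torch
import torch.nn as nn
from apex import amp  # NVIDIA apex, installed separately

# Tiny stand-in model, just to show the amp wiring (requires a CUDA device)
model = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
                      nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                      nn.Linear(8, 2)).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

# opt_level 'O1' enables mixed precision with automatic casting
model, optimizer = amp.initialize(model, optimizer, opt_level='O1')

images = torch.randn(4, 3, 64, 64).cuda()
labels = torch.randint(0, 2, (4,)).cuda()

loss = nn.functional.cross_entropy(model(images), labels)
optimizer.zero_grad()
# Scale the loss so FP16 gradients do not underflow
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
optimizer.step()
```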
We have currently ONLY tested the code with distributed training on multiple GPUs (10× 2080 Ti or 4× Tesla V100).
To train the YOLOv3 baseline (ours) with the training script, simply specify the parameters listed in main.py as flags or change them manually in config/yolov3_baseline.cfg:
python -m torch.distributed.launch --nproc_per_node=10 --master_port=$((RANDOM + 10000)) main.py \
--cfg config/yolov3_baseline.cfg -d COCO --tfboard --distributed --ngpu 10 \
--checkpoint weights/darknet53_feature_mx.pth --start_epoch 0 --half --log_dir log/COCO -s 608
Note:
--cfg: config files.
--tfboard: use tensorboard.
--distributed: distributed training (we only test the code with distributed training)
-d: choose datasets, COCO or VOC.
--ngpu: number of GPUs.
-c, --checkpoint: pretrained weights or resume weights. You can pick up training from a checkpoint by specifying its path as one of the training parameters (again, see main.py for options).
--start_epoch: used for resume training.
--half: FP16 training.
--log_dir: log dir for tensorboard.
-s: evaluation image size, from 320 to 608 as in YOLOv3.
To train YOLOv3 with ASFF or ASFF*, you only need to add a few additional flags:
python -m torch.distributed.launch --nproc_per_node=10 --master_port=$((RANDOM + 10000)) main.py \
--cfg config/yolov3_baseline.cfg -d COCO --tfboard --distributed --ngpu 10 \
--checkpoint weights/darknet53_feature_mx.pth --start_epoch 0 --half --asff --rfb --dropblock \
--log_dir log/COCO_ASFF -s 608
Note: --asff enables the ASFF fusion modules, while --rfb and --dropblock add the extra enhancements used in the ASFF* models.
To evaluate a trained network, you can use the following command:
python -m torch.distributed.launch --nproc_per_node=10 --master_port=$((RANDOM + 10000)) eval.py \
--cfg config/yolov3_baseline.cfg -d COCO --distributed --ngpu 10 \
--checkpoint /path/to/your/weights --half --asff --rfb -s 608
By default, it directly outputs the mAP results on COCO val2017 or the VOC 2007 test set.
Model | Weights | BaiduYun | Training tfboard log |
---|---|---|---|
YOLOv3 MobileNetV2 (ours) | weights | baiduYun | log |
YOLOv3 MobileNetV2 + ASFF | weights | baiduYun | log |
YOLOv3 baseline (ours) | weights | baiduYun | log |
YOLOv3 ASFF | weights | baiduYun | log |