Minimal PyTorch implementation of Yolact: "YOLACT: Real-time Instance Segmentation".
The original project is here.
This implementation simplifies the original code, preserves the main functionality, and makes the network easier to understand.
This implementation has not been updated to Yolact++.
PyTorch >= 1.1
Python >= 3.6
onnxruntime-gpu == 1.6.0 for CUDA 10.2
TensorRT == 7.2.3.4
tensorboardX
Other common packages.
# Build cython-nms
python setup.py build_ext --inplace
Modify `self.data_root` in 'res101_coco' in `config.py`.
Yolact trained weights.
Backbone | box mAP | mask mAP | number of parameters | Google Drive | Baidu Cloud |
---|---|---|---|---|---|
Resnet50 | 31.3 | 28.8 | 31.16 M | best_28.8_res50_coco_340000.pth | password: uu75 |
Resnet101 | 33.4 | 30.4 | 50.15 M | best_30.4_res101_coco_340000.pth | password: njsk |
swin_tiny | 34.3 | 32.1 | 34.58 M | best_31.9_swin_tiny_coco_308000.pth | password: i8e9 |
ImageNet pre-trained weights.
Backbone | Google Drive | Baidu Cloud |
---|---|---|
Resnet50 | backbone_res50.pth | password: juso |
Resnet101 | backbone_res101.pth | password: 5wsp |
swin_tiny | swin-tiny.pth | password: g0o2 |
2021.4.19. Use swin_tiny transformer as backbone, +1.0 box mAP, +1.4 mask mAP.
2021.1.7. Focal loss did not help (tried conf_alpha 4, 6, 7, 8).
2021.1.7. Fewer training iterations, 800k --> 680k with batch size 8.
2020.11.2. Improved data augmentation, use rectangular anchors; training is stable and infinite loss no longer appears.
2020.11.2. DDP training, train batch size increased to 16, +0.4 box mAP, +0.7 mask mAP (resnet101).
# Train with resnet101 backbone on one GPU with a batch size of 8 (default).
python -m torch.distributed.launch --nproc_per_node=1 --master_port=$((RANDOM)) train.py --train_bs=8
# Train on multiple GPUs (e.g. two GPUs, 8 images per GPU).
export CUDA_VISIBLE_DEVICES=0,1 # Select the GPU to use.
python -m torch.distributed.launch --nproc_per_node=2 --master_port=$((RANDOM)) train.py --train_bs=16
# Train with other configurations (res101_coco, res50_coco, res50_pascal, res101_custom, res50_custom, in total).
python -m torch.distributed.launch --nproc_per_node=1 --master_port=$((RANDOM)) train.py --cfg=res50_coco
# Train with different batch_size (batch size should not be smaller than 4).
python -m torch.distributed.launch --nproc_per_node=1 --master_port=$((RANDOM)) train.py --train_bs=4
# Train with different image size (anchor settings related to image size will be adjusted automatically).
python -m torch.distributed.launch --nproc_per_node=1 --master_port=$((RANDOM)) train.py --img_size=400
# Resume training with a specified model.
python -m torch.distributed.launch --nproc_per_node=1 --master_port=$((RANDOM)) train.py --resume=weights/latest_res101_coco_35000.pth
# Set the evaluation interval during training; set it to -1 to disable evaluation.
python -m torch.distributed.launch --nproc_per_node=1 --master_port=$((RANDOM)) train.py --val_interval 8000
# Train on CPU.
python train.py --train_bs=4
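The multi-GPU command above passes `--train_bs=16` with `--nproc_per_node=2`, which the comment describes as 8 images per GPU. A minimal sketch of that arithmetic, assuming `--train_bs` is the total batch size split evenly across processes; the helper name `per_gpu_batch_size` is hypothetical and not part of this repository:

```python
def per_gpu_batch_size(train_bs: int, num_gpus: int) -> int:
    """Split a total batch size evenly across DDP processes (assumed behaviour)."""
    if num_gpus < 1:
        raise ValueError("num_gpus must be at least 1")
    if train_bs % num_gpus != 0:
        raise ValueError("train_bs must be divisible by the number of GPUs")
    return train_bs // num_gpus


# Matches the examples above: 16 images on two GPUs, 8 images on one GPU.
two_gpu = per_gpu_batch_size(16, 2)
one_gpu = per_gpu_batch_size(8, 1)
print(two_gpu, one_gpu)
```

Picking a total batch size that divides evenly by the number of GPUs keeps every process at the same load.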
tensorboard --logdir=tensorboard_log/res101_coco
# Select the GPU to use.
export CUDA_VISIBLE_DEVICES=0
# Evaluate on COCO val2017 (configuration will be parsed according to the model name).
# The metric API in this project cannot compute the exact COCO mAP, but it evaluates quickly.
python eval.py --weight=weights/best_30.4_res101_coco_340000.pth
# To get the exact COCO mAP:
python eval.py --weight=weights/best_30.4_res101_coco_340000.pth --coco_api
# Evaluate with a specified number of images.
python eval.py --weight=weights/best_30.4_res101_coco_340000.pth --val_num=1000
# Evaluate with traditional nms.
python eval.py --weight=weights/best_30.4_res101_coco_340000.pth --traditional_nms
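For reference, "traditional" NMS usually means the greedy, IoU-based procedure: keep the highest-scoring box, discard every remaining box that overlaps it beyond a threshold, and repeat. The sketch below is a generic pure-Python illustration, not the code behind `--traditional_nms`; the function names and the 0.5 threshold are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        # Drop every remaining box that overlaps the kept box too much.
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_threshold]
    return keep


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
kept = greedy_nms(boxes, scores)
print(kept)
```

In this toy input the second box overlaps the first with an IoU of 81/119, so it is suppressed, while the third box does not overlap anything and is kept.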
# Select the GPU to use.
export CUDA_VISIBLE_DEVICES=0
# To detect images, pass the path of the image folder; detected images will be saved in `results/images`.
python detect.py --weight=weights/best_30.4_res101_coco_340000.pth --image=images
# Use --cutout to cut out detected objects.
python detect.py --weight=weights/best_30.4_res101_coco_340000.pth --image=images --cutout
# To detect videos, pass the path of the video; the detected video will be saved in `results/videos`.
python detect.py --weight=weights/best_30.4_res101_coco_340000.pth --video=videos/1.mp4
# Use --real_time to run detection in real time.
python detect.py --weight=weights/best_30.4_res101_coco_340000.pth --video=videos/1.mp4 --real_time
# Use --hide_mask, --hide_score, --save_lincomb, --no_crop and so on to get different results.
python detect.py --weight=weights/best_30.4_res101_coco_340000.pth --image=images --save_lincomb
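In the YOLACT paper, each instance mask is a linear combination of shared prototype masks weighted by per-instance coefficients, passed through a sigmoid. The pure-Python sketch below illustrates that step on toy data. It assumes `--save_lincomb` relates to this combination; the function name, the 0.5 threshold, and the toy shapes are illustrative and not taken from this repository.

```python
import math


def assemble_mask(prototypes, coefficients, threshold=0.5):
    """Combine k prototype maps (each H x W) with k coefficients into a binary mask."""
    height, width = len(prototypes[0]), len(prototypes[0][0])
    mask = []
    for y in range(height):
        row = []
        for x in range(width):
            # Linear combination of all prototypes at this pixel.
            logit = sum(c * p[y][x] for c, p in zip(coefficients, prototypes))
            prob = 1.0 / (1.0 + math.exp(-logit))  # sigmoid
            row.append(1 if prob > threshold else 0)
        mask.append(row)
    return mask


prototypes = [
    [[1.0, 0.0], [0.0, 0.0]],  # active on the top-left pixel
    [[1.0, 1.0], [0.0, 0.0]],  # active on the whole top row
]
# A negative coefficient subtracts the first prototype from the second.
mask = assemble_mask(prototypes, coefficients=[-1.0, 1.0])
print(mask)
```

Here the negative coefficient cancels the top-left pixel, so only the top-right pixel remains in the mask; this is how a small set of shared prototypes can produce different masks per instance.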
python export2onnx.py --weight='weights/best_30.4_res101_coco_340000.pth' --opset=12
# Detect with an ONNX file; all the options are the same as those in `detect.py`.
python detect_with_onnx.py --weight='onnx_files/res101_coco.onnx' --image=images
python export2trt.py --weight='onnx_files/res101_coco.onnx'
# Detect with TensorRT; all the options are the same as those in `detect.py`.
python detect_with_trt.py --weight='trt_files/res101_coco.trt' --image=images
Modify the path of the `img` folder in `data/config.py`.
# Generate a coco-style json.
python utils/pascal2coco.py --folder_path=/home/feiyu/Data/pascal_sbd
# Training.
python -m torch.distributed.launch --nproc_per_node=1 --master_port=$((RANDOM)) train.py --cfg=res50_pascal
pip install labelme
python utils/labelme2coco.py --img_dir=custom_dataset --label_name=custom_dataset/labels.txt --img_type=jpg
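A COCO-style annotation stores each polygon as a flat coordinate list together with a bounding box and an area. The sketch below shows that conversion for a single polygon given as `[[x, y], ...]` points, as labelme stores them. It illustrates the target format only and is not the code in `utils/labelme2coco.py`; the function name and the ids are hypothetical.

```python
def polygon_to_coco_annotation(points, ann_id, image_id, category_id):
    """Convert one polygon [[x, y], ...] into a COCO-style annotation dict."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, y_min = min(xs), min(ys)
    n = len(points)
    # Shoelace formula for the polygon area.
    twice_area = sum(
        xs[i] * ys[(i + 1) % n] - xs[(i + 1) % n] * ys[i] for i in range(n)
    )
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        # COCO expects a list of flat [x1, y1, x2, y2, ...] polygons.
        "segmentation": [[coord for point in points for coord in point]],
        # COCO boxes are [x, y, width, height].
        "bbox": [x_min, y_min, max(xs) - x_min, max(ys) - y_min],
        "area": abs(twice_area) / 2.0,
        "iscrowd": 0,
    }


square = [[10, 20], [30, 20], [30, 40], [10, 40]]
ann = polygon_to_coco_annotation(square, ann_id=1, image_id=1, category_id=1)
print(ann["bbox"], ann["area"])
```

A full COCO file additionally needs `images` and `categories` lists, which the conversion script is expected to build from the image folder and `labels.txt`.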
Edit `CUSTOM_CLASSES` in `config.py`. For a single class, `CUSTOM_CLASSES` should be like `('dog', )`. The final comma is necessary to make it a tuple; otherwise the number of classes would be `len('dog')`.
In `config.py`, modify the corresponding `self.train_imgs` and `self.train_ann`. If you need to validate, prepare the validation dataset in the same way.
python -m torch.distributed.launch --nproc_per_node=1 --master_port=$((RANDOM)) train.py --cfg=res101_custom
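The trailing-comma note for `CUSTOM_CLASSES` can be checked directly in Python. This is a minimal sketch; the variable names are illustrative only.

```python
without_comma = ('dog')   # parentheses alone: this is just the string 'dog'
with_comma = ('dog', )    # trailing comma: a one-element tuple

print(type(without_comma).__name__, len(without_comma))  # str 3
print(type(with_comma).__name__, len(with_comma))        # tuple 1
```

Without the comma, `len()` counts the three characters of `'dog'` instead of one class.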
`self.lr_steps`, `self.warmup_until` in your configuration.
@inproceedings{yolact-iccv2019,
author = {Daniel Bolya and Chong Zhou and Fanyi Xiao and Yong Jae Lee},
title = {YOLACT: {Real-time} Instance Segmentation},
booktitle = {ICCV},
year = {2019},
}
@article{liu2021Swin,
title={Swin Transformer: Hierarchical Vision Transformer using Shifted Windows},
author={Liu, Ze and Lin, Yutong and Cao, Yue and Hu, Han and Wei, Yixuan and Zhang, Zheng and Lin, Stephen and Guo, Baining},
journal={arXiv preprint arXiv:2103.14030},
year={2021}
}