This is the official implementation of "SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation (ECCV2020)" built on the open-source mmdetection and maskrcnn-benchmark.
Single-stage instance segmentation approaches have recently gained popularity due to their speed and simplicity, but still lag behind two-stage methods in accuracy. We propose a fast single-stage instance segmentation method, called SipMask, that preserves instance-specific spatial information by separating the mask prediction of an instance into different sub-regions of a detected bounding-box. Our main contribution is a novel light-weight spatial preservation (SP) module that generates a separate set of spatial coefficients for each sub-region within a bounding-box, leading to improved mask predictions. It also enables accurate delineation of spatially adjacent instances. Further, we introduce a mask alignment weighting loss and a feature alignment scheme to better correlate mask prediction with object detection.
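The idea of one coefficient set per sub-region can be illustrated with a minimal sketch. This is not the code from this repository: the function name `assemble_mask`, the plain-list data layout, and the 2 × 2 grid default are assumptions made for illustration only. It combines a few basis maps, already cropped to a box, using a different coefficient vector in each sub-region of that box.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def assemble_mask(basis, coeffs, grid=2):
    """Combine basis maps into one mask, one coefficient set per sub-region.

    basis:  list of K maps, each an H x W nested list of floats
            (hypothetical layout, assumed already cropped to the box).
    coeffs: grid x grid nested list; coeffs[r][c] holds K floats.
    """
    height, width = len(basis[0]), len(basis[0][0])
    mask = [[0.0] * width for _ in range(height)]
    for y in range(height):
        r = y * grid // height  # row index of the sub-region containing y
        for x in range(width):
            c = x * grid // width  # column index of the sub-region containing x
            # Linear combination of the basis maps with this sub-region's coefficients.
            logit = sum(w * b[y][x] for w, b in zip(coeffs[r][c], basis))
            mask[y][x] = sigmoid(logit)
    return mask


if __name__ == "__main__":
    # One constant basis map and opposite-sign coefficients per quadrant.
    basis = [[[1.0] * 4 for _ in range(4)]]
    coeffs = [[[5.0], [-5.0]], [[-5.0], [5.0]]]
    for row in assemble_mask(basis, coeffs):
        print(" ".join("1" if v > 0.5 else "0" for v in row))
```

With a single shared coefficient vector, every pixel of the box would be weighted identically; letting the coefficients vary by sub-region is what allows the same basis maps to produce different responses in different parts of the box.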
python -m torch.distributed.launch --nproc_per_node=4 --master_port=$((RANDOM+10000)) tools/train_net.py --config-file ${CONFIG_FILE} DATALOADER.NUM_WORKERS 2 OUTPUT_DIR ${OUTPUT_PATH}
e.g.,
python -m torch.distributed.launch --nproc_per_node=4 --master_port=$((RANDOM+10000)) tools/train_net.py --config-file configs/sipmask/sipmask_R_50_FPN_1x.yaml DATALOADER.NUM_WORKERS 2 OUTPUT_DIR training_dir/sipmask_R_50_FPN_1x
python tools/test_net.py --config-file ${CONFIG_FILE} MODEL.WEIGHT ${CHECKPOINT_FILE} TEST.IMS_PER_BATCH 4
e.g.,
python tools/test_net.py --config-file configs/sipmask/sipmask_R_50_FPN_1x.yaml MODEL.WEIGHT training_dir/SipMask_R50_1x.pth TEST.IMS_PER_BATCH 4
name | backbone | input size | epoch | ms-train | val. box AP | val. mask AP | download |
---|---|---|---|---|---|---|---|
SipMask | R50 | 800 × 1333 | 1x | no | 39.5 | 34.2 | model |
SipMask | R101 | 800 × 1333 | 3x | yes | 44.1 | 37.8 | model |
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM} [optional arguments]
e.g.,
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh configs/sipmask/sipmask_r50_caffe_fpn_gn_1x_4gpu.py 4 --validate
python tools/test.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] [--eval ${EVAL_METRICS}] [--show]
e.g.,
python tools/test.py ./configs/sipmask/sipmask_r50_caffe_fpn_gn_1x_4gpu.py ./work_dirs/sipmask_r50_caffe_1x.pth --out results.pkl --eval bbox segm
With our trained model, detection results of an image can be visualized using the following command.
python ./demo/sipmask_demo.py ${CONFIG_FILE} ${CHECKPOINT_FILE} ${IMAGE_FILE} [--out ${OUT_PATH}]
e.g.,
python ./demo/sipmask_demo.py ./configs/sipmask/sipmask_r50_caffe_fpn_gn_1x_4gpu.py ./sipmask_r50_caffe_1x.pth ./demo/demo.jpg --out ./demo/aa.jpg
name | backbone | input size | epoch | ms-train | GN | val. box AP | val. mask AP | download |
---|---|---|---|---|---|---|---|---|
SipMask | R50 | 800×1333 | 1x | no | yes | 38.2 | 33.5 | model |
SipMask | R50 | 800×1333 | 2x | yes | yes | 40.8 | 35.6 | model |
SipMask | R101 | 800×1333 | 4x | yes | yes | 43.6 | 37.8 | model |
SipMask | R50 | 544×544 | 6x | yes | no | 36.0 | 31.7 | model |
SipMask | R50 | 544×544 | 10x | yes | yes | 37.1 | 32.4 | model |
SipMask | R101 | 544×544 | 6x | yes | no | 38.4 | 33.6 | model |
SipMask | R101 | 544×544 | 10x | yes | yes | 40.3 | 34.8 | model |
SipMask++ | R101-D | 544×544 | 6x | yes | no | 40.1 | 35.2 | model |
SipMask++ | R101-D | 544×544 | 10x | yes | yes | 41.3 | 36.1 | model |
Please note that, as with MaskTrackRCNN, running on the YouTube-VIS dataset requires the cocoapi for YouTube-VIS rather than the original cocoapi for COCO. Install it as follows.
pip install git+https://github.com/youtubevos/cocoapi.git#"egg=pycocotools&subdirectory=PythonAPI"
or
cd SipMask-VIS/pycocotools/cocoapi/PythonAPI
python setup.py build_ext install
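Because both cocoapi variants install under the same package name, it can be useful to check which one is active before training. The sketch below is a hedged example: the helper `has_youtube_vis_api` is hypothetical, and the submodule name `ytvos` is an assumption about the YouTube-VIS cocoapi layout that should be confirmed against the installed package.

```python
import importlib.util


def has_youtube_vis_api(package="pycocotools", submodule="ytvos"):
    """Return True if `package` exposes `submodule` (names are assumptions)."""
    try:
        return importlib.util.find_spec(f"{package}.{submodule}") is not None
    except ImportError:
        # Raised when the parent package is missing or is not a package.
        return False


if __name__ == "__main__":
    if has_youtube_vis_api():
        print("YouTube-VIS cocoapi appears to be installed.")
    else:
        print("Expected submodule not found; reinstall the YouTube-VIS cocoapi.")
```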
./tools/dist_train.sh ${CONFIG_FILE} ${GPU_NUM}
e.g.,
CUDA_VISIBLE_DEVICES=0,1,2,3 ./tools/dist_train.sh ./configs/sipmask/sipmask_r50_caffe_fpn_gn_1x_4gpu.py 4
python tools/test_video.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] --eval segm
e.g.,
python ./tools/test_video.py configs/sipmask/sipmask_r50_caffe_fpn_gn_1x_4gpu.py ./work_dirs/sipmask_r50_fpn_1x.pth --out results.pkl --eval segm
If you want to save the results of video instance segmentation, please use the following command:
python tools/test_video.py ${CONFIG_FILE} ${CHECKPOINT_FILE} [--out ${RESULT_FILE}] --eval segm --show --save_path=${SAVE_PATH}
name | backbone | input size | epoch | ms-train | val. mask AP | download |
---|---|---|---|---|---|---|
SipMask | R50 | 360 × 640 | 1x | no | 32.5 | model |
SipMask | R50 | 360 × 640 | 1x | yes | 33.7 | model |
If the project helps your research, please cite this paper.
@article{Cao_SipMask_ECCV_2020,
author = {Jiale Cao and Rao Muhammad Anwer and Hisham Cholakkal and Fahad Shahbaz Khan and Yanwei Pang and Ling Shao},
title = {SipMask: Spatial Information Preservation for Fast Image and Video Instance Segmentation},
journal = {Proc. European Conference on Computer Vision},
year = {2020}
}
Many thanks to the open-source projects FCOS, mmdetection, YOLACT, and MaskTrack RCNN.