This is the official PyTorch implementation of the following publication:
Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots
Youqi Liao, Shuhao Kang, Jianping Li, Yang Liu, Yun Liu, Zhen Dong, Bisheng Yang, Xieyuanli Chen
IEEE RA-L 2024
Paper | arXiv | Project page | Video
TL;DR: Mobile-Seed is an online framework for simultaneous semantic segmentation and boundary detection on compact robots.
Abstract: Precise and rapid delineation of sharp boundaries and robust semantics is essential for numerous downstream robotic tasks, such as robot grasping and manipulation, real-time semantic mapping, and online sensor calibration performed on edge computing units. Although boundary detection and semantic segmentation are complementary tasks, most studies focus on lightweight models for semantic segmentation but overlook the critical role of boundary detection. In this work, we introduce Mobile-Seed, a lightweight, dual-task framework tailored for simultaneous semantic segmentation and boundary detection. Our framework features a two-stream encoder, an active fusion decoder (AFD) and a dual-task regularization approach. The encoder is divided into two pathways: one captures category-aware semantic information, while the other discerns boundaries from multi-scale features. The AFD module dynamically adapts the fusion of semantic and boundary information by learning channel-wise relationships, allowing for precise weight assignment of each channel. Furthermore, we introduce a regularization loss to mitigate the conflicts in dual-task learning and deep diversity supervision. Compared to existing methods, the proposed Mobile-Seed offers a lightweight framework to simultaneously improve semantic segmentation performance and accurately locate object boundaries. Experiments on the Cityscapes dataset have shown that Mobile-Seed achieves notable improvement over the state-of-the-art (SOTA) baseline by 2.2 percentage points (pp) in mIoU and 4.2 pp in mF-score, while maintaining an online inference speed of 23.9 frames-per-second (FPS) with 1024×2048 resolution input on an RTX 2080 Ti GPU. Additional experiments on the CamVid and PASCAL Context datasets confirm our method's generalizability.
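As a rough illustration of the AFD idea described above (channel-wise weights that adaptively fuse the semantic and boundary streams), here is a minimal numpy sketch. This is not the paper's implementation: the real AFD is a learned network module, and every name, shape, and the toy linear layer below are invented for the sketch.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def active_fusion(sem, bnd, w_fc):
    """Fuse semantic and boundary feature maps with channel-wise
    weights (a simplified stand-in for the learned AFD module).

    sem, bnd : (C, H, W) feature maps from the two streams
    w_fc     : (C, 2*C) weight matrix of a toy linear layer
    """
    # Global average pooling over the concatenated streams -> (2*C,)
    desc = np.concatenate([sem, bnd], axis=0).mean(axis=(1, 2))
    # One fusion weight in (0, 1) per output channel
    w = sigmoid(w_fc @ desc)[:, None, None]  # (C, 1, 1)
    # Convex combination per channel: w * semantic + (1 - w) * boundary
    return w * sem + (1.0 - w) * bnd

rng = np.random.default_rng(0)
C, H, W = 4, 8, 8
sem = rng.normal(size=(C, H, W))
bnd = rng.normal(size=(C, H, W))
fused = active_fusion(sem, bnd, rng.normal(size=(C, 2 * C)))
print(fused.shape)  # (4, 8, 8)
```

Because the weights lie in (0, 1), each fused channel stays between the corresponding semantic and boundary responses; the learned module decides, per channel, which stream to trust.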
Our Mobile-Seed is built on MMSegmentation 0.29.1. Please refer to the installation page for more details. We provide a Docker image on OneDrive and BaiduDisk (code: djry) for a quick start.
If you want to build from source, here is a quick installation example:
conda create --name mobileseed python=3.7 -y
conda activate mobileseed
pip install -r requirements.txt
mim install mmengine
mim install mmcv-full
git clone https://github.com/WHU-USI3DV/Mobile-Seed.git
cd Mobile-Seed
pip install -v -e .
NOTE: data preprocessing is not necessary for evaluation.
We provide pre-trained models for the Cityscapes, CamVid and PASCAL Context datasets. Please download the weights of Mobile-Seed from OneDrive or BaiduDisk (code: MS24) and put them in a folder like ckpt/. We also provide pre-trained weights of the baseline method AFFormer on OneDrive and BaiduDisk (code: zesm) for fair comparison.
Example: evaluate Mobile-Seed on Cityscapes:
# Single-gpu testing
bash tools/dist_test.sh ./configs/Mobile_Seed/MS_tiny_cityscapes.py /path/to/checkpoint_file.pth 1 --eval mIoU
| Dataset | mIoU | mBIoU (3px) | FLOPs |
|---|---|---|---|
| Cityscapes | 78.4 | 43.3 | 31.6G |
| CamVid | 73.4 | 45.2 | 4.1G |
| PASCAL Context (60) | 47.2 | 22.1 | 3.7G |
| PASCAL Context (59) | 43.0 | 16.2 | 3.7G |
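The mBIoU (3px) column above scores boundary quality with a 3-pixel tolerance. The exact protocol is in the repo's evaluation code; purely as an illustration, here is one common tolerant boundary-IoU formulation in plain numpy (the function names and the square tolerance window are our assumptions, not the paper's definition):

```python
import numpy as np

def binary_dilate(mask, r):
    """Dilate a boolean mask with a (2r+1) x (2r+1) square element."""
    h, w = mask.shape
    pad = np.pad(mask, r)  # zero (False) padding at the borders
    out = np.zeros_like(mask)
    for dy in range(2 * r + 1):
        for dx in range(2 * r + 1):
            out |= pad[dy:dy + h, dx:dx + w]
    return out

def tolerant_boundary_iou(pred_b, gt_b, tol=3):
    """IoU-style score between two boundary maps where a boundary
    pixel counts as matched if the other map has a boundary within
    `tol` pixels (Chebyshev distance)."""
    pred_hit = pred_b & binary_dilate(gt_b, tol)  # pred pixels near GT
    gt_hit = gt_b & binary_dilate(pred_b, tol)    # GT pixels near pred
    inter = pred_hit.sum() + gt_hit.sum()
    union = pred_b.sum() + gt_b.sum()
    return inter / union if union else 1.0

# Two horizontal boundary lines, 2 pixels apart
a = np.zeros((12, 12), dtype=bool); a[4, :] = True
b = np.zeros((12, 12), dtype=bool); b[6, :] = True
print(tolerant_boundary_iou(a, b, tol=3))  # 1.0 (within tolerance)
print(tolerant_boundary_iou(a, b, tol=1))  # 0.0 (too far apart)
```

The tolerance is what makes boundary metrics usable at all: exact pixel-wise IoU on one-pixel-wide boundaries punishes even a one-pixel localization offset.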
Download the weights of AFFormer pretrained on ImageNet-1K from Google Drive or AliDrive and put them in a folder like ckpt/. On the Cityscapes dataset, we trained Mobile-Seed for 160K iterations with an Intel Core i9-13900K CPU and an NVIDIA RTX 4090 GPU, which took approximately 22 hours.
Example: train Mobile-Seed on Cityscapes:
# Single-gpu training
bash tools/dist_train.sh ./configs/Mobile_Seed/MS_tiny_cityscapes.py
# Multi-gpu training
bash tools/dist_train.sh ./configs/Mobile_Seed/MS_tiny_cityscapes.py <GPU_NUM>
We provide processed Cityscapes data on OneDrive and BaiduDisk (code: 5n7t). If you want to process the data from scratch, please follow these steps:
unzip data_orig/gtFine_trainvaltest.zip -d data_orig && rm data_orig/gtFine_trainvaltest.zip
unzip data_orig/leftImg8bit_trainvaltest.zip -d data_orig && rm data_orig/leftImg8bit_trainvaltest.zip
unzip data_orig/leftImg8bit_demoVideo.zip -d data_orig && rm data_orig/leftImg8bit_demoVideo.zip
python data_preprocess/cityscapes_preprocess/code/createTrainIdLabelImgs.py <data_path>
# In the MATLAB Command Window
run code/demoPreproc_gen_png_label.m
This will create instance-insensitive semantic boundary labels for network training in data_proc_nis/. For the difference between instance-insensitive and instance-sensitive boundaries, please refer to SEAL.
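To make "instance-insensitive" concrete: boundaries fire only where the class label changes, not between touching instances of the same class. A minimal numpy sketch of that notion (purely illustrative; the repo's actual label generator is the Python/MATLAB pipeline above, and the radius and ignore-label handling here are assumptions):

```python
import numpy as np

def semantic_boundaries(label, radius=2, ignore=255):
    """Instance-insensitive semantic boundary map from a trainId
    label image: a pixel is a boundary if any pixel within `radius`
    (Chebyshev distance) carries a different, valid class label."""
    h, w = label.shape
    pad = np.pad(label, radius, mode='edge')  # replicate borders
    boundary = np.zeros((h, w), dtype=bool)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            nb = pad[dy:dy + h, dx:dx + w]  # shifted neighbor labels
            boundary |= (nb != label) & (nb != ignore) & (label != ignore)
    return boundary

# Left half class 0, right half class 1
label = np.zeros((6, 6), dtype=np.uint8)
label[:, 3:] = 1
bmap = semantic_boundaries(label, radius=1)
print(bmap.astype(int))  # boundary fires only on columns 2 and 3
```

An instance-sensitive variant (as discussed in SEAL) would additionally fire between distinct instances of the same class, which a plain trainId map cannot express.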
python data_preprocess/camvid_pascal_preprocess/label_generator.py <dataset> <data_path>
We split the training and test sets of CamVid according to PIDNet to avoid same-area evaluation.

Here is a demo script to test a single image. For more details, refer to MMSegmentation's documentation.
python demo/image_demo.py ${IMAGE_FILE} ${CONFIG_FILE} ${CHECKPOINT_FILE} ${SEG_FILE} \
[--out_sebound ${SEBOUND_FILE}] [--out_bibound ${BIBOUND_FILE}] [--device ${DEVICE_NAME}] [--palette-thr ${PALETTE}]
Example: visualize Mobile-Seed on Cityscapes:
python demo/image_demo.py demo/demo.png configs/Mobile_Seed/MS_tiny_cityscapes.py \
/path/to/checkpoint_file /path/to/outseg.png --device cuda:0 --palette cityscapes
If you find this repo helpful, please give us a star~. Please consider citing Mobile-Seed if this program benefits your project.
@article{liao2024mobileseed,
title={Mobile-Seed: Joint Semantic Segmentation and Boundary Detection for Mobile Robots},
author={Youqi Liao and Shuhao Kang and Jianping Li and Yang Liu and Yun Liu and Zhen Dong and Bisheng Yang and Xieyuanli Chen},
journal={IEEE Robotics and Automation Letters},
year={2024},
doi={10.1109/LRA.2024.3373235}
}
We sincerely thank the following excellent projects: