XuJiacong / PIDNet

This is the official repository for our recent work: PIDNet
MIT License

PIDNet: A Real-time Semantic Segmentation Network Inspired from PID Controller

License: MIT

This is the official repository for our recent work: PIDNet (PDF).

Highlights

Comparison of inference speed and accuracy for real-time models on the Cityscapes test set.

Updates

Demos

A demo of the segmentation performance of our proposed PIDNets: original video (left) and predictions of PIDNet-S (middle) and PIDNet-L (right).

Cityscapes Stuttgart demo video #1

Cityscapes Stuttgart demo video #2

Overview

An overview of the basic architecture of our proposed Proportional-Integral-Derivative Network (PIDNet).

The P, I and D branches are responsible for detail preservation, context embedding and boundary detection, respectively.
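For readers less familiar with the control-theory analogy, the textbook PID control law is reproduced below. This is the standard form from control theory, not a formula taken from this repository; the symbols $e(t)$ (error signal), $u(t)$ (control output) and the gains $K_p$, $K_i$, $K_d$ are conventional notation assumed here for illustration.

```latex
u(t) = K_p\, e(t) + K_i \int_0^t e(\tau)\, d\tau + K_d\, \frac{d e(t)}{dt}
```

Loosely speaking, the proportional term reacts to the current signal, the integral term accumulates its history, and the derivative term responds to its rate of change, which parallels the detail, context and boundary roles assigned to the three branches.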

Detailed Implementation

Instantiation of the PIDNet for semantic segmentation.

For operations, "OP, N, C" denotes operation OP with stride N and C output channels; Output: output size given an input size of 1024; mxRB: m residual basic blocks; 2xRBB: 2 residual bottleneck blocks; OP1\OP2: OP1 is used for PIDNet-L while OP2 is applied in PIDNet-M and PIDNet-S. (m,n,C) are set to (2,3,32), (2,3,64) and (3,4,64) for PIDNet-S, PIDNet-M and PIDNet-L, respectively.
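The (m, n, C) schedule can be written down as a small lookup table, which may be handy when scripting over the three variants. The sketch below only restates the numbers quoted above; the names `PIDNetVariant`, `PIDNET_VARIANTS` and `describe` are hypothetical and do not come from this repository's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PIDNetVariant:
    """Hypothetical container for the (m, n, C) schedule quoted in the text."""
    m: int         # first block-count parameter (the "m" in mxRB)
    n: int         # second block-count parameter from the (m, n, C) tuple
    channels: int  # channel width parameter C


# Values copied from the text; the dictionary itself is illustrative only.
PIDNET_VARIANTS = {
    "PIDNet-S": PIDNetVariant(m=2, n=3, channels=32),
    "PIDNet-M": PIDNetVariant(m=2, n=3, channels=64),
    "PIDNet-L": PIDNetVariant(m=3, n=4, channels=64),
}


def describe(name: str) -> str:
    """Format one variant's schedule as a readable string."""
    v = PIDNET_VARIANTS[name]
    return f"{name}: m={v.m}, n={v.n}, C={v.channels}"


if __name__ == "__main__":
    for variant_name in PIDNET_VARIANTS:
        print(describe(variant_name))
```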

Models

For simple reproduction, we provide the ImageNet pretrained models here.

| Model (ImageNet) | PIDNet-S | PIDNet-M | PIDNet-L |
|:-:|:-:|:-:|:-:|
| Link | download | download | download |

The models finetuned on Cityscapes and CamVid are also available for direct application in road scene parsing.

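| Model (Cityscapes) | Val (% mIOU) | Test (% mIOU) | FPS |
|:-:|:-:|:-:|:-:|
| PIDNet-S | 78.8 | 78.6 | 93.2 |
| PIDNet-M | 79.9 | 79.8 | 42.2 |
| PIDNet-L | 80.9 | 80.6 | 31.1 |

| Model (CamVid) | Val (% mIOU) | Test (% mIOU) | FPS |
|:-:|:-:|:-:|:-:|
| PIDNet-S | - | 80.1 | 153.7 |
| PIDNet-M | - | 82.0 | 85.6 |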
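The accuracy figures above are reported as mean intersection-over-union (mIOU). As a reminder of what that metric measures, here is a minimal sketch of the standard definition computed from a confusion matrix. It is not the evaluation code used by this repository; the function name `mean_iou` and the choice to skip classes that never occur are assumptions made for illustration.

```python
def mean_iou(confusion):
    """Return mean IoU in percent from a square confusion matrix.

    confusion[i][j] is the number of pixels whose true class is i and whose
    predicted class is j. Classes with no true or predicted pixels are
    skipped (an assumption of this sketch).
    """
    num_classes = len(confusion)
    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                               # missed pixels of class c
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # pixels wrongly labelled c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    if not ious:
        return 0.0
    return 100.0 * sum(ious) / len(ious)


if __name__ == "__main__":
    # Toy two-class example: rows are ground truth, columns are predictions.
    print(round(mean_iou([[3, 1], [1, 5]]), 2))
```

In the toy example the per-class IoUs are 3/5 and 5/7, so the mean is roughly 65.7%.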

Prerequisites

This implementation is based on HRNet-Semantic-Segmentation. Please refer to their repository for installation and dataset preparation. The inference speed is tested on a single RTX 3090 using the method introduced by SwiftNet. No third-party acceleration library is used, so you can try TensorRT or other approaches for faster speed.
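For orientation, speed measurements of this kind usually follow a warm-up-then-time pattern. The sketch below shows that general pattern only; it is neither the SwiftNet procedure nor this repository's script. The names `measure_fps`, `run_once`, `sync` and `fps_to_latency_ms` are hypothetical, and the toy workload stands in for a model's forward pass.

```python
import time


def measure_fps(run_once, warmup=10, iters=100, sync=None):
    """Return frames per second for a callable that processes one frame.

    run_once: zero-argument callable that runs a single forward pass.
    sync: optional zero-argument callable that blocks until queued work has
          finished. With PyTorch on a CUDA device this would typically be
          torch.cuda.synchronize (an assumption; not needed on CPU).
    """
    for _ in range(warmup):   # warm-up iterations are excluded from timing
        run_once()
    if sync is not None:
        sync()
    start = time.perf_counter()
    for _ in range(iters):
        run_once()
    if sync is not None:
        sync()                # wait for all timed work before stopping the clock
    elapsed = time.perf_counter() - start
    return iters / elapsed


def fps_to_latency_ms(fps):
    """Convert a throughput in frames per second to per-frame latency in ms."""
    return 1000.0 / fps


if __name__ == "__main__":
    # Toy workload standing in for a model's forward pass.
    fps = measure_fps(lambda: sum(range(10_000)), warmup=5, iters=50)
    print(f"{fps:.1f} FPS, {fps_to_latency_ms(fps):.2f} ms per frame")
```

As a point of reference, the 93.2 FPS listed for PIDNet-S on Cityscapes corresponds to roughly 10.7 ms per frame.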

Usage

0. Prepare the dataset

:smiley_cat: Instructions for preparing the CamVid data (still under discussion) :smiley_cat:

2. Evaluation

3. Speed Measurement

4. Custom Inputs

Citation

If you think this implementation is useful for your work, please cite our paper:

@misc{xu2022pidnet,
      title={PIDNet: A Real-time Semantic Segmentation Network Inspired from PID Controller}, 
      author={Jiacong Xu and Zixiang Xiong and Shankar P. Bhattacharyya},
      year={2022},
      eprint={2206.02066},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgement