This repository contains an improved implementation of PIDNet built on the mmsegmentation framework by OpenMMLab. PIDNet is an efficient and accurate network for real-time semantic segmentation, well suited to autonomous-vehicle applications. The repository includes configurations, training scripts, and improved inference tools.
To install the required dependencies, follow these steps:
Clone the repository:
git clone https://github.com/anhphan2705/mmseg_pidnet.git
cd mmseg_pidnet
Create a virtual environment and activate it:
python -m venv venv
source venv/bin/activate # On Windows use `venv\Scripts\activate`
Install the dependencies:
pip install -r requirements.txt
If you run into an mmcv version mismatch with PyTorch, refer to the MMCV Installation Guide.
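Before consulting the installation guide, it helps to know which versions are actually installed. The following is a minimal sketch that reports them using only the standard library. The distribution names in `PACKAGES` are assumptions and may differ from what `requirements.txt` installs in your environment.

```python
from importlib import metadata

# Distribution names are assumptions; adjust them to match requirements.txt.
PACKAGES = ("torch", "mmcv", "mmengine", "mmsegmentation")


def installed_versions(packages=PACKAGES):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Compare the reported torch and mmcv versions against the compatibility information in the MMCV Installation Guide.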
Prepare your dataset according to the mmsegmentation requirements. The expected directory layout is:
mmseg_pidnet
├── mmseg
├── tools
├── configs
├── samples
├── real_time_inference.py
├── model-index.yml
├── README.md
├── requirements.txt
├── data
│ ├── cityscapes
│ │ ├── leftImg8bit
│ │ │ ├── train
│ │ │ ├── val
│ │ ├── gtFine
│ │ │ ├── train
│ │ │ ├── val
│ ├── coco_stuff10k
│ │ ├── images
│ │ │ ├── train2014
│ │ │ ├── test2014
│ │ ├── annotations
│ │ │ ├── train2014
│ │ │ ├── test2014
│ │ ├── imagesLists
│ │ │ ├── train.txt
│ │ │ ├── test.txt
│ │ │ ├── all.txt
│ ├── coco_stuff164k
│ │ ├── images
│ │ │ ├── train2017
│ │ │ ├── val2017
│ │ ├── annotations
│ │ │ ├── train2017
│ │ │ ├── val2017
│ ├── dark_zurich
│ │ ├── gps
│ │ │ ├── val
│ │ │ └── val_ref
│ │ ├── gt
│ │ │ └── val
│ │ ├── LICENSE.txt
│ │ ├── lists_file_names
│ │ │ ├── val_filenames.txt
│ │ │ └── val_ref_filenames.txt
│ │ ├── README.md
│ │ └── rgb_anon
│ │ │ ├── val
│ │ │ └── val_ref
│ ├── NighttimeDrivingTest
│ │ ├── gtCoarse_daytime_trainvaltest
│ │ │ └── test
│ │ │ │ └── night
│ │ └── leftImg8bit
│ │ │ └── test
│ │ │ │ └── night
│ ├── bdd100k
│ │ ├── images
│ │ │ └── 10k
│ │ │ │ ├── test
│ │ │ │ ├── train
│ │ │ │ └── val
│ │ └── labels
│ │ │ └── sem_seg
│ │ │ │ ├── colormaps
│ │ │ │ │ ├── train
│ │ │ │ │ └── val
│ │ │ │ ├── masks
│ │ │ │ │ ├── train
│ │ │ │ │ └── val
│ │ │ │ ├── polygons
│ │ │ │ │ ├── sem_seg_train.json
│ │ │ │ │ └── sem_seg_val.json
│ │ │ │ └── rles
│ │ │ │ │ ├── sem_seg_train.json
│ │ │ │ │ └── sem_seg_val.json
│ ├── nyu
│ │ ├── images
│ │ │ ├── train
│ │ │ ├── test
│ │ ├── annotations
│ │ │ ├── train
│ │ │ ├── test
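A quick check that a dataset follows the layout above can catch path mistakes before training starts. This is a minimal sketch for the Cityscapes part of the tree. The helper name `missing_dirs` and the `data/cityscapes` location are illustrative assumptions, not part of the repository.

```python
from pathlib import Path

# Sub-directories taken from the Cityscapes part of the tree above.
CITYSCAPES_DIRS = (
    "leftImg8bit/train",
    "leftImg8bit/val",
    "gtFine/train",
    "gtFine/val",
)


def missing_dirs(dataset_root, expected=CITYSCAPES_DIRS):
    """Return the expected sub-directories that do not exist under dataset_root."""
    root = Path(dataset_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    # Assumes the script is run from the repository root.
    missing = missing_dirs("data/cityscapes")
    print("Layout OK" if not missing else f"Missing: {missing}")
```

The same function works for the other datasets if you pass a different `expected` tuple built from the corresponding branch of the tree.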
To train the PIDNet model, use the training script with the desired configuration file:
python tools/train.py configs/pidnet/choose_a_config.py
Make sure to adjust the configuration file to match your dataset and training preferences.
If you encounter the following error message:
RuntimeError: CUDA error: device-side assert triggered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
a fix is described here: https://github.com/open-mmlab/mmsegmentation/issues/3724#issuecomment-2202124709
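In segmentation training, a device-side assert is often triggered by label IDs that fall outside the range the loss expects. That is a general observation and an assumption here; the linked comment may address a different cause. The sketch below flags such label IDs. The helper name is hypothetical, and the `ignore_index=255` default is assumed and should match the value in your config.

```python
def out_of_range_labels(label_values, num_classes, ignore_index=255):
    """Return sorted label IDs that are neither valid class IDs nor ignore_index.

    label_values: any iterable of integer label IDs found in a mask.
    Valid class IDs are 0 .. num_classes - 1.
    """
    bad = set()
    for value in label_values:
        value = int(value)
        if value == ignore_index:
            continue  # assumed to be skipped by the loss
        if value < 0 or value >= num_classes:
            bad.add(value)
    return sorted(bad)


if __name__ == "__main__":
    # Synthetic label IDs for a 19-class setup; 33 is not a valid class ID.
    demo_labels = [0, 1, 255, 18, 33, 2]
    print(out_of_range_labels(demo_labels, num_classes=19))  # prints [33]
```

For a mask loaded as a NumPy array, you could pass `numpy.unique(mask).tolist()` as `label_values`. A non-empty result suggests that the annotations or the `num_classes` setting need attention.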
If you see any other error, feel free to open an issue. Further support is available at https://github.com/open-mmlab/mmsegmentation/issues
To perform real-time inference using the real_time_inference.py script for videos, images, or live camera feed, follow these steps:
Ensure that you have the necessary model configuration and checkpoint files.
Run the real_time_inference.py script with the appropriate arguments:
To perform real-time segmentation on a video file, use the following command:
python real_time_inference.py --video path/to/video.mp4 --config path/to/config.py --checkpoint path/to/checkpoint.pth --device cuda:0 --show
To perform segmentation on a directory of images, use the following command:
python real_time_inference.py --image path/to/image/directory/* --config path/to/config.py --checkpoint path/to/checkpoint.pth --device cuda:0 --show
To perform real-time segmentation using a live camera feed (e.g., webcam), use the following command:
python real_time_inference.py --camera 0 --config path/to/config.py --checkpoint path/to/checkpoint.pth --device cuda:0 --show
--video: Path to the video file for inference.
--images: Path to the directory containing images for inference.
--camera: Camera source index (e.g., 0 for the default webcam).
--config: Path to the model configuration file.
--checkpoint: Path to the model checkpoint file.
--device: Device to be used for inference (cpu or cuda:0).
--out: Path to the output directory for images, or the video file path, for saving results.
--show: If specified, display the video or images during processing.
--wait-time: Display interval in seconds (default: 0.001).

This script loads the trained model, performs segmentation on the input video, images, or live camera feed, and displays or saves the results based on the provided arguments.
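To show how the arguments listed above fit together, here is a hypothetical `argparse` sketch that mirrors them. It is not the actual code of real_time_inference.py. The mutually exclusive input group and the `--device` default are assumptions; only the `--wait-time` default of 0.001 comes from the list above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags (not the real script)."""
    parser = argparse.ArgumentParser(description="Segmentation inference (sketch)")

    # Assumption: exactly one input source is given per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--video", help="path to the video file")
    source.add_argument("--images", help="path to the directory containing images")
    source.add_argument("--camera", type=int, help="camera source index, e.g. 0")

    parser.add_argument("--config", required=True, help="model configuration file")
    parser.add_argument("--checkpoint", required=True, help="model checkpoint file")
    # The default device is an assumption made for this sketch.
    parser.add_argument("--device", default="cuda:0", help="cpu or cuda:0")
    parser.add_argument("--out", help="output directory or video file path")
    parser.add_argument("--show", action="store_true", help="display results")
    parser.add_argument("--wait-time", type=float, default=0.001,
                        help="display interval in seconds")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list equivalent to the camera example.
    args = build_parser().parse_args([
        "--camera", "0",
        "--config", "path/to/config.py",
        "--checkpoint", "path/to/checkpoint.pth",
        "--show",
    ])
    print(args.camera, args.device, args.show, args.wait_time)
```

Grouping the three input sources makes the parser reject a command line that names both a video and a camera, which is one plausible way to keep the modes separate.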