Image segmentation is the process of classifying each pixel in an image into a particular class. At a high level, image segmentation identifies the regions of an image that belong to a particular object type. Unlike object detection, where the model can only predict a bounding box around each object, a segmentation model can precisely extract object/region boundaries following the actual shape of the object/region.
This project aims to locate and segment the road region in a picture, typically taken from the front camera of a vehicle, using image segmentation. Road segmentation is a critical step in an Advanced Driver Assistance System (ADAS) for a variety of tasks, such as extracting the drivable area, path planning, and lane change detection. In this section we will focus only on separating the main road region from an image.
You need to have the following libraries installed on your computer before starting the project.
If you want to create and run the ONNX model, you need the following packages as well.
The data required for this task is a collection of images taken from the front camera of a car/vehicle. I used my mobile phone camera to continuously record video while driving, with the camera mounted in the center of the front windshield. I collected 28 hours of such videos during my travels in various districts of Kerala and Karnataka. These videos include different types of road scenarios (highways, roads with lane markings, roads without lane markings, mud roads, one-way roads, multi-lane roads), junctions, curves and elevations.
The next, and more difficult, step is to annotate the frames from these videos for the image segmentation model. Data annotation is the process of categorizing and labeling raw data so that it can be fed to an AI model for training. In a segmentation task, each pixel of an image needs to be labeled, which is quite time consuming compared to drawing bounding boxes for an object detection task.
Instead of manually labeling each pixel in an image, this is achieved by categorizing a region (a polygon) in the image with a label. So we need to draw a polygon along the boundaries of a region in the image. In this project the polygon is drawn along the road region boundaries: the pixels inside the polygon are labeled 'road region' and the pixels outside the polygon are labeled 'non-road region'.
Basically, annotation for image segmentation means creating a mask image (with the same size as the input image) corresponding to each input image, where each mask pixel value is the class label of the corresponding pixel in the input image.
I used Intel's CVAT tool for this task. The data annotation was performed over 600 images/frames (randomly taken from different videos) for 7 classes.
The annotated data in the CVAT tool can be downloaded in different formats as per your requirement. In this project I chose the 'Kitti' format because it directly contains the mask images required for U-Net model training.
The 'kitti data set' folder contains the following files and folder structure.
A subset of my 7-class data set is used in this project; this subset only includes 100 annotated images with two classes (road, non-road).
The class label for each pixel is as follows (a small sketch of how the mask translates into training labels is given after the list):
Non-road area : The pixel value of this class in the mask frame will be 0.
Road area : The pixel value of this class in the mask frame will be 2.
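To make this mapping concrete, below is a minimal sketch of turning such a mask into per-pixel training labels. It assumes the exported mask can be read as a single-channel image whose pixel values are the class IDs listed above; the file name 'sample_mask.png' is only illustrative, and the actual preprocessing in this repo may differ.

```python
import cv2
import numpy as np

# Read the annotation mask as a single-channel image.
# Pixel values follow the class IDs above: 0 = non-road, 2 = road.
mask = cv2.imread("sample_mask.png", cv2.IMREAD_GRAYSCALE)

# Build a binary label map for two-class training:
# 1 where the pixel belongs to the road, 0 everywhere else.
labels = (mask == 2).astype(np.uint8)

print("road pixels:", int(labels.sum()), "out of", labels.size)
```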
A sample image from the data set is shown below.
The annotated RGB mask of the above image is as follows.
Data augmentation is a common practice in machine learning when the data set is too small or the diversity of the data is low. Basically, data augmentation means introducing some changes to the input data without losing its key features; with this technique you can create a large data set from a limited amount of original data. In this project we only have 100 raw images, which is far too few for training, so we are going to expand the 100 images to 7,000 images with data augmentation. The DA (Data Augmentation) module uses the following operations (a small code sketch of these operations is given after the note below):
Random brightness : The brightness level of the input image will be changed randomly within a limit.
Random saturation : The saturation level of the input image will be changed randomly within a limit.
Random hue : The hue level of the input image will be changed randomly within a limit.
Random contrast : The contrast level of the input image will be changed randomly within a limit.
Horizontal flip : The input image will be mirrored horizontally (flipped about the vertical axis).
Note: The mask is not modified for operations 1-4; the original mask is simply replicated. For operation 5, however, both the input image and the mask need to be flipped to retain the structural relationship between image and mask.
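The sketch below illustrates these five operations with TensorFlow's tf.image utilities. It is only an approximation of what 'DA_module.py' does; the limit values and the 50% flip probability are assumptions, not the values used by the script.

```python
import tensorflow as tf

def augment(image, mask):
    """Apply one round of the augmentations described above.

    `image` is a float32 tensor of shape (H, W, 3) in [0, 1];
    `mask` is the label image with shape (H, W, 1).
    """
    # Operations 1-4: photometric changes, applied to the image only.
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_saturation(image, 0.8, 1.2)
    image = tf.image.random_hue(image, max_delta=0.05)
    image = tf.image.random_contrast(image, 0.8, 1.2)

    # Operation 5: horizontal flip, applied to both image and mask
    # so that their structural relationship is preserved.
    if tf.random.uniform(()) < 0.5:
        image = tf.image.flip_left_right(image)
        mask = tf.image.flip_left_right(mask)
    return image, mask
```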
Clone this repo to your working directory with the following git command.
git clone https://github.com/asujaykk/Road-segmentation-UNET-model.git
Extract the data set 'road_seg_kitti.zip' available in the 'data/data_set' folder to 'data/data_set/data_temp_folder' with the following command.
unzip data/data_set/road_seg_kitti.zip -d data/data_set/data_temp_folder
Expand the data set 70 times by executing the following data augmentation script (the script will take a few minutes to run).
python3 DA_module.py --input data/data_set/data_temp_folder/road_seg_kitti --factor 70
The command line parameters expected by 'DA_module.py' are explained below.
--factor : This parameter should be a number (default is 70); the script replicates each input image and mask by this factor.
Note: Please do not run 'DA_module.py' on the data set twice, because the script takes the current image count in 'default/image_2' and then replicates it 70 times. So if you run 'DA_module.py' twice, the data set will be expanded '70*70' times.
The U-Net architecture was chosen for this task since its training cost is low and it has shown good benchmark results across different data sets and tasks. For more information on the U-Net architecture, please check this link: https://paperswithcode.com/method/u-net.
Initially the plan was to use the PyTorch library to realize the model. But since I had already used PyTorch for image classification and speech recognition tasks, this time I decided to use Keras and TensorFlow so that I could explore these libraries in detail. The model implementation is available in the 'model.py' file.
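For orientation, here is a minimal sketch of a U-Net-style encoder-decoder in Keras. It only illustrates the core idea (a downsampling path, skip connections, an upsampling path, and a per-pixel softmax); the actual network in 'model.py' is likely deeper and differs in its details.

```python
from tensorflow.keras import layers, Model

def build_unet(input_shape=(160, 160, 3), num_classes=2):
    inputs = layers.Input(shape=input_shape)

    # Encoder: convolution blocks followed by downsampling.
    c1 = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
    c1 = layers.Conv2D(32, 3, padding="same", activation="relu")(c1)
    p1 = layers.MaxPooling2D()(c1)

    c2 = layers.Conv2D(64, 3, padding="same", activation="relu")(p1)
    c2 = layers.Conv2D(64, 3, padding="same", activation="relu")(c2)
    p2 = layers.MaxPooling2D()(c2)

    # Bottleneck.
    b = layers.Conv2D(128, 3, padding="same", activation="relu")(p2)

    # Decoder: upsample and concatenate the matching encoder features (skip connections).
    u2 = layers.Concatenate()([layers.UpSampling2D()(b), c2])
    c3 = layers.Conv2D(64, 3, padding="same", activation="relu")(u2)

    u1 = layers.Concatenate()([layers.UpSampling2D()(c3), c1])
    c4 = layers.Conv2D(32, 3, padding="same", activation="relu")(u1)

    # Per-pixel class probabilities.
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(c4)
    return Model(inputs, outputs)

model = build_unet()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
```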
Once you have the model architecture and the data ready, the next step is to train the model on the prepared data set and generate an optimized model, which will be used for future predictions. Model training requires powerful hardware for fast training, especially GPUs (Graphics Processing Units) with CUDA cores. If your PC configuration is modest, you may have to wait hours to get a result. As a developer, the best option is to use an online platform like Kaggle Notebooks, Google Colab or AWS EC2, because they provide good GPU back-end support and plenty of memory for processing complex models on large data sets (go for a paid tier if you need uninterrupted training for a long time with more RAM and GPU memory).
I chose Google Colab to train the U-Net model with a (160 x 160) input size on the augmented data set of 7,000 images. I used a batch size of 32 and 15 epochs for faster training. The trained model was saved to my Google Drive for inferencing.
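Conceptually, the training step then boils down to something like the rough sketch below, reusing the build_unet() function from the earlier snippet. The array names and the validation split are assumptions; the real logic lives in 'model_train.py'.

```python
# images: float32 array of shape (N, 160, 160, 3), scaled to [0, 1]
# masks:  integer array of shape (N, 160, 160) with labels 0 (non-road) / 1 (road)
model = build_unet(input_shape=(160, 160, 3), num_classes=2)
model.fit(images, masks, batch_size=32, epochs=15, validation_split=0.1)
model.save("models/pretrained_models/road_segmentation_160_160_test.h5")
```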
You can use the script "model_train.py" for training the U-Net model over the data set that we created before.
python3 model_train.py --dataset data_set/data_temp_folder/road_seg_kitti --output models/pretrained_models --batch 4 --epoch 15
The command line parameters expected by 'model_train.py' are explained below.
If you face an 'OOM' (out of memory) error from TensorFlow while training, please reduce the batch size to a smaller number. You can increase the batch size up to 32 depending on the available compute resources.
Once the training completes successfully, you can find your trained model "road_segmentation_160_160_test.h5" under the "models/pretrained_models" directory ;)
Model inferencing is the process of using a trained model to make predictions on new data. The model can be used to predict the road region in new images/videos, and an inference pipeline is provided to predict the road region from an input MP4 video or IP camera stream.
A few pretrained models are available under the 'models/pretrained_models' folder for testing. They can be tested with the following inference script.
python3 inference.py --src <path_to_mp4_video> --model models/pretrained_models/road_segmentation_160_160.h5
The command line parameters expected by 'inference.py' are explained below.
If you want to test the model that you created before, change the '--model' parameter to 'models/pretrained_models/road_segmentation_160_160_test.h5' and run 'inference.py'.
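Roughly, per-frame prediction with the Keras model corresponds to the sketch below. This is only an illustration; 'inference.py' handles video reading, preprocessing and overlay drawing in its own way, and the normalization used during training is an assumption here.

```python
import cv2
import numpy as np
from tensorflow.keras.models import load_model

model = load_model("models/pretrained_models/road_segmentation_160_160.h5")

def predict_road_mask(frame):
    """Return a binary road mask with the same height/width as the input frame."""
    h, w = frame.shape[:2]
    # Resize to the model input size and scale pixel values to [0, 1].
    x = cv2.resize(frame, (160, 160)).astype(np.float32) / 255.0
    # Predict per-pixel class probabilities and take the most likely class.
    probs = model.predict(x[np.newaxis, ...])[0]        # (160, 160, num_classes)
    mask = np.argmax(probs, axis=-1).astype(np.uint8)   # (160, 160)
    # Resize the mask back to the original frame size.
    return cv2.resize(mask, (w, h), interpolation=cv2.INTER_NEAREST)
```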
The Keras model inference took 265 to 340 ms to process one input image, which is quite slow for a real-world application. Therefore, for better performance and easier deployment, we decided to convert the Keras model into an ONNX (Open Neural Network Exchange) model. If you want to know why we convert a Keras model to ONNX, please check this link: https://pythonsimplified.com/onnx-for-model-interoperability-faster-inference/.
A script is already available in this repo for converting the Keras model to an ONNX model. You can execute the command below to perform the conversion.
python3 generate_onnx.py --input models/pretrained_models/road_segmentation_160_160.h5 --output models/onnx_models/road_seg_160_160.onnx --temp models/saved_model/road_seg
The command line parameters expected by 'generate_onnx.py' are explained below.
If you want to convert the Keras model that you created before to an ONNX model, run the command below.
python3 generate_onnx.py --input models/pretrained_models/road_segmentation_160_160_test.h5 --output models/onnx_models/road_seg_160_160_test.onnx --temp models/saved_model/road_seg_test
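If you are curious what such a conversion involves, a minimal version using the tf2onnx package might look like the sketch below. This is not necessarily how 'generate_onnx.py' is implemented; for example, the repo script also writes an intermediate SavedModel (the '--temp' parameter), which is skipped here.

```python
import tensorflow as tf
import tf2onnx

# Load the trained Keras model.
model = tf.keras.models.load_model("models/pretrained_models/road_segmentation_160_160.h5")

# Fix the input layout (batch, height, width, channels) and convert to ONNX.
spec = (tf.TensorSpec((None, 160, 160, 3), tf.float32, name="input"),)
tf2onnx.convert.from_keras(model, input_signature=spec,
                           output_path="models/onnx_models/road_seg_160_160.onnx")
```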
The generated ONNX model provides better performance on low-end machines. The ONNX model inference took 75 to 90 ms to process one frame on a low-end machine, which is acceptable for a real-world application.
To run inference with the ONNX model, use 'inference_onnx.py' as shown below.
python3 inference_onnx.py --src <path_to_mp4 video> --model models/onnx_models/road_seg_160_160.onnx
The command line parameters expected by 'inference_onnx.py' are explained below.
--model : Path to onnx model to be used for inferencing (default : models/onnx_models/road_seg_160_160.onnx).
If you want to test the model that you created before, change the '--model' parameter to 'models/onnx_models/road_seg_160_160_test.onnx' and run 'inference_onnx.py'.
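Under the hood, pushing a single frame through the ONNX model with onnxruntime looks roughly like the sketch below ('sample_frame.jpg' is just a placeholder, and the preprocessing must match whatever the model was trained with).

```python
import cv2
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession("models/onnx_models/road_seg_160_160.onnx")
input_name = session.get_inputs()[0].name

frame = cv2.imread("sample_frame.jpg")                         # any road image
x = cv2.resize(frame, (160, 160)).astype(np.float32) / 255.0   # model input size
x = x[np.newaxis, ...]                                         # add batch dimension

# Run the model; the output is assumed to be per-pixel class probabilities.
probs = session.run(None, {input_name: x})[0]
mask = np.argmax(probs, axis=-1)[0].astype(np.uint8)           # (160, 160) road mask
```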
The inference video output is given below, in which: