Use Neural Network for detection of traffic signals

manuelgitgomes commented 1 year ago

Explore neural networks (NN) for signal detection. Try YOLO (various versions; maybe a smaller "mini" variant is more adequate). There are two different challenges: the detection of regular traffic signs and the detection of luminous signs. Start with the first one, as it is probably better documented.

manuelgitgomes commented 1 year ago

https://viso.ai/deep-learning/yolov7-guide/

callmesora commented 1 year ago

@callmesora here.

My word of advice for this issue would be to use YOLOv5 instead of v7 or any other variant! The performance is very similar, and YOLOv5 has a much bigger community and is more mature.

The only reason I would use v7 instead of v5 would be if my company didn't want to pay for a license. As we are a non-commercial project, we can use v5 freely.

Recap:

YOLOv4/v5/v7 have very similar performance
YOLOv5 is way easier to deploy and work with

To deploy this model we can use one of the following:

ONNX Runtime
Load from PyTorch Hub
TensorRT (if we have a PC with an NVIDIA GPU)
OpenVINO (if we have an Intel CPU)
TorchInference
TensorFlow Lite (awesome for Raspberry Pi / Google Coral)

From my experience, I would say the best option for our scenario is to simply load the model from PyTorch Hub. It's about 3 lines of code! The downside is that the performance is not as optimized as with the other options, but we don't have a very high throughput requirement.

In other words, we don't need our detection to run really fast (10+ FPS); 5 FPS is enough, since our car is not that fast!

Simplicity > Peak Performance when deploying models in real life!

Here is an example to help you (source: the Ultralytics YOLOv5 PyTorch Hub tutorial):

```python
import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Image
im = 'https://ultralytics.com/images/zidane.jpg'

# Inference
results = model(im)

results.pandas().xyxy[0]
#      xmin    ymin    xmax   ymax  confidence  class    name
# 0  749.50   43.50  1148.0  704.5    0.874023      0  person
# 1  433.50  433.50   517.5  714.5    0.687988     27     tie
# 2  114.75  195.75  1095.0  708.0    0.624512      0  person
# 3  986.00  304.00  1028.0  420.0    0.286865     27     tie
```

callmesora commented 1 year ago

@TatianaResend (I think you are the one who is going to work a bit on this, so tag me if you want any help!)

In practical terms, you will do something like this:

Google for YOLOv5 traffic sign detection (or something similar) and find a dataset.

Object detection datasets can come in many labeling formats. The most popular are COCO, Darknet (YOLOv4) and YOLOv5 (PyTorch Darknet), which is an adaptation of the previous one. You may or may not need to convert these labels to train YOLOv5; if you can find a dataset already in the YOLOv5 format, that is great!
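As an illustration of what such a conversion involves, here is a minimal sketch (not taken from any particular tool; the function name `coco_to_yolo` is hypothetical) that converts one COCO-style box, given as `[x_min, y_min, width, height]` in pixels, into the YOLO convention of box center plus box size, all normalized by the image dimensions:

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, width, height] in pixels into a YOLO box
    (x_center, y_center, width, height) normalized to the [0, 1] range."""
    x_min, y_min, w, h = bbox
    x_c = (x_min + w / 2) / img_w
    y_c = (y_min + h / 2) / img_h
    return x_c, y_c, w / img_w, h / img_h


# A 100x50 px box whose top-left corner is at (200, 100) in a 640x480 image.
print(coco_to_yolo([200, 100, 100, 50], 640, 480))
```

In a YOLOv5 label file, each object then becomes one line: `class x_center y_center width height`.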

For reference, Darknet is a framework written in C in which YOLO (the object detection algorithm) was originally written!

Train the model on the given dataset and download the model
Load the model from PyTorch Hub like in the tutorial I sent in the last comment

You can load custom models as such:

```python
import torch

model = torch.hub.load('ultralytics/yolov5', 'custom', path='path/to/best.pt')  # local model
```

Pro tips 👍:

You can download free datasets from Roboflow Universe (I use this a lot ;) ): https://universe.roboflow.com/
Me being very nice to you: https://universe.roboflow.com/school-oabm7/traffic-sign-kv5qp

Alright, skip straight to the good stuff.

I'm gonna give you a step-by-step example of how to solve this problem in a few hours:

1. Go to Google Colab
2. Git clone this repo: https://github.com/maheravi/YoloV5-TrafficSign
3. `!python train.py --img 640 --batch 16 --epochs 50 --data dataset.yaml --weights yolov5s.pt`
4. Download the .pt file
5. Put it in the AutoMec-AD repo
6. Load it via PyTorch Hub
7. Run inference (inference means running the model on given data)

callmesora commented 1 year ago

I struggled a bit to find a dataset with the signals we need. This was the best I could find; perhaps we have to create our own dataset with virtual data of the signals and label it ourselves.

Here's the closest I could find: https://universe.roboflow.com/kendrickxy/european-road-signs

callmesora commented 1 year ago

I suggest doing this:

Add the competition signs to Gazebo
Record video
Extract frames (images) from the video
Create our own dataset and upload it to Roboflow
Label it and export it to YOLOv5 format
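For the frame-extraction step, a minimal sketch could look like the following. It assumes OpenCV (`cv2`) is installed and that the recording is a regular video file; keeping only one frame out of every `step` (10 here, an arbitrary choice) avoids filling the dataset with near-identical consecutive frames. The function names and paths are hypothetical.

```python
import os


def should_keep(index, step):
    """Keep one frame out of every `step` frames (frames 0, step, 2*step, ...)."""
    return index % step == 0


def extract_frames(video_path, out_dir, step=10):
    """Save every `step`-th frame of `video_path` as a JPEG in `out_dir`; return how many were saved."""
    import cv2  # imported lazily so should_keep() can be used without OpenCV installed

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of the video, or the file could not be read
            break
        if should_keep(index, step):
            cv2.imwrite(os.path.join(out_dir, f"frame_{index:06d}.jpg"), frame)
            saved += 1
        index += 1
    cap.release()
    return saved


# Example (hypothetical paths):
# extract_frames("signals_run1.mp4", "frames/run1", step=10)
```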

manuelgitgomes commented 1 year ago

For training with an AMD GPU, please use ROCm.

manuelgitgomes commented 1 year ago

A new tool for recording traffic signal datasets was developed. To run it, it is recommended to use a new world containing various signals:

```bash
roslaunch prometheus_gazebo arena.launch world:=signals
```

After spawning the car, run the dataset recording tool, which is very similar to the one used for the lanes:

```bash
roslaunch prometheus_signal_recognition dataset_writing.launch
```

callmesora commented 1 year ago

A Roboflow dataset has been created.

To create a labeling job, one should:

Press Annotate and click on the images to label

Click Assign Images

Choose the number of images and who will label them

callmesora commented 1 year ago

These are the signals we have to label. We need to define a name for each of the classes for consistency.

callmesora commented 1 year ago

Let's use this map to label the signs. We should all use this convention when labeling (Roboflow will make it easier: once a class is added, we can select it from the list). Here is the map:

```python
{31: "estreitamento", 32: "passadeira", 33: "autocarro"}
{21: "vaca", 22: "hospital", 23: "rotunda"}
{11: "descida", 12: "sessenta", 13: "farois"}
{1: "perigo", 2: "parking", 3: "esquerda"}  # codes 01, 02, 03 (Python 3 forbids leading zeros in int literals)
```
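One practical detail: YOLOv5 label files expect class ids numbered from 0 to nc-1, so two-digit codes like these cannot be written into the labels directly. A minimal sketch of one way to handle it, assuming the classes are ordered by code and a hypothetical dataset location, maps each code to a contiguous index and builds the matching `dataset.yaml` text using keys found in YOLOv5 dataset files (`path`, `train`, `val`, `nc`, `names`). The function names are hypothetical.

```python
# Sign codes agreed in this thread; 01, 02, 03 are written as 1, 2, 3 because
# Python 3 does not allow leading zeros in integer literals.
SIGN_CODES = {
    31: "estreitamento", 32: "passadeira", 33: "autocarro",
    21: "vaca", 22: "hospital", 23: "rotunda",
    11: "descida", 12: "sessenta", 13: "farois",
    1: "perigo", 2: "parking", 3: "esquerda",
}


def to_yolo_indices(codes):
    """Map each sign code to a contiguous YOLO class id 0..N-1, ordered by code."""
    return {code: i for i, code in enumerate(sorted(codes))}


def dataset_yaml(codes, root="datasets/signals"):
    """Text of a YOLOv5-style dataset.yaml; `root` and the split folders are placeholders."""
    names = [codes[c] for c in sorted(codes)]  # same order as to_yolo_indices()
    return "\n".join([
        f"path: {root}",
        "train: images/train",
        "val: images/val",
        f"nc: {len(names)}",
        f"names: {names}",  # a Python list repr is also a valid YAML flow sequence
    ]) + "\n"


print(to_yolo_indices(SIGN_CODES))
print(dataset_yaml(SIGN_CODES))
```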

callmesora commented 1 year ago

Example of the labeling.

You should ALWAYS label things that are partially occluded.

For example, the hospital sign that you can barely see in the back: you should label the little that you can see. This will make our system more robust.

Ze6000 commented 1 year ago

Dataset labeling done. Number of labels:

Autocarro - 48
Descida - 9
Esquerda - 33
Estreitamento - 42
Farois - 124
Hospital - 94
Parking - 48
Passadeira - 17
Perigo - 125
Rotunda - 107
Sessenta - 54
Vaca - 12

More images are needed for descida, passadeira, esquerda and vaca.
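To keep track of which classes still need images, the counts can be checked with a few lines of Python. This is only a sketch: the threshold of 40 labels is an assumption, chosen here because it flags the same four classes mentioned above, and the hypothetical `count_labels` helper shows how the same numbers could be recomputed from YOLO-format `.txt` label files.

```python
from collections import Counter
from pathlib import Path

# Label counts reported above for the manually labeled dataset.
LABEL_COUNTS = {
    "autocarro": 48, "descida": 9, "esquerda": 33, "estreitamento": 42,
    "farois": 124, "hospital": 94, "parking": 48, "passadeira": 17,
    "perigo": 125, "rotunda": 107, "sessenta": 54, "vaca": 12,
}


def underrepresented(counts, min_labels=40):
    """Classes with fewer than `min_labels` labels, rarest first."""
    return sorted((c for c, n in counts.items() if n < min_labels), key=counts.get)


def count_labels(label_dir):
    """Count boxes per class id across the YOLO .txt label files in `label_dir`."""
    counts = Counter()
    for txt in Path(label_dir).glob("*.txt"):
        for line in txt.read_text().splitlines():
            if line.strip():  # skip empty lines
                counts[int(line.split()[0])] += 1  # the first field is the class id
    return counts


print(underrepresented(LABEL_COUNTS))  # ['descida', 'vaca', 'passadeira', 'esquerda']
```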

manuelgitgomes commented 1 year ago

Future tasks:

[x] Create and label new datasets
[ ] Explore the usage of Google Colab

TatianaResend commented 1 year ago

I will try to write code that labels the images automatically. It may take more time at first than just taking photos from Gazebo, but with so many photos to label, it is worth it.

TatianaResend commented 1 year ago

To label automatically, we planned the following steps:

[x] Receive the position of each signal
[x] Calculate corner positions
[x] Receive camera position (top_right)
[x] Receive the camera intrinsics
[x] Calculate the transformation between signal and camera
[x] Transform 3D points into 2D
[x] Get a frame
[x] Check if the point belongs to the image
[x] Check the overlapping of the signals
[x] Export the labels to YOLO format

TatianaResend commented 1 year ago

At the moment, I can receive the position of each signal, the camera position and the intrinsics. I can also calculate the transformation between two points, but I don't know how to get the corner points of the signs. @manuelgitgomes and @callmesora, do you have any suggestions?

manuelgitgomes commented 1 year ago

Hello @TatianaResend

First of all, great job!

Regarding your problem, the transformation between the point where the pose is retrieved and the corners is constant; it's just a matter of taking the time to find it. We can try to debug it together. The first step is to find where this point is located. Next, the transformation between this point and every corner needs to be found. I believe both steps can be accomplished by looking through the .sdf file.
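Once the constant offsets are read from the .sdf, the corners follow from the sign's pose through a single rigid transform. A minimal NumPy sketch, assuming (hypothetically) that the pose only has a yaw rotation, that the sign face lies in the sign's local y-z plane centered on the pose origin, and that `width`, `height` and `z_offset` are placeholders for the real values in the .sdf. The function names are hypothetical.

```python
import numpy as np


def yaw_pose(x, y, z, yaw):
    """4x4 transform (sign frame -> world) for a pose rotated only about the z axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = [x, y, z]
    return T


def sign_corners_world(T_sign2world, width, height, z_offset):
    """World coordinates (4, 3) of the corners of a flat rectangular sign.

    Hypothetical geometry: the face lies in the sign's local y-z plane, centered
    horizontally on the pose origin and `z_offset` above it. Replace these
    offsets with the ones found in the model's .sdf file.
    """
    hw, hh = width / 2.0, height / 2.0
    corners_local = np.array([
        [0.0, -hw, z_offset - hh, 1.0],
        [0.0, hw, z_offset - hh, 1.0],
        [0.0, hw, z_offset + hh, 1.0],
        [0.0, -hw, z_offset + hh, 1.0],
    ])
    return (T_sign2world @ corners_local.T).T[:, :3]


# Example with placeholder dimensions: a 0.6 m x 0.6 m sign whose center is 1 m high.
corners = sign_corners_world(yaw_pose(2.0, 0.0, 0.0, 0.0), 0.6, 0.6, z_offset=1.0)
```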

TatianaResend commented 1 year ago

There are some difficulties in transforming 3D points into 2D: although I have already checked all the steps to obtain the 2D points, the coordinate values do not make sense.

Steps to get 2D points from 3D points:

get the matrix from footprint to camera: matrix_footprint2cam
get the matrix from world to footprint: matrix_world2footprint
get the matrix from world to camera: matrix_world2cam
get the matrix from camera to world: matrix_cam2world
get the coordinates of the signs
use the cv2.projectPoints function

Each matrix was verified individually, as were the final matrix, the camera parameters (focal length) and the cv2.projectPoints function. All of these are correct.
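For reference, the chain of transformations in these steps can be sketched with 4x4 homogeneous matrices in NumPy. The convention assumed here (it may not match the actual script) is that `T_a2b` converts coordinates expressed in frame a into frame b, so the chain reads right to left. Inverting a rigid transform needs no general matrix inverse: the inverse of [R t; 0 1] is [R^T, -R^T t; 0 1]. The helper names and the example poses are hypothetical.

```python
import numpy as np


def make_transform(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation R and a translation t."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T


def invert_rigid(T):
    """Inverse of a rigid transform [R t; 0 1], computed as [R^T -R^T t; 0 1]."""
    R, t = T[:3, :3], T[:3, 3]
    return make_transform(R.T, -R.T @ t)


def apply(T_a2b, points_a):
    """Express an (N, 3) array of points given in frame a in frame b."""
    homogeneous = np.hstack([points_a, np.ones((len(points_a), 1))])
    return (T_a2b @ homogeneous.T).T[:, :3]


# Hypothetical example: the footprint is at x = 5 m in the world (no rotation), and
# the camera sits 0.5 m ahead of and 0.3 m above the footprint.
matrix_world2footprint = invert_rigid(make_transform(np.eye(3), [5.0, 0.0, 0.0]))
matrix_footprint2cam = invert_rigid(make_transform(np.eye(3), [0.5, 0.0, 0.3]))
matrix_world2cam = matrix_footprint2cam @ matrix_world2footprint
matrix_cam2world = invert_rigid(matrix_world2cam)

# A sign point 10 m along the world x axis ends up 4.5 m straight ahead of the camera,
# expressed in the camera body frame (x forward), not yet in an image/optical frame.
p_cam = apply(matrix_world2cam, np.array([[10.0, 0.0, 0.3]]))
```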

manuelgitgomes commented 1 year ago

Hello @TatianaResend!

It works, finally!

Thank you very much for your patience, but this was a "me" mistake. Honestly, I didn't know the purpose behind the optical_frame usually placed in URDF files. It is placed there because the camera frame used in Gazebo visualization is different from the one used in image representation (x to the front vs. z to the front, etc.). I did not take this into account, and this obviously had a great impact here.
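To make the frame difference concrete: under the usual ROS convention (REP 103), a camera body frame has x forward, y left and z up, while the optical frame used for images has z forward, x right and y down. A minimal NumPy sketch of the fixed rotation between them (names hypothetical):

```python
import numpy as np

# Fixed rotation from a ROS camera body frame (x forward, y left, z up)
# to the camera optical frame (z forward, x right, y down), per REP 103.
R_BODY2OPTICAL = np.array([
    [0.0, -1.0, 0.0],  # optical x (right)   = -body y (left)
    [0.0, 0.0, -1.0],  # optical y (down)    = -body z (up)
    [1.0, 0.0, 0.0],   # optical z (forward) =  body x (forward)
])


def body_to_optical(points_body):
    """Convert an (N, 3) array of points from the camera body frame to the optical frame."""
    return points_body @ R_BODY2OPTICAL.T


# A point 4.5 m straight ahead of the camera ends up on the optical z axis.
print(body_to_optical(np.array([[4.5, 0.0, 0.0]])))
```

Projecting points that are still in the body frame treats the forward distance as a sideways offset, which fits the nonsensical 2D coordinates reported earlier in the thread.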

After correcting this, the script works! You can see it here!

Please check my PR (#185) to verify the changes I made in your work. If you don't need any additional help, we can cancel our meeting on Thursday. If you need, we can maintain it without a problem.

Regarding next steps, keep going the way you were! I suggest renaming the image_raw_topic parameter to camera_name or equivalent. This would allow us to use this parameter for the image_raw topic, the camera_info topic and the camera_optical_frame.
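A minimal sketch of that suggestion, where a single `camera_name` parameter drives all three names. The naming scheme below (`/<camera_name>/image_raw`, `/<camera_name>/camera_info`, `<camera_name>_optical_frame`) and the example name `top_right_camera` are assumptions; they must match whatever the URDF and the Gazebo camera plugin actually publish.

```python
def camera_resources(camera_name):
    """Derive topic and frame names from a single camera_name (hypothetical naming scheme)."""
    return {
        "image_topic": f"/{camera_name}/image_raw",
        "camera_info_topic": f"/{camera_name}/camera_info",
        "optical_frame": f"{camera_name}_optical_frame",
    }


# In a rospy node, the name could be read as a private parameter, e.g.:
#   camera_name = rospy.get_param("~camera_name", "top_right_camera")
print(camera_resources("top_right_camera"))
```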

Thanks again!

TatianaResend commented 1 year ago

I was able to make labeling almost automatic. It is not 100% automatic because a problem has arisen: the signals behind the camera also give points inside the image, and I still haven't found the "perfect" condition to select only the signals in front of the camera. Anyway, with the current conditions it is possible to label much faster than manually: about 200 images in 20 minutes. So, initially, I will create a labeled dataset with 250 images of each signal.

manuelgitgomes commented 1 year ago

That is great @TatianaResend!

> The signals behind the camera also give points inside the image

To tackle this, try to discard all points where z < 0 (in the camera optical frame)!
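A minimal sketch of that check, combined with a plain pinhole projection (no lens distortion) in NumPy. It assumes the points are already in the camera optical frame (z forward) and that `K` is the 3x3 intrinsic matrix from camera_info; the function names are hypothetical. `cv2.projectPoints` gives the same pixel coordinates for points in front of the camera, but it does not discard points behind it: dividing by a negative z flips them through the principal point, so they can land inside the image.

```python
import numpy as np


def project_points(points_cam, K):
    """Pinhole projection (no distortion) of (N, 3) points in the camera optical frame.

    Returns (uv, in_front): an (N, 2) array of pixel coordinates and a boolean mask of
    the points with z > 0. Points behind the camera (z <= 0) get NaN coordinates
    instead of a misleading projection.
    """
    z = points_cam[:, 2]
    in_front = z > 0
    uv = np.full((len(points_cam), 2), np.nan)
    uv[in_front, 0] = K[0, 0] * points_cam[in_front, 0] / z[in_front] + K[0, 2]
    uv[in_front, 1] = K[1, 1] * points_cam[in_front, 1] / z[in_front] + K[1, 2]
    return uv, in_front


def inside_image(uv, width, height):
    """Mask of projected points that fall inside a width x height image (NaN gives False)."""
    u, v = uv[:, 0], uv[:, 1]
    return (u >= 0) & (u < width) & (v >= 0) & (v < height)
```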

TatianaResend commented 1 year ago

I have already implemented the calculation of the bounding-box overlap percentage and allowed the signals to be partially outside the image. I tried to take into account the orientation of the sign relative to the car, to check whether the sign is facing the car, but the orientations of the car and of the sign received from Gazebo oscillate a lot, causing the angle between the sign and the car to vary a lot.

Screenshot from 2023-03-06 20-41-33
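One way to compute the quantities described above, sketched in Python with NumPy: the projected corners give an axis-aligned box, the part of it inside the image gives the visible fraction, and intersecting it with the box of a closer sign gives the occluded fraction. The exact definition of "overlap percentage" used in the script is not stated in the thread, so the one below (intersection area divided by the area of the farther sign's box) is an assumption, as are the function names.

```python
import numpy as np


def corners_to_box(uv):
    """Axis-aligned box (x_min, y_min, x_max, y_max) enclosing (N, 2) projected corners."""
    return tuple(uv.min(axis=0)) + tuple(uv.max(axis=0))


def area(box):
    """Area of a box; an empty (inverted) box has area 0."""
    x_min, y_min, x_max, y_max = box
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def intersect(a, b):
    """Intersection of two boxes (possibly empty)."""
    return (max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3]))


def visible_fraction(box, img_w, img_h):
    """Share of the box area that lies inside the image (for partially visible signs)."""
    full = area(box)
    return area(intersect(box, (0, 0, img_w, img_h))) / full if full > 0 else 0.0


def occluded_fraction(far_box, near_box):
    """Share of the farther sign's box covered by the box of a sign closer to the camera."""
    full = area(far_box)
    return area(intersect(far_box, near_box)) / full if full > 0 else 0.0
```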

TatianaResend commented 1 year ago

At the moment, labeling is (almost) automatic.

The program saves the information in 3 different folders:

- "images" folder: saves each image
- "labels" folder: saves one .txt file per image with the labels of the signals in that image (corresponding number of the signal, center_x, center_y, height and width)
- "images_p" folder: saves a copy of each image from the "images" folder with a blue circle drawn at the center of each detected signal. These images are used to check whether the signal detection is correct.
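For reference, a YOLO label line stores the class index followed by the box center and size, normalized by the image width and height; note that the standard order is width before height. A minimal sketch (function names hypothetical) that turns pixel boxes `(x_min, y_min, x_max, y_max)` into the contents of one label file:

```python
def yolo_line(class_id, box, img_w, img_h):
    """One YOLO label line: class x_center y_center width height, normalized to [0, 1].

    `box` is (x_min, y_min, x_max, y_max) in pixels, already clipped to the image.
    """
    x_min, y_min, x_max, y_max = box
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"


def label_file_text(objects, img_w, img_h):
    """Contents of one .txt label file; `objects` is a list of (class_id, box) pairs."""
    return "".join(yolo_line(c, box, img_w, img_h) + "\n" for c, box in objects)
```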

Dataset labeling done (more than 600 images):

[x] Cattle_signal
[x] Depression_signal
[x] Other_Dangers_signal
[x] Road_Narrow_signal
[x] Crosswalk_signal
[x] Hospital_signal
[x] Park_signal
[x] RMV_signal
[x] Lights_signal
[x] Round_About_signal
[x] Turn_Left_signal
[x] Bus_signal
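Before training, the labeled images also need a train/validation split. YOLOv5 locates each image's label by replacing `/images/` with `/labels/` in the image path, so a layout such as `images/train`, `labels/train`, `images/val`, `labels/val` works. A minimal sketch, assuming the `images` and `labels` folders described above (the `images_p` previews are left out), `.jpg`/`.png` images and an arbitrary 80/20 split; the function names are hypothetical.

```python
import random
import shutil
from pathlib import Path

IMAGE_EXTS = {".jpg", ".jpeg", ".png"}


def split_names(names, val_ratio=0.2, seed=0):
    """Deterministically shuffle `names` and split them into (train, val) lists."""
    names = sorted(names)
    random.Random(seed).shuffle(names)
    n_val = int(len(names) * val_ratio)
    return names[n_val:], names[:n_val]


def build_split(src, dst, val_ratio=0.2, seed=0):
    """Copy src/images and src/labels into dst/{images,labels}/{train,val}."""
    images = [p.name for p in Path(src, "images").glob("*") if p.suffix.lower() in IMAGE_EXTS]
    train, val = split_names(images, val_ratio, seed)
    for split, names in (("train", train), ("val", val)):
        for sub in ("images", "labels"):
            Path(dst, sub, split).mkdir(parents=True, exist_ok=True)
        for name in names:
            shutil.copy(Path(src, "images", name), Path(dst, "images", split, name))
            label = Path(src, "labels", Path(name).stem + ".txt")
            if label.exists():  # an image without signs may have no label file
                shutil.copy(label, Path(dst, "labels", split, label.name))
    return len(train), len(val)
```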

TatianaResend commented 1 year ago

I followed the suggestion above, using the code available in the repository https://github.com/maheravi/YoloV5-TrafficSign. I tried to train the network on my computer, but since I can't use the GPU, training on the CPU is extremely slow. So the solution is to train on Google GPUs, using Google Colab.

> I'm gonna give you a step-by-step example of how to solve this problem in a few hours
>
> 1. Go to Google Colab
>
> 2. Git clone this repo https://github.com/maheravi/YoloV5-TrafficSign
>
> 3. `!python train.py --img 640 --batch 16 --epochs 50 --data dataset.yaml --weights yolov5s.pt`
>
> 4. Download the .pt file

I followed the steps above and got some apparently satisfactory results. I used only 10 epochs because I was just doing a preliminary test of the model.