AutoMecUA / AutoMec-AD

Autonomous RC car with the help of ROS Noetic and ML.
GNU General Public License v3.0

Use Neural Network for detection of traffic signals #176

Closed manuelgitgomes closed 7 months ago

manuelgitgomes commented 1 year ago

Explore neural networks for signal detection. Try YOLO (various versions; a mini variant may be more suitable). There are two separate challenges: detecting regular traffic signs and detecting luminous signs. Start with the first, as it is probably better documented.

manuelgitgomes commented 1 year ago

https://viso.ai/deep-learning/yolov7-guide/

callmesora commented 1 year ago

@callmesora here.

My word of advice for this issue would be to use YOLOv5 instead of v7 or any other variant! The performance is very similar, and YOLOv5 has a much bigger community and is more mature.

The only reason I would use v7 instead of v5 would be if my company didn't want to pay for a license. As we are a non-commercial project, we can use v5 freely.

Recap:

To deploy this model we can use one of the following:

From my experience, the best thing for our scenario is to simply load the model from PyTorch Hub. It's like 3 lines of code! The downside is that performance is not as optimized as with the other options, but we don't have a very high throughput requirement.

In other words, we don't need our detection to run really fast (10+ FPS); 5 FPS is enough since our car is not that fast!

Simplicity > Peak Performance when deploying models in real life!

Here is an example to help you. SOURCE

import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5s')

# Image
im = 'https://ultralytics.com/images/zidane.jpg'

# Inference
results = model(im)

results.pandas().xyxy[0]
#      xmin    ymin    xmax   ymax  confidence  class    name
# 0  749.50   43.50  1148.0  704.5    0.874023      0  person
# 1  433.50  433.50   517.5  714.5    0.687988     27     tie
# 2  114.75  195.75  1095.0  708.0    0.624512      0  person
# 3  986.00  304.00  1028.0  420.0    0.286865     27     tie
callmesora commented 1 year ago

@TatianaResend (I think you are the one who is going to work a bit on this so tag me if you want any help!)

In practical terms you will do something like this:

  1. Google for YOLOv5 traffic sign detection (or something similar) and find a dataset.

Object detection datasets come in many labeling formats. The most popular are COCO, Darknet (YOLOv4), and YOLOv5 (PyTorch Darknet), which is an adaptation of the previous one. You may or may not need to convert the labels to train YOLOv5; if you can find a dataset already in the YOLOv5 format, that is great!

For reference, Darknet is the framework, written in C, in which YOLO (the object detection algorithm) was originally implemented!

  2. Train the model on the given dataset and download the model
  3. Load the model from PyTorch Hub like the tutorial I sent in the last comment

You can load custom models as such:

model = torch.hub.load('ultralytics/yolov5', 'custom', path='path/to/best.pt')  # local model
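If the dataset you find is labeled in COCO format, the conversion mentioned in step 1 could look roughly like the sketch below. It assumes COCO boxes are `[x_min, y_min, width, height]` in pixels and that a YOLOv5 label line is `class x_center y_center width height` with coordinates normalized by the image size. The function names are only illustrative.

```python
def coco_to_yolo(bbox, img_w, img_h):
    """Convert a COCO box [x_min, y_min, width, height] (pixels) to
    YOLO (x_center, y_center, width, height), normalized to 0..1."""
    x_min, y_min, w, h = bbox
    x_center = (x_min + w / 2) / img_w
    y_center = (y_min + h / 2) / img_h
    return x_center, y_center, w / img_w, h / img_h


def yolo_label_line(class_id, bbox, img_w, img_h):
    """Format one line of a YOLOv5 .txt label file."""
    x_c, y_c, w, h = coco_to_yolo(bbox, img_w, img_h)
    return f"{class_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"


# A 320x240 box centered in a 640x480 image.
print(yolo_label_line(0, [160, 120, 320, 240], 640, 480))
```

Each image gets one `.txt` file with one such line per labeled object.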

Pro tips 👍 :

You can download free datasets from Roboflow Open Source Universe (I use this a lot ;) )

Roboflow Universe: https://universe.roboflow.com/

Me being very nice to you: https://universe.roboflow.com/school-oabm7/traffic-sign-kv5qp

Alright skip str8 to the good stuff

I'm gonna give you a step by step example on how to solve this problem in a few hours

  1. Go to google colab
  2. Git clone this repo https://github.com/maheravi/YoloV5-TrafficSign
  3. !python train.py --img 640 --batch 16 --epochs 50 --data dataset.yaml --weights yolov5s.pt
  4. Download the .pt file
  5. Put it on the AutoMec-AD repo
  6. Load it via torch.hub
  7. Run inference (inference means running the model on a given data)
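Steps 6 and 7 might look something like the sketch below. It reuses the `torch.hub.load` call and the `results.pandas().xyxy[0]` output shown earlier in this thread; the weights path and the 0.5 confidence threshold are assumptions for illustration, not project settings.

```python
def keep_confident(detections, min_conf=0.5):
    """Keep only detections whose confidence is at least min_conf."""
    return [d for d in detections if d["confidence"] >= min_conf]


def detect_signs(weights_path, image, min_conf=0.5):
    """Load custom YOLOv5 weights and run inference on one image.

    weights_path is hypothetical, e.g. the best.pt downloaded from Colab.
    """
    # Imported here so keep_confident() can be used without PyTorch installed.
    import torch

    model = torch.hub.load("ultralytics/yolov5", "custom", path=weights_path)
    results = model(image)
    # One dict per detection: xmin, ymin, xmax, ymax, confidence, class, name.
    rows = results.pandas().xyxy[0].to_dict("records")
    return keep_confident(rows, min_conf)


# The filter on its own, using values from the example output above.
sample = [
    {"name": "person", "confidence": 0.874023},
    {"name": "tie", "confidence": 0.286865},
]
print([d["name"] for d in keep_confident(sample)])
```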
callmesora commented 1 year ago

I struggled a bit to find a dataset with the signals we need, and this was the best I could find. Perhaps we will have to create our own dataset with virtual data of the signals and label it ourselves.

Here's the closest I could find https://universe.roboflow.com/kendrickxy/european-road-signs

callmesora commented 1 year ago

I suggest doing this:

  1. Adding the competition signs to gazebo
  2. Recording video
  3. Extract frame (images) from the video
  4. Create our own dataset and upload it to roboflow
  5. Label it and export it to YOLOv5 format
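Step 3 (extracting frames from the recorded video) could be done with OpenCV along these lines. This is only a sketch: the file naming and the choice of keeping every 10th frame are assumptions, picked to avoid filling the dataset with near-identical images.

```python
import os


def frame_name(index):
    """File name for the frame at position `index` in the video."""
    return f"frame_{index:06d}.jpg"


def extract_frames(video_path, out_dir, every_n=10):
    """Save every `every_n`-th frame of a video as a JPEG; return how many were saved."""
    import cv2  # OpenCV, imported lazily so frame_name() works without it

    os.makedirs(out_dir, exist_ok=True)
    cap = cv2.VideoCapture(video_path)
    index = 0
    saved = 0
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video or read error
            break
        if index % every_n == 0:
            cv2.imwrite(os.path.join(out_dir, frame_name(index)), frame)
            saved += 1
        index += 1
    cap.release()
    return saved
```

The resulting folder of images can then be uploaded to Roboflow for labeling.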
manuelgitgomes commented 1 year ago

For training with AMD GPU, please use ROCm

manuelgitgomes commented 1 year ago

A new tool for recording datasets for traffic signals was developed. To run it, it is recommended to use a new world created with various signals:

roslaunch prometheus_gazebo arena.launch world:=signals

After spawning the car, the tool to record datasets is very similar to the one used for the lanes. To run it, use:

roslaunch prometheus_signal_recognition dataset_writing.launch
callmesora commented 1 year ago

A roboflow dataset has been created

To create a labeling job:

  1. Press Annotate and click on the images to label.
  2. Click Assign Images.
  3. Choose the number of images and who will label them.

callmesora commented 1 year ago

These are the signals we have to label. We need to define a name for each of the classes for consistency.

callmesora commented 1 year ago

Let's use this map to label things. We should all use this convention when labeling. Roboflow makes it easier, since once a class is added we can select it from a list, but here is the map:


{31: "estreitamento", 32: "passadeira", 33: "autocarro"}

{21: "vaca", 22: "hospital", 23: "rotunda"}

{11: "descida", 12: "sessenta", 13: "farois"}

{01: "perigo", 02: "parking", 03: "esquerda"}
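If this map is needed in code, one possible form is the Python dictionary below. The keys are written as two-character strings here, which is an assumption on my part: integer literals with a leading zero such as `01` are not valid in Python 3. The helper name is illustrative.

```python
# Sign id -> class name, as agreed in the map above (ids kept as strings).
SIGN_CLASSES = {
    "31": "estreitamento", "32": "passadeira", "33": "autocarro",
    "21": "vaca", "22": "hospital", "23": "rotunda",
    "11": "descida", "12": "sessenta", "13": "farois",
    "01": "perigo", "02": "parking", "03": "esquerda",
}


def class_name(sign_id):
    """Look up a class name; accepts ids like 1, "1" or "01"."""
    return SIGN_CLASSES[str(sign_id).zfill(2)]


print(class_name(22))
print(class_name("01"))
```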
callmesora commented 1 year ago

Example of the labeling.

You should ALWAYS label things that are partially occluded.

For example, take the hospital sign that you can barely see in the back: you should label the little of it that you can see. This will make our system more robust.

image

Ze6000 commented 1 year ago

Dataset label done. Number of labels:

manuelgitgomes commented 1 year ago

Future tasks:

TatianaResend commented 1 year ago

I will try to write code that labels the images automatically. It may take more time at first than just capturing photos from Gazebo, but with so many photos to label it is worth it.

TatianaResend commented 1 year ago

To automatically label it, we thought:

TatianaResend commented 1 year ago

At this moment, I can receive the position of each signal, the position of the camera, and the camera intrinsics. I can also calculate the transformation between two points, but I don't know how to get the corner points of the signs. @manuelgitgomes and @callmesora, do you have any suggestions?

manuelgitgomes commented 1 year ago

Hello @TatianaResend

First of all, great job!

Regarding your problem, the transformation between the point where the pose is retrieved and the corners is constant, so it is just a matter of taking the time to find it. We can try to debug it together. The first step is to find where this point is located. Next, the transformation between this point and every corner needs to be found. I believe both steps can be accomplished by looking through the .sdf
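To illustrate the idea, here is a minimal sketch of applying constant corner offsets to a sign pose. Everything specific in it is an assumption: it treats the sign as a flat square plate whose pose origin is at its center, with the plate facing along its local x axis and rotated only by a yaw angle. The real origin and offsets would have to come from the .sdf.

```python
import math


def sign_corners_world(position, yaw, size):
    """World coordinates of the four corners of a square sign plate.

    position: (x, y, z) of the plate center in the world frame (assumed origin)
    yaw:      rotation about the world z axis, in radians
    size:     side length of the plate in meters (hypothetical value)
    """
    px, py, pz = position
    half = size / 2
    # Constant offsets in the sign's own frame (plate lies in the local y-z plane).
    local = [
        (0.0, -half, half),
        (0.0, half, half),
        (0.0, half, -half),
        (0.0, -half, -half),
    ]
    c, s = math.cos(yaw), math.sin(yaw)
    # Rotate each offset about z by yaw, then translate by the sign position.
    return [
        (px + c * lx - s * ly, py + s * lx + c * ly, pz + lz)
        for lx, ly, lz in local
    ]


corners = sign_corners_world((1.0, 2.0, 0.5), yaw=0.0, size=0.2)
```

With a full orientation (roll, pitch, yaw or a quaternion) the same pattern applies, only with a complete rotation matrix instead of the yaw-only rotation.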

TatianaResend commented 1 year ago

I am having some difficulty transforming the 3D points into 2D. Although I have already checked all the steps to obtain the 2D points, the coordinate values do not make sense.

Steps to get 2D points from 3D points:

I verified each matrix individually, as well as the final matrix, the camera parameters (focal length) and the cv2.projectPoints function. All of these are correct.

manuelgitgomes commented 1 year ago

Hello @TatianaResend!

It works, finally!

Thank you very much for your patience, but this was a "me" mistake. Honestly, I didn't know the purpose of the optical_frame usually placed in urdf files. It is there because the camera frame used in Gazebo visualization is different from the one used in image representation (x to the front vs. z to the front, etc.). I did not take this into account, which obviously had a great impact here.
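For anyone hitting the same problem, the difference between the two frames can be sketched as below. It assumes the usual ROS convention (REP 103): the camera link frame has x forward, y left, z up, while the optical frame has x right, y down, z forward. The function name is illustrative and this is not the code from the PR.

```python
def link_to_optical(point):
    """Re-express a point from the camera link frame (x forward, y left, z up)
    in the camera optical frame (x right, y down, z forward)."""
    x, y, z = point
    return (-y, -z, x)


# A point 2 m ahead, 0.5 m to the left and 0.25 m below the camera.
print(link_to_optical((2.0, 0.5, -0.25)))  # depth along the optical z axis is 2.0
```

Projecting points that are still expressed in the link frame treats "forward" as "right", which is why the resulting pixel coordinates look meaningless.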

After correcting this, the script works! You can see it here!

Please check my PR (#185) to verify the changes I made in your work. If you don't need any additional help, we can cancel our meeting on Thursday. If you need, we can maintain it without a problem.

Regarding next steps, carry on as you were! I suggest replacing the image_raw_topic parameter with one named camera_name or equivalent. This would allow us to use the same parameter for the image_raw, the camera_info and the camera_optical_frame.

Thanks again!

TatianaResend commented 1 year ago

I was able to make the labeling almost automatic. It is not 100% automatic because a problem has arisen: signals behind the camera also produce points inside the image, and I still haven't found the "perfect" condition to select only the signals that are in front of the camera. Anyway, with the current conditions it is possible to label much faster than manually. It is possible to label about 200 images in 20 minutes, so initially I will create a labeled dataset with 250 images of each signal.

manuelgitgomes commented 1 year ago

That is great @TatianaResend!

> The signals behind the camera also give points inside the image

To tackle this, try to discard all points where z < 0 in the camera optical frame!
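A minimal sketch of that check, using a plain pinhole projection without distortion (so it is a simplification of what cv2.projectPoints does). It assumes the point is already expressed in the camera optical frame, with z pointing forward; the intrinsics in the example are made-up values.

```python
def project_point(point_optical, fx, fy, cx, cy):
    """Project a 3D point (camera optical frame) to pixel coordinates.

    Returns None when the point is at or behind the camera (z <= 0).
    """
    x, y, z = point_optical
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical intrinsics for a 640x480 image.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0

print(project_point((1.0, -0.5, 2.0), fx, fy, cx, cy))   # in front of the camera
print(project_point((0.2, 0.1, -2.0), fx, fy, cx, cy))   # behind the camera -> None
```

Without the z check, the second point would still map to a pixel inside the image, because dividing by a negative z simply mirrors the coordinates.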

TatianaResend commented 1 year ago

I have already implemented the calculation of the overlap percentage of the bounding box, and signals are now allowed to be partially outside the image. I also tried to take into account the orientation of the sign relative to the car, to check whether the sign is facing the car. However, the orientations of the car and the signal received from Gazebo oscillate a lot, so the angle between the sign and the car varies a lot.
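As an illustration only, and assuming the overlap in question is between a projected bounding box and the image area, the fraction of a box that is visible could be computed like this. The function name and the example values are hypothetical.

```python
def visible_fraction(box, img_w, img_h):
    """Fraction of a box (x_min, y_min, x_max, y_max, in pixels) that lies
    inside an image of size img_w x img_h. Returns a value in 0..1."""
    x_min, y_min, x_max, y_max = box
    area = (x_max - x_min) * (y_max - y_min)
    if area <= 0:
        return 0.0
    # Width and height of the intersection between the box and the image.
    inter_w = max(0.0, min(x_max, img_w) - max(x_min, 0))
    inter_h = max(0.0, min(y_max, img_h) - max(y_min, 0))
    return inter_w * inter_h / area


# A 100x100 box with its left half outside a 640x480 image.
print(visible_fraction((-50, 100, 50, 200), 640, 480))  # 0.5
```

A label could then be kept only when this fraction is above some chosen threshold, with the box clipped to the image borders before saving.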

Screenshot from 2023-03-06 20-41-33

TatianaResend commented 1 year ago

At this moment, the labeling is (almost) automatic.

The program is saving the information in 3 different folders:

Dataset label done (about +600 images):

TatianaResend commented 1 year ago

I followed the suggestion made above, using the code available in the repository https://github.com/maheravi/YoloV5-TrafficSign. I tried to train the network on my computer, but I can't train on the GPU, and training on the CPU is extremely slow. So the solution is to train on Google GPUs, using Google Colab.

I'm gonna give you a step by step example on how to solve this problem in a few hours

1. Go to google colab

2. Git clone this repo https://github.com/maheravi/YoloV5-TrafficSign

3. `!python train.py --img 640 --batch 16 --epochs 50 --data dataset.yaml --weights yolov5s.pt`

4. Download the .pt file

I followed the steps above and got some apparently satisfactory results. I used only 10 epochs because I was just doing a pre-test on the model.


Training results:

Screenshot from 2023-03-26 18-05-48


Detection:

I tested it on some images, and whenever it detected a signal, it identified the signal correctly.

manuelgitgomes commented 1 year ago

Hello @TatianaResend! Please:

TatianaResend commented 1 year ago

To create a real dataset, it is necessary to position the camera at the 'top front right'.

PedroMS3 commented 1 year ago

Recently we found the missing pieces needed to put the cameras together. We can now make new datasets.

TatianaResend commented 7 months ago

Real-time Classification:


To-do: Add comments and optimize the code for improved speed and clarity.