AutoMecUA / AutoMec-AD

Autonomous RC car with the help of ROS Noetic and ML.
GNU General Public License v3.0

Changes to CNN Architecture #165

Closed callmesora closed 1 year ago

callmesora commented 1 year ago

I've studied some different approaches that might yield better results.

Instead of using a simple sequential CNN, we can opt to use feature extractors pretrained on ImageNet, such as Inception or MobileNet backbones, and use them in the classifier pipeline.

Since our images are quite simple in nature, I'm not sure this will help, but it's definitely worth trying, since using these architectures can save us some training time.

This allows for faster iteration of different models during the competition.
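As a rough, untested sketch of the backbone-plus-head idea (shown in PyTorch, which the thread later moves to; layer names and sizes here are assumptions, not the notebook's code):

```python
# Illustrative only: a pretrained MobileNetV2 backbone with a small regression
# head that outputs a single steering angle. Layer sizes are assumptions.
import torch
import torch.nn as nn
from torchvision import models


class SteeringMobileNet(nn.Module):
    def __init__(self):
        super().__init__()
        # ImageNet-pretrained feature extractor (torchvision >= 0.13 API).
        backbone = models.mobilenet_v2(weights=models.MobileNet_V2_Weights.IMAGENET1K_V1)
        self.features = backbone.features        # convolutional backbone
        self.pool = nn.AdaptiveAvgPool2d(1)      # global average pooling
        self.head = nn.Sequential(               # small regression head
            nn.Flatten(),
            nn.Linear(1280, 64),                 # MobileNetV2 outputs 1280 channels
            nn.ReLU(),
            nn.Linear(64, 1),                    # single steering value
        )

    def forward(self, x):
        return self.head(self.pool(self.features(x)))


model = SteeringMobileNet()
angle = model(torch.randn(1, 3, 224, 224))       # e.g. one 224x224 RGB frame
```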

In this issue I will implement:

callmesora commented 1 year ago

✅ Implemented

I currently don't have time to test and train all of these. Can anyone help me with this issue? I've left the boilerplate and the structural implementation of each of these networks in the notebook.

MobileNetV2 is the one I expect to work best (maybe better than what we currently have).

I won't have time to code a lot in the next few weeks (maybe month), but I can help and guide anyone who's willing to finish this issue and wants to learn a bit about vision using deep learning.

🚀

callmesora commented 1 year ago

@brnaguiar or any other volunteers to help out? @tbsauce

manuelgitgomes commented 1 year ago

Find an error metric (possibly RMSE) and evaluate the various architectures to find the most accurate one. For testing: a Gazebo-trained model with Gazebo test data, a Gazebo-trained model with physical test data, a mixed model with physical test data, and a physical model with physical test data.
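For reference, a minimal sketch of the suggested RMSE metric in PyTorch (tensor values below are made-up placeholders, not project data):

```python
# Minimal sketch of an RMSE metric between predicted and labelled steering
# angles; the tensors below are placeholders.
import torch

def rmse(predictions: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    """Root mean squared error over a batch of steering angles."""
    return torch.sqrt(torch.mean((predictions - targets) ** 2))

preds = torch.tensor([0.10, -0.05, 0.30])
labels = torch.tensor([0.12, -0.02, 0.25])
print(rmse(preds, labels))   # ≈ 0.036
```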

andrefdre commented 1 year ago

I started looking at the code but honestly didn't understand how TensorFlow works, so if it's okay I will try to convert the CNNs to PyTorch. I have already successfully converted the first two networks that you said were working, and they seem to be working fine. I will now invest some time in the other networks, which was the purpose of my research. If you still prefer TensorFlow, I can try to test them with TensorFlow.

andrefdre commented 1 year ago

May I suggest changing where the dataset and model are stored for training? For example, I always have the problem of low disk space on Ubuntu, so maybe we could use an environment variable to specify where to store the datasets, and all files related to datasets and models would be inside that folder. In my case, I could then store them on an external SSD. What do you guys think?

callmesora commented 1 year ago

Yeah, that is a good idea. We didn't always train models in a notebook; it used to be a train.py script with input arguments. Perhaps it's a good idea to revert back to that and keep the notebook just for development and testing purposes.
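A hypothetical sketch of what such a train.py entry point could look like (flag names are illustrative, loosely modelled on the options used later in this thread, not the old script's exact interface):

```python
# Hypothetical train.py entry point; flag names and defaults are illustrative.
import argparse

def main():
    parser = argparse.ArgumentParser(description="Train a steering model")
    parser.add_argument("-d", "--dataset", required=True, help="dataset name")
    parser.add_argument("-m", "--model", default="MobileNetV2()", help="model to instantiate")
    parser.add_argument("-n_epochs", type=int, default=30, help="number of training epochs")
    parser.add_argument("-batch_size", type=int, default=64, help="mini-batch size")
    args = parser.parse_args()

    print(f"Training {args.model} on dataset '{args.dataset}' "
          f"for {args.n_epochs} epochs (batch size {args.batch_size})")
    # ... build the dataloaders, instantiate the model, run the training loop ...

if __name__ == "__main__":
    main()
```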

andrefdre commented 1 year ago

I think the notebook is interesting for visualizing information about the dataset, with all the distribution graphs. What do you think is a good approach, having an environment variable for the datasets' path?

manuelgitgomes commented 1 year ago

I think it is a good idea! Maybe $AUTOMEC_DATASETS?
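A minimal sketch of how a training script could resolve the dataset folder from that variable (the fallback path and dataset name are assumptions):

```python
# Minimal sketch: resolve the dataset folder from $AUTOMEC_DATASETS; the
# fallback path and dataset name are assumptions.
import os
from pathlib import Path

datasets_root = Path(os.environ.get("AUTOMEC_DATASETS",
                                    str(Path.home() / "automec_datasets")))
dataset_path = datasets_root / "set10"   # e.g. the dataset name passed with -d
print(f"Loading dataset from {dataset_path}")
```

Exporting AUTOMEC_DATASETS in your .bashrc, pointed at an external SSD, would then redirect all dataset and model files there.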

andrefdre commented 1 year ago

@callmesora In the VGG model you have two models, if I'm not wrong; can you explain to me which one to use? The same happens for MiniResNet.

andrefdre commented 1 year ago

I have finished creating the code to generate results. What do you think, are these good metrics? Do you think I should add something else? For now, I am only training for 30 epochs; should I train more? Here are the results for the first two models: first there is a graph of the loss, secondly a graph with the errors on the simulated dataset, and afterwards a graph with the errors on the real dataset.

I noticed that in the dataset generated in Gazebo, the camera appears at the bottom of the image. Should it be like that?

Now I will focus on using Docker, as suggested by @manuelgitgomes. The only thing I will need to change is to stop using ROS to train the model, which I currently use, unless I also install ROS in the container, which I don't think is necessary.

manuelgitgomes commented 1 year ago

Hello, @andrefdre!

Firstly, congratulations! Great job both in translating the models from TensorFlow to PyTorch and in creating an insightful loss graph.

I have used your script and successfully trained on my GPU! The training time for each epoch was around 10 seconds, while the testing time was around 5 seconds. I used 4 workers, which reduced training time to a third of what it was. I also trained with "only" 50 epochs.
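For context on the worker count: in PyTorch this is the num_workers argument of torch.utils.data.DataLoader, which prepares batches in parallel subprocesses. A minimal sketch, with a random TensorDataset standing in for the real image/steering dataset:

```python
# Minimal sketch of parallel data loading in PyTorch; the random TensorDataset
# stands in for the project's image/steering Dataset.
import torch
from torch.utils.data import DataLoader, TensorDataset

train_dataset = TensorDataset(torch.randn(256, 3, 64, 64),   # placeholder images
                              torch.randn(256, 1))           # placeholder steering angles
train_loader = DataLoader(
    train_dataset,
    batch_size=256,
    shuffle=True,
    num_workers=4,      # 4 subprocesses prepare batches while the GPU trains
    pin_memory=True,    # speeds up host-to-GPU copies
)
for images, angles in train_loader:
    pass                # the training step would go here
```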

I have obtained the following results for a Gazebo and a real dataset, respectively (loss graphs):

Now, answering your questions:

For now, I am only training for 30 epochs; should I train more?

I cannot answer this one. I think the key is to try it in Gazebo and verify if it is enough.

I noticed that in the dataset generated in Gazebo, the camera appears at the bottom of the image. Should it be like that?

Probably not, you can gather a new dataset if you want, with:

roslaunch prometheus_driving dataset_writing.launch

Now I will focus on using Docker, as suggested by @manuelgitgomes. The only thing I will need to change is to stop using ROS to train the model, which I currently use, unless I also install ROS in the container, which I don't think is necessary.

Honestly, I am rethinking this suggestion. It worked pretty seamlessly for me, so it might not be as needed as I envisioned. And I think our time would be better spent comparing architectures or creating an LSTM than moving this to a Docker image. What do you think?

Now, some questions for you:

Congratulations again on the wonderful work!

andrefdre commented 1 year ago

The training time for each epoch was around 10 seconds, while the testing time was around 5 seconds. I used 4 workers, which reduced training time to a third of what it was.

I had heard about workers somewhere but never used them. Can you explain to me how they work? I am kind of curious, because you obtained 10 seconds for each epoch and I had at best 7 or more minutes. What was the size of your dataset?

I have obtained the following results for a Gazebo and a real dataset, respectively:

Which models did you use during training?

Probably not, you can gather a new dataset if you want, with:

I figured out what it was when I looked at Gazebo: basically this branch is a bit old, so the camera I saw was the one on the front of the car, which in this branch was on top.

Honestly, I am rethinking this suggestion. It worked pretty seamlessly for me, so it might not be as needed as I envisioned. And I think our time would be better spent comparing architectures or creating an LSTM than moving this to a Docker image. What do you think?

How hard is it to implement Docker? I think Docker would be nice if, for example, we want to train during the competition on your home computer or somewhere else remotely and want to turn off our own machine. Without Docker, the training would terminate when you close the connection. But I also don't know how hard it is to do, so I don't really have an opinion on which way to go forward.

How can I visualize data as seen in your previous comment?

I created the script generate_results.py. You pass it the dataset for which you want to see the errors, and it will create the error graphs. You can try running the next line:

rosrun prometheus_driving generate_results.py -d validation -r mobilenet -fn mobilenet_model -mn mobilenet_model -batch_size 15 -m 'MobileNetV2()' -c 0

You can add -v to visualize the images with the prediction and labeled value.

I have created a PR (https://github.com/AutoMecUA/AutoMec-AD/pull/184) with some minor changes to the code. Can you review it? You can read the PR description for insight on the changes.

Will check it now. Thanks for the review, what you added helped a lot. I was always deleting the folder by hand, and I'm still curious about the workers.

andrefdre commented 1 year ago

After @manuelgitgomes trained all the models, these are the results. @manuelgitgomes, could you say what the training parameters were? (Result graphs: Nvidia/Rota, MobileNet/Inception, VGG/ResNet.) After reviewing the results, I think the best model is ResNet for a dataset made in Gazebo, but for the real dataset it's similar to VGG. Now I will record some videos with the models in Gazebo.

andrefdre commented 1 year ago

@manuelgitgomes did you change the camera angle of the car in my branch? Because this is what the car is seeing (screenshot).

manuelgitgomes commented 1 year ago

Hello @andrefdre!

I used 100 epochs for each model (some were cut short, as you can see in the graphs). I used a batch size of 256 for Nvidia, ROTA and MobileNet; for the rest I used a batch size of 64. I used 4 workers for each.

@manuelgitgomes did you change the camera angle of the car in my branch? Because this is what the car is seeing.

Possibly not. Do a PR from dev to this branch, then it will be updated.

andrefdre commented 1 year ago

Possibly not. Do a PR from dev to this branch, then it will be updated.

It is still the same, and I'm kind of lost. Do you have some time after the meeting today to help me, @manuelgitgomes? Maybe you know where to look.

manuelgitgomes commented 1 year ago

@andrefdre Sure, we can look at it.

andrefdre commented 1 year ago

I still have the same problem of the camera being at a weird angle. I already tried updating all the packages related to the camera; do you have any suggestions I can try, @manuelgitgomes? I also noticed that the bottom camera suffers from the same problem. After some inspection, it's as if the image is along the y-axis instead of the x-axis.

andrefdre commented 1 year ago

I managed to find what it was. In the file prometheus.gazebo.macro, the pose line has some rotations (the pose is x y z roll pitch yaw, in radians). Maybe previously we needed them, and since I updated, they changed it; that's my guess. Previously:

    <gazebo reference="top_front_camera_optical_frame">
        <sensor name="kinect" type="depth">
            <!-- openni plugin has the x pointing towards the scene, so rotate to have z -->
            <pose frame="world">0.0 0.0 0.0 -1.5708 0.0 1.5708</pose>

Now:

    <gazebo reference="top_front_camera_optical_frame">
        <sensor name="kinect" type="depth">
            <!-- openni plugin has the x pointing towards the scene, so rotate to have z -->
            <pose frame="world">0.0 0.0 0.0 0 0.0 0</pose>

andrefdre commented 1 year ago

I have recorded a video for each network. I tried to do 2 laps with each network, but most of them failed. The two more promising ones were the simpler ones.

Nvidia: https://uapt33090-my.sharepoint.com/:v:/g/personal/andref_ua_pt/EerUDj8kUlBHmpUmjhdHNmUBC98zNW6wh8i7lYh_qBvDgg?e=S448xT

Rota: https://uapt33090-my.sharepoint.com/:v:/g/personal/andref_ua_pt/Ebu4EZ6XcjZKrCLfBNyGWLIBzuGE7IVpyyEWl4Xg_uVBfQ?e=nGvcuz

MobileNet:

https://user-images.githubusercontent.com/58526188/220404738-99e72e39-923d-4d42-a1b8-f2de20904e08.mp4

Inception:

https://user-images.githubusercontent.com/58526188/220404731-6e637a6e-5d62-49b3-adb2-d6f997895ef7.mp4

VGG: https://uapt33090-my.sharepoint.com/:v:/g/personal/andref_ua_pt/EQ0GrNZ466JDvVijSa6S4TMBJ6bDSKdPsHOvZqgY0cp2AQ?e=Bf1xmF

ResNet:

https://user-images.githubusercontent.com/58526188/220404752-e576da59-e8ce-47a9-9aaa-8fed96210996.mp4

andrefdre commented 1 year ago

I added an option to visualize the network, to make debugging easier and get a better understanding of the architecture. I also created a new network using an LSTM, but I haven't tried it yet. Here is the network architecture (diagram). I needed to install a new library to visualize the architecture; Torch already has an implementation of this with torch-summary, yet it currently doesn't work with RNNs.

pip install torchinfo

To visualize the architecture, just add the parameter -sm when running training.

rosrun prometheus_driving model_train.py -d set10 -fn lstm -m 'LSTM()' -n_epochs 30 -batch_size 100 -loss_f 'MSELoss()' -nw 4 -lr_step_size 10 -c 0 -sm
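Under the hood, the -sm option presumably boils down to a call like the following (the toy model and input shape are placeholders, not the project's network or resolution):

```python
# Rough illustration of a torchinfo summary call; the toy model and input
# shape are placeholders, not the project's network or training resolution.
import torch.nn as nn
from torchinfo import summary

toy = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(16, 1),
)
summary(toy, input_size=(1, 3, 224, 224))   # prints layers, output shapes and parameter counts
```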

As previously discussed with @manuelgitgomes we noticed that our test loss was lower than our training loss. I found this useful post: https://pyimagesearch.com/2019/10/14/why-is-my-validation-loss-lower-than-my-training-loss/

Some points taken from there that could lead to lower test loss:

andrefdre commented 1 year ago

@callmesora could you confirm the following models, to see if they are what you wanted? For the first two models I already did a diagram, so I will use those instead. For the rest, since I'm not sure if they are correct, I will just paste the console output of each model.

Nvidia: (diagram)

Rota: (diagram)

MobileNetV2: (console output)

Inception: it's a bit big to fit in one screenshot, so it's easier to run the code with -sm as one of the parameters

MyVGG: (console output)

ResNet: (console output)

ResNetV1: (console output)

LSTM: (architecture diagram)

callmesora commented 1 year ago

Hey! Sorry I've been away from my computer for a few weeks.

"@callmesora could you confirm the following models to see if it is what you wanted?"

I'm amazed, I don't know how you did those diagrams but they look incredible. Good job! Regarding the architectures themselves, they all look fine to me. Remember that deep learning is an experimental science by its very nature; the architectures I added there were just some random ones based on popular models, the main idea being to use a backbone for feature extraction and then a head to calculate the steering wheel angle. As long as they work in simulation, they should be fine.

In the last competition we found that, most of the time (but not always), a lower loss resulted in better driving.

So if we have similar losses but one architecture is smaller, we go with the smaller one.

I'm going to see if I can incorporate model tracking into your training loops.

andrefdre commented 1 year ago

I was watching a YouTube video about Google's new AI robot, and they mention that instead of using a CNN, they used a ViT (Vision Transformer) in the model. Afterwards I found this post https://viso.ai/deep-learning/vision-transformer-vit/ and thought about commenting here for curious people. I had never heard about this, and it showed better results than CNNs.
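For the curious, swapping the backbone for a torchvision ViT would look roughly like this (purely illustrative and untested here; assumes torchvision >= 0.13):

```python
# Illustrative only: a torchvision ViT backbone with a single-output steering
# head; untested in this project, requires torchvision >= 0.13.
import torch
import torch.nn as nn
from torchvision import models

vit = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
vit.heads = nn.Linear(vit.hidden_dim, 1)    # replace the classifier with a steering head
angle = vit(torch.randn(1, 3, 224, 224))    # ViT-B/16 expects 224x224 inputs
```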

andrefdre commented 1 year ago

A pull request to merge these features into dev was created, since this task was completed.