AutoMecUA / AutoMec-AD

Autonomous RC car with the help of ROS Noetic and ML.
GNU General Public License v3.0

Review CNN code #130

Closed manuelgitgomes closed 2 years ago

manuelgitgomes commented 2 years ago

Review CNN1 and CNN2 code.

manuelgitgomes commented 2 years ago

Try cnn in D3

manuelgitgomes commented 2 years ago

I need to train cnn for large amount of time

manuelgitgomes commented 2 years ago

Need to acquire a new large dataset

callmesora commented 2 years ago

CNN1

The code seems fine and organized to me; I mainly looked at the ML aspects of it. The YAML file and the ROS node integration were a bit out of my scope, so I'm not going to comment on those.

Some aspects that need to be considered/re-written for clarity.

In the train.py script we use:

```python
history = model.fit(
    batchGen(xTrain, yTrain, batch_xtrain, batch_ytrain, image_width, image_height),
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=batchGen(xVal, yVal, batch_xval, batch_yval, image_width, image_height),
    validation_steps=validation_steps,
)
```

The `batch_xtrain` argument should be renamed to `batch_size_train`, and `batch_ytrain` to `training_flag` (and likewise for the validation call).

This way the readability is clearer. Please review whether my suggestion makes sense; it's possible I misread something. To check, read utils.py and train.py.
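To illustrate the renaming suggestion above, here is a minimal sketch of a batch generator with the clearer parameter names (`batch_size`, `is_training`). This is a hypothetical stand-in, not the project's actual `batchGen` from utils.py; it works on in-memory arrays and uses a toy flip augmentation just to show what the `is_training` flag would control:

```python
import numpy as np

def batch_gen(images, labels, batch_size, is_training, seed=0):
    """Endless generator of (image_batch, label_batch) pairs.

    `batch_size` and `is_training` replace the ambiguous
    batch_xtrain / batch_ytrain positional arguments: one controls how
    many samples per batch, the other whether augmentation is applied.
    """
    rng = np.random.default_rng(seed)
    n = len(images)
    while True:
        idx = rng.choice(n, size=batch_size, replace=False)
        batch_x = images[idx].astype(np.float32)
        if is_training:
            # toy augmentation: random horizontal flip, negating the
            # steering label for flipped images
            flip = rng.random(batch_size) < 0.5
            batch_x[flip] = batch_x[flip, :, ::-1]
            batch_y = np.where(flip, -labels[idx], labels[idx])
        else:
            batch_y = labels[idx]
        yield batch_x, batch_y

# usage: 10 dummy 8x8 grayscale frames with steering labels in [-1, 1]
images = np.zeros((10, 8, 8), dtype=np.float32)
labels = np.linspace(-1, 1, 10)
gen = batch_gen(images, labels, batch_size=4, is_training=True)
bx, by = next(gen)
print(bx.shape, by.shape)
```

With names like these, a call such as `batch_gen(xTrain, yTrain, batch_size=100, is_training=True)` documents itself.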

CNN2

I only skimmed through this code a bit more, but it seems to be a less refined version of the first one, where Daniel tried to change the architecture a little for testing purposes.

I don't think it's doing anything that CNN1 isn't doing already, and it's slightly less organized since it doesn't use a utils.py. In my opinion, this model architecture will fail because of the dropout layer he added: the drop probability was 0.8, which is far too high by industry standards. I would suggest capping it at 50% maximum.
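A quick way to see why a 0.8 rate is so aggressive: a sketch of inverted dropout in plain NumPy (the same scheme Keras's `Dropout` layer uses at training time, shown here outside any framework) makes the fraction of silenced units explicit:

```python
import numpy as np

def dropout(activations, rate, rng):
    """Inverted dropout: zero out roughly `rate` of the units and
    rescale the survivors by 1/(1-rate) to keep the expected value."""
    keep_mask = rng.random(activations.shape) >= rate
    return activations * keep_mask / (1.0 - rate)

rng = np.random.default_rng(42)
acts = np.ones(100_000)
for rate in (0.5, 0.8):
    out = dropout(acts, rate, rng)
    print(f"rate={rate}: {np.mean(out == 0):.2f} of units zeroed")
```

At rate 0.8, only ~20% of each layer's units survive any given step, which starves a small network of capacity; 0.5 is the usual upper bound in practice.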

Possible Improvements and Next Stages

To implement new model architectures, I would suggest applying the following changes.

Important next to-dos

We first need to establish a common ground for comparing different model architectures. For this, they all need to be tested on the SAME DATASET under the same conditions.

We should create a global dataset (maybe you already have this), shared by everyone, with a lot of images (say a minimum of 1000, but we need to research this further).

Then each model architecture should be trained in the same way (say we train them all for 300 epochs) and compared. This way we can draw strong conclusions about which models perform better than others.
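The fair-comparison protocol above can be sketched as a small harness: every candidate gets the identical seeded train/validation split and the identical epoch budget, and is ranked by validation error. The two model classes here are hypothetical stand-ins (a least-squares fit and a predict-the-mean baseline) just to make the harness runnable; the real candidates would be the Keras CNNs:

```python
import numpy as np

class LinearModel:
    """Stand-in candidate: closed-form least squares (epochs unused)."""
    def fit(self, x, y, epochs):
        self.w, *_ = np.linalg.lstsq(x, y, rcond=None)
    def score(self, x, y):
        return float(np.mean((x @ self.w - y) ** 2))

class MeanModel:
    """Stand-in baseline: always predict the training mean."""
    def fit(self, x, y, epochs):
        self.mean = float(np.mean(y))
    def score(self, x, y):
        return float(np.mean((self.mean - y) ** 2))

def compare_models(model_builders, x, y, epochs, seed=0):
    """Train every candidate on the SAME train/val split for the same
    number of epochs, then rank by validation MSE (lower is better)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    split = int(0.8 * len(x))
    train_idx, val_idx = idx[:split], idx[split:]
    results = {}
    for name, build in model_builders.items():
        model = build()
        model.fit(x[train_idx], y[train_idx], epochs)
        results[name] = model.score(x[val_idx], y[val_idx])
    return dict(sorted(results.items(), key=lambda kv: kv[1]))

# toy regression data so the harness can actually run
rng = np.random.default_rng(1)
x = rng.normal(size=(50, 3))
y = x @ np.array([1.0, 2.0, 3.0])
results = compare_models({"linear": LinearModel, "mean": MeanModel}, x, y, epochs=300)
print(results)
```

The key point is that the split and the epoch budget live in the harness, not in each model's own script, so no candidate can accidentally see different data.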

Practical takeaways

Create models with:

Sorry for the late response; things got a little hectic with my thesis and I didn't really have time to look at this.

callmesora commented 2 years ago

Some things I forgot to mention: we can test batch normalization, different learning rates, different optimizers, and the use of momentum.
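To make the momentum suggestion concrete, here is a minimal NumPy sketch of the classical momentum update (the same rule Keras's `SGD(momentum=...)` applies per parameter), minimizing a toy 1-D quadratic. The function and hyperparameters are illustrative, not taken from the project:

```python
import numpy as np

def sgd_momentum(grad_fn, w0, lr=0.1, momentum=0.9, steps=300):
    """Classical momentum: v <- m*v - lr*grad(w); w <- w + v.
    The velocity term accumulates past gradients, smoothing the path."""
    w, v = float(w0), 0.0
    for _ in range(steps):
        v = momentum * v - lr * grad_fn(w)
        w = w + v
    return w

# minimize f(w) = (w - 3)^2, whose gradient is 2*(w - 3)
w_star = sgd_momentum(lambda w: 2 * (w - 3), w0=0.0)
print(w_star)
```

Swapping the optimizer in Keras is then a one-line change in `model.compile`, which makes these comparisons cheap to run once the shared-dataset harness exists.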

Here's a snippet of code from europilot's PilotNet (very similar to ours, if not the same, for a similar task). I encourage anyone who has free time to try this architecture; next week I will be very busy. https://github.com/marsauto/europilot/blob/master/scripts/04.PilotNet.ipynb

```python
# define PilotNet model, with batch normalization included.
def get_model(input_shape):
    model = Sequential([
        Conv2D(24, kernel_size=(5, 5), strides=(2, 2), activation='relu', input_shape=input_shape),
        BatchNormalization(axis=1),
        Conv2D(36, kernel_size=(5, 5), strides=(2, 2), activation='relu'),
        BatchNormalization(axis=1),
        Conv2D(48, kernel_size=(5, 5), strides=(2, 2), activation='relu'),
        BatchNormalization(axis=1),
        Conv2D(64, kernel_size=(3, 3), strides=(1, 1), activation='relu'),
        BatchNormalization(axis=1),
        Conv2D(64, kernel_size=(3, 3), strides=(1, 1), activation='relu'),
        BatchNormalization(axis=1),
        Flatten(),
        Dense(100, activation='relu'),
        BatchNormalization(),
        Dense(50, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='relu'),
        BatchNormalization(),
        Dense(1),
    ])
    return model

model = get_model(input_shape)
sgd = SGD(lr=1e-3, decay=1e-4, momentum=0.9, nesterov=True)
model.compile(optimizer=sgd, loss="mse")
model.summary()
```

Good luck 💯

callmesora commented 2 years ago

Hold up, I remembered something: there are also callbacks. I've only used them once so I'm not too familiar with them, but they might help with the overfitting. I think they lower the learning rate when the model starts overfitting. We can look further into that.
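The callback being described sounds like Keras's `ReduceLROnPlateau`. As a sketch of the idea (a hypothetical pure-Python re-implementation of its core logic, not the Keras class itself), the scheduler tracks the best validation loss and shrinks the learning rate whenever it stalls:

```python
class ReduceLROnPlateau:
    """Sketch of the plateau scheduler: when the monitored validation
    loss fails to improve for `patience` epochs in a row, multiply the
    learning rate by `factor` (never going below `min_lr`)."""
    def __init__(self, lr, factor=0.5, patience=3, min_lr=1e-6):
        self.lr, self.factor = lr, factor
        self.patience, self.min_lr = patience, min_lr
        self.best = float("inf")
        self.wait = 0

    def on_epoch_end(self, val_loss):
        if val_loss < self.best:
            self.best, self.wait = val_loss, 0
        else:
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr

# a validation-loss trace that plateaus twice
sched = ReduceLROnPlateau(lr=1e-3, patience=2)
for loss in [0.9, 0.8, 0.8, 0.81, 0.79, 0.79, 0.8]:
    lr = sched.on_epoch_end(loss)
print(lr)
```

In actual Keras code this would just be `callbacks=[ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=2)]` passed to `model.fit`; `EarlyStopping` is the companion callback worth testing alongside it.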

For the learning rate I would recommend starting with 1e-5 or 1e-4. Tesla's AI director also does it :)), it usually works; why, no one knows.

Some inspiration sources (this one uses attention modules; I think it performs rather well, but it would take a while to implement): https://github.com/FangLintao/Self-Driving-Car

(This one is in TensorFlow and is very, very similar to the one we are using.) I'm very confident this is the way to go; they even offer a simple architecture and good augmentations. https://github.com/Tinker-Twins/Robust_Behavioral_Cloning https://arxiv.org/ftp/arxiv/papers/2010/2010.04767.pdf

brnaguiar commented 2 years ago

Improved CNN1

Because the differences between CNN2 and CNN1 were small, I picked up CNN1 and made some changes:

Beyond the solutions presented above, we can also try better cost functions and performance metrics that penalize the less frequent classes more heavily, for better accuracy. It would also be nice to have a global repository for the generated datasets; for example, when we generate a new dataset, it could be stacked into the global dataset automatically.
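The class-penalizing cost function mentioned above can be sketched as a weighted cross-entropy in NumPy, with weights inversely proportional to class frequency so mistakes on rare classes cost more. The data and weighting scheme here are illustrative assumptions, not taken from the project (in Keras the equivalent shortcut is the `class_weight` argument to `model.fit`):

```python
import numpy as np

def weighted_cross_entropy(probs, labels, class_weights):
    """Mean cross-entropy where each sample is scaled by its class
    weight, so rare classes contribute more to the loss."""
    w = class_weights[labels]
    return float(np.mean(-w * np.log(probs[np.arange(len(labels)), labels])))

# toy 2-class problem where class 1 is rare
labels = np.array([0, 0, 0, 0, 1])
counts = np.bincount(labels)
# standard "balanced" weighting: n_samples / (n_classes * count)
class_weights = len(labels) / (len(counts) * counts)
probs = np.array([[0.9, 0.1]] * 4 + [[0.4, 0.6]])
loss = weighted_cross_entropy(probs, labels, class_weights)
print(class_weights, loss)
```

Here the rare class gets weight 2.5 versus 0.625 for the frequent one, so the single uncertain prediction on the rare class dominates the loss.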

brnaguiar commented 2 years ago

I made some tweaks to the data augmentation and to the CNN: I added motion blur, contrast changes, shearing, noise, and random erasing/occlusion to prevent the CNN from overfitting, and I also changed the activation functions of the convolution layers to ReLU. Those changes allowed the model to converge faster and improved its accuracy substantially: without ReLU the test error was around ~0.15/0.20; with ReLU it is now around ~0.06. The first (hidden) layer of the fully-connected block also uses ReLU as its activation function, because it likewise helps the model converge faster (the other layers don't!).
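Of the augmentations listed above, random erasing is the least standard, so here is a minimal NumPy sketch of it (a hypothetical implementation, not the code used in the project): blank out a random rectangle so the network cannot rely on any single image region.

```python
import numpy as np

def random_erase(image, rng, max_frac=0.3):
    """Random erasing / occlusion: zero a random rectangle whose sides
    are at most `max_frac` of the image dimensions."""
    h, w = image.shape[:2]
    eh = int(rng.integers(1, max(2, int(h * max_frac))))
    ew = int(rng.integers(1, max(2, int(w * max_frac))))
    y = int(rng.integers(0, h - eh + 1))
    x = int(rng.integers(0, w - ew + 1))
    out = image.copy()
    out[y:y + eh, x:x + ew] = 0
    return out

rng = np.random.default_rng(7)
img = np.ones((32, 32), dtype=np.float32)
erased = random_erase(img, rng)
print(erased.shape, float(erased.min()))
```

Applying this only when the training flag is set (inside the batch generator) keeps validation images untouched.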

manuelgitgomes commented 2 years ago

New changes added, closing issue