manuelgitgomes closed this issue 2 years ago.
Try the CNN in D3.
I need to train the CNN for a large amount of time.
Need to acquire a new, large dataset.
The code seems fine and well organized to me; I mainly looked at the ML aspects of it. The YAML file and the integration with the ROS node were a bit out of my scope, so I'm not going to comment on those.
Some aspects need to be considered/rewritten for clarity.
In the train.py script we use:
```python
history = model.fit(
    batchGen(xTrain, yTrain, batch_xtrain, batch_ytrain, image_width, image_height),
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=batchGen(xVal, yVal, batch_xval, batch_yval, image_width, image_height),
    validation_steps=validation_steps,
)
```
The batch_xtrain argument should be batch_size_train, and batch_ytrain should be training_flag.
This way the readability is clearer. Please review whether my suggestion makes sense; it's possible I misread something. To check, read utils.py and train.py.
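A minimal sketch of the same call with the suggested names, assuming the third and fourth arguments of batchGen really are the batch size and a training/validation flag (batch_size_val and the *_flag variables are names I made up for illustration):

```python
# Same fit() call, with the (assumed) clearer variable names; the flag would be
# True for the training generator and False for the validation generator.
history = model.fit(
    batchGen(xTrain, yTrain, batch_size_train, training_flag, image_width, image_height),
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=batchGen(xVal, yVal, batch_size_val, validation_flag, image_width, image_height),
    validation_steps=validation_steps,
)
```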
I just skimmed a bit more through this code, but it seems to be a less refined version of the first one, where Daniel tried to change the architecture a little for testing purposes.
I don't think it's doing anything that CNN1 isn't doing already, but it's slightly less organized since it doesn't use a utils.py. In my opinion this model architecture will fail because of the dropout layer he added: the probability was 0.8, which is way too high by industry standards. I would suggest capping it at 50% maximum.
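For reference, a minimal Keras sketch of a dense head with the dropout capped; the layer sizes and input shape are illustrative, not CNN2's actual ones:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Keras' Dropout rate is the fraction of activations dropped:
# Dropout(0.8) silences 80% of the units, Dropout(0.5) is a saner ceiling.
head = Sequential([
    Dense(100, activation='relu', input_shape=(1152,)),  # illustrative input size
    Dropout(0.5),
    Dense(50, activation='relu'),
    Dense(10, activation='relu'),
    Dense(1),
])
```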
Possible Improvements and Next Stages
To implement new model architectures I would suggest applying the following changes.
We first need to establish a common baseline to compare different model architectures. For this, they all need to be tested on the SAME DATASET under the same conditions.
We should create a global dataset (maybe you already have this), shared by everyone, with a lot of images (let's say a minimum of 1000, but we need to research this further).
Then each model architecture should be trained in the same way (let's say we all train them for 300 epochs) and then compared. This way we can draw strong conclusions about which models perform better than others.
[ ] Create a common dataset that will be used to compare different Neural Networks (NN)
[ ] Create a Python function to choose between different model architectures, for example `choose_architecture(4)` (see the sketch after this list)
Create models with:
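A hedged sketch of what such a choose_architecture helper could look like; the builder function, registry, and default input shape below are placeholders, not the repo's actual names:

```python
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

def build_cnn1(input_shape):
    # Placeholder for the real CNN1 builder; swap in the repo's own function.
    return Sequential([
        Conv2D(24, (5, 5), strides=(2, 2), activation='elu', input_shape=input_shape),
        Conv2D(36, (5, 5), strides=(2, 2), activation='elu'),
        Flatten(),
        Dense(50, activation='elu'),
        Dense(1),
    ])

# Register every candidate architecture here (CNN2, PilotNet, ...).
ARCHITECTURES = {1: build_cnn1}

def choose_architecture(index, input_shape=(66, 200, 3)):
    """Return an (uncompiled) Keras model for the requested architecture index."""
    if index not in ARCHITECTURES:
        raise ValueError(f"Unknown architecture {index}; options: {sorted(ARCHITECTURES)}")
    return ARCHITECTURES[index](input_shape)

model = choose_architecture(1)
```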
Sorry for the late response, things got a little complicated with my thesis and I didn't really have time to look at this.
Some things I forgot to mention: we can test batch normalization, different learning rates, different optimizers, and the use of momentum.
Here's a snippet of code from europilot's PilotNet (very similar to ours, if not the same, for a similar task). I encourage anyone who has free time to try this architecture; next week I will be very busy. https://github.com/marsauto/europilot/blob/master/scripts/04.PilotNet.ipynb
```python
# Imports implied by the notebook
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Flatten, Dense
from keras.optimizers import SGD

# define PilotNet model, with batch normalization included.
def get_model(input_shape):
    model = Sequential([
        Conv2D(24, kernel_size=(5, 5), strides=(2, 2), activation='relu', input_shape=input_shape),
        BatchNormalization(axis=1),
        Conv2D(36, kernel_size=(5, 5), strides=(2, 2), activation='relu'),
        BatchNormalization(axis=1),
        Conv2D(48, kernel_size=(5, 5), strides=(2, 2), activation='relu'),
        BatchNormalization(axis=1),
        Conv2D(64, kernel_size=(3, 3), strides=(1, 1), activation='relu'),
        BatchNormalization(axis=1),
        Conv2D(64, kernel_size=(3, 3), strides=(1, 1), activation='relu'),
        BatchNormalization(axis=1),
        Flatten(),
        Dense(100, activation='relu'),
        BatchNormalization(),
        Dense(50, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='relu'),
        BatchNormalization(),
        Dense(1),
    ])
    return model

model = get_model(input_shape)  # input_shape is defined earlier in the notebook
sgd = SGD(lr=1e-3, decay=1e-4, momentum=0.9, nesterov=True)  # newer Keras spells this learning_rate=
model.compile(optimizer=sgd, loss="mse")
model.summary()
```
Good luck 💯
Hold up, I remembered something: there is also a thing called callbacks. I've only used them once so I'm not too familiar with them, but they might help with the overfitting. I think they slow down the learning rate when the model starts overfitting. We can look further into that.
For the learning rate I would recommend starting with 1e-5 or 1e-4. Tesla's AI director also does it :)), it usually works; why? No one knows.
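To make both ideas concrete, here is a hedged Keras sketch combining a ReduceLROnPlateau callback (which lowers the learning rate when the validation loss plateaus), early stopping, and an Adam optimizer starting at 1e-4; the model, generator, and step variables are assumed to come from train.py, and older Keras versions spell learning_rate as lr:

```python
from keras.callbacks import EarlyStopping, ReduceLROnPlateau
from keras.optimizers import Adam

# Lower the learning rate when val_loss stops improving, and stop training
# before the model overfits badly.
callbacks = [
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=1e-6),
    EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
]

model.compile(optimizer=Adam(learning_rate=1e-4), loss='mse')  # start around 1e-4 / 1e-5
history = model.fit(
    train_generator,                     # e.g. batchGen(...) from train.py
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=val_generator,
    validation_steps=validation_steps,
    callbacks=callbacks,
)
```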
Some inspiration sources: (This one uses attention modules; I think it performs rather well, but it would take a while to implement.) https://github.com/FangLintao/Self-Driving-Car
(This one is in TensorFlow and it's very similar to the one we are using.) I'm very confident this is the way to go; they even offer us a simple architecture and good augmentations. https://github.com/Tinker-Twins/Robust_Behavioral_Cloning https://arxiv.org/ftp/arxiv/papers/2010/2010.04767.pdf
Because the differences between CNN2 and CNN1 were small, I picked up CNN1 and made some changes:
Code refactoring: I merged the CNN1 code and its utils file and created a Jupyter Notebook for them. With the Jupyter Notebook we can get a better sense of the data we are manipulating, and it provides a better experience for data visualization, feature selection, and so on.
I changed the default visualization library (matplotlib) to a better one (in my opinion), Plotly, which allows interacting with the plots and makes it easier to see the data we are dealing with.
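As a small example of the kind of interactive plot this enables (the CSV name and column below are assumptions about the driving log, not the notebook's actual ones):

```python
import pandas as pd
import plotly.express as px

# Hypothetical example: adjust 'driving_log.csv' and the 'steering' column
# to the actual dataset schema.
df = pd.read_csv('driving_log.csv')
fig = px.histogram(df, x='steering', nbins=25, title='Steering angle distribution')
fig.show()
```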
Data undersampling: for the majority classes (steering-angle bins that appear very often / bins for which we have a lot of data), I think the current value of samplesPerBin is too high (25000). I reduced it to 200 in order to have better data balancing, but this is a parameter that we need to choose from dataset to dataset.
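A hedged sketch of per-bin undersampling along these lines; num_bins and samples_per_bin are illustrative values, and the steerings/image_paths NumPy arrays are assumed names:

```python
import numpy as np

num_bins = 25
samples_per_bin = 200

# Histogram the steering angles and keep at most samples_per_bin samples per bin.
hist, bin_edges = np.histogram(steerings, num_bins)
keep = []
for i in range(num_bins):
    in_bin = np.where((steerings >= bin_edges[i]) & (steerings < bin_edges[i + 1]))[0]
    np.random.shuffle(in_bin)
    keep.extend(in_bin[:samples_per_bin])

image_paths = image_paths[keep]
steerings = steerings[keep]
```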
Data oversampling: for the minority classes I duplicated some samples and created new images based on the original samples in order to balance the data even more. This, together with the data undersampling, helped to decrease the cost/loss of the model, but we need to explore better techniques to improve the data oversampling... One technique that is popular in classification problems is SMOTE; we can try its version for regression problems (SmoteR): https://bit.ly/3KCkQRa .
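Until we try SmoteR, here is a hedged sketch of the simple duplication-based oversampling described above (same assumed variable names as the undersampling sketch; the duplicated frames are expected to be re-augmented at training time so the copies are not pixel-identical):

```python
import numpy as np

num_bins = 25
min_per_bin = 200

# Duplicate random samples from under-represented bins until each non-empty
# bin holds at least min_per_bin samples.
hist, bin_edges = np.histogram(steerings, num_bins)
extra = []
for i in range(num_bins):
    in_bin = np.where((steerings >= bin_edges[i]) & (steerings < bin_edges[i + 1]))[0]
    if 0 < len(in_bin) < min_per_bin:
        extra.extend(np.random.choice(in_bin, min_per_bin - len(in_bin)))

if extra:
    image_paths = np.concatenate([image_paths, image_paths[extra]])
    steerings = np.concatenate([steerings, steerings[extra]])
```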
Beyond the solutions presented above, we can also try cost functions and performance metrics that penalize the less frequent classes more heavily, for better accuracy. It would also be nice to have a global repository for the generated datasets; for example, when we generate a new dataset it could be stacked onto a global dataset automatically.
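A hedged sketch of one way to do this: weight each sample inversely to the frequency of its steering-angle bin and pass the weights to Keras through sample_weight (which model.fit accepts when training on arrays). The steerings, model, x_train, y_train, and epochs names are assumed to come from the training script:

```python
import numpy as np

num_bins = 25

# Weight rare steering angles more than frequent ones.
hist, bin_edges = np.histogram(steerings, num_bins)
bin_index = np.clip(np.digitize(steerings, bin_edges) - 1, 0, num_bins - 1)
sample_weight = 1.0 / np.maximum(hist[bin_index], 1)
sample_weight *= len(sample_weight) / sample_weight.sum()  # normalise to mean 1

model.compile(optimizer='adam', loss='mse')
model.fit(x_train, y_train, sample_weight=sample_weight, epochs=epochs)
```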
I made some tweaks to the data augmentation and to the CNN: I added motion blur, contrast changes, shearing, noise, and random erasing / occlusion in order to prevent the CNN from overfitting; and I also changed the activation functions of the convolution layers to ReLU. Those changes allowed the model to converge faster and the model accuracy improved substantially: without ReLU the test error was around ~0.15/0.20; with ReLU the test error is now around ~0.06. The first (hidden) layer of the fully connected layers also has ReLU as its activation function, because it also helps the model to converge faster (the others don't!).
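For anyone reproducing these augmentations, here are hedged OpenCV/NumPy sketches of two of them (motion blur and random erasing); the kernel size and erase-box bounds are illustrative, not the values used in the notebook:

```python
import cv2
import numpy as np

def motion_blur(image, kernel_size=5):
    """Blur along a horizontal line to mimic camera motion."""
    kernel = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    kernel[kernel_size // 2, :] = 1.0 / kernel_size
    return cv2.filter2D(image, -1, kernel)

def random_erase(image, max_fraction=0.2):
    """Zero out a random rectangle to simulate occlusion."""
    h, w = image.shape[:2]
    eh = np.random.randint(1, max(2, int(h * max_fraction)))
    ew = np.random.randint(1, max(2, int(w * max_fraction)))
    y, x = np.random.randint(0, h - eh), np.random.randint(0, w - ew)
    out = image.copy()
    out[y:y + eh, x:x + ew] = 0
    return out
```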
New changes added, closing issue
Review CNN1 and CNN2 code.