manuelgitgomes closed this issue 2 years ago.
Try the CNN in D3.
I need to train the CNN for a large amount of time.
Need to acquire a new, large dataset.
The code seems fine and well organized to me; I mainly looked at the ML aspects of it. The YAML file and the integration with the ROS node were a bit out of my scope, so I'm not going to comment on those.
Some aspects need to be considered/rewritten for clarity.
In the train.py script we use:
```python
history = model.fit(
    batchGen(xTrain, yTrain, batch_xtrain, batch_ytrain, image_width, image_height),
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=batchGen(xVal, yVal, batch_xval, batch_yval, image_width, image_height),
    validation_steps=validation_steps,
)
```
The batch_xtrain argument should be batch_size_train, and batch_ytrain should be training_flag.
This way the readability is clearer. Please review whether my suggestion makes sense; it's possible I misread something. To check, read utils.py and train.py.
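A minimal sketch of the same call with the suggested names, assuming the third and fourth arguments of batchGen really are the batch size and a training/validation flag (batch_size_val and the *_flag variables are names I made up for illustration):

```python
# Same fit() call, with the (assumed) clearer variable names; the flag would be
# True for the training generator and False for the validation generator.
history = model.fit(
    batchGen(xTrain, yTrain, batch_size_train, training_flag, image_width, image_height),
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=batchGen(xVal, yVal, batch_size_val, validation_flag, image_width, image_height),
    validation_steps=validation_steps,
)
```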
I just skimmed a bit more through this code, but it seems to be a less refined version of the first one, where Daniel tried to change the architecture a little for testing purposes.
I don't think it's doing anything that CNN1 isn't doing already, but it's slightly less organized since it doesn't use a utils.py. In my opinion this model architecture will fail because of the dropout layer he added: the probability was 0.8, which is way too high by industry standards. I would suggest capping it at 50% maximum.
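For reference, a minimal Keras sketch of a dense head with the dropout capped; the layer sizes and input shape are illustrative, not CNN2's actual ones:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

# Keras' Dropout rate is the fraction of activations dropped:
# Dropout(0.8) silences 80% of the units, Dropout(0.5) is a saner ceiling.
head = Sequential([
    Dense(100, activation='relu', input_shape=(1152,)),  # illustrative input size
    Dropout(0.5),
    Dense(50, activation='relu'),
    Dense(10, activation='relu'),
    Dense(1),
])
```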
Possible Improvements and Next Stages
To implement new model architectures I would suggest applying the following changes.
We first need to establish a common baseline to compare different model architectures. For this, they all need to be tested on the SAME DATASET under the same conditions.
We should create a global dataset (maybe you already have this), shared by everyone, with a lot of images (let's say a minimum of 1000, but we need to research this further).
Then each model architecture should be trained in the same way (let's say we all train them for 300 epochs) and then compared. This way we can draw strong conclusions about which models perform better than others.
[ ] Create a common dataset that will be used to compare different Neural Networks (NN)
[ ] Create a Python function to choose between different model architectures, for example `choose_architecture(4)` (see the sketch after this list)
Create models with:
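A hedged sketch of what such a choose_architecture helper could look like; the builder function, registry, and default input shape below are placeholders, not the repo's actual names:

```python
from keras.models import Sequential
from keras.layers import Conv2D, Flatten, Dense

def build_cnn1(input_shape):
    # Placeholder for the real CNN1 builder; swap in the repo's own function.
    return Sequential([
        Conv2D(24, (5, 5), strides=(2, 2), activation='elu', input_shape=input_shape),
        Conv2D(36, (5, 5), strides=(2, 2), activation='elu'),
        Flatten(),
        Dense(50, activation='elu'),
        Dense(1),
    ])

# Register every candidate architecture here (CNN2, PilotNet, ...).
ARCHITECTURES = {1: build_cnn1}

def choose_architecture(index, input_shape=(66, 200, 3)):
    """Return an (uncompiled) Keras model for the requested architecture index."""
    if index not in ARCHITECTURES:
        raise ValueError(f"Unknown architecture {index}; options: {sorted(ARCHITECTURES)}")
    return ARCHITECTURES[index](input_shape)

model = choose_architecture(1)
```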
Sorry for the late response, things got a little complicated with my thesis and I didn't really have time to look at this.
Some things I forgot to mention: we can test batch normalization, different learning rates, different optimizers, and the use of momentum.
Here's a snippet of code from europilot's PilotNet (very similar to ours, if not the same, for a similar task). I encourage anyone who has free time to try this architecture; next week I will be very busy. https://github.com/marsauto/europilot/blob/master/scripts/04.PilotNet.ipynb
```python
# Imports implied by the notebook
from keras.models import Sequential
from keras.layers import Conv2D, BatchNormalization, Flatten, Dense
from keras.optimizers import SGD

# define PilotNet model, with batch normalization included.
def get_model(input_shape):
    model = Sequential([
        Conv2D(24, kernel_size=(5, 5), strides=(2, 2), activation='relu', input_shape=input_shape),
        BatchNormalization(axis=1),
        Conv2D(36, kernel_size=(5, 5), strides=(2, 2), activation='relu'),
        BatchNormalization(axis=1),
        Conv2D(48, kernel_size=(5, 5), strides=(2, 2), activation='relu'),
        BatchNormalization(axis=1),
        Conv2D(64, kernel_size=(3, 3), strides=(1, 1), activation='relu'),
        BatchNormalization(axis=1),
        Conv2D(64, kernel_size=(3, 3), strides=(1, 1), activation='relu'),
        BatchNormalization(axis=1),
        Flatten(),
        Dense(100, activation='relu'),
        BatchNormalization(),
        Dense(50, activation='relu'),
        BatchNormalization(),
        Dense(10, activation='relu'),
        BatchNormalization(),
        Dense(1),
    ])
    return model

model = get_model(input_shape)  # input_shape is defined earlier in the notebook
sgd = SGD(lr=1e-3, decay=1e-4, momentum=0.9, nesterov=True)  # newer Keras spells this learning_rate=
model.compile(optimizer=sgd, loss="mse")
model.summary()
```
Good luck 💯
Hold up, I remembered something: there is also a thing called callbacks. I've only used them once so I'm not too familiar with them, but they might help with the overfitting. I think they slow down the learning rate when the model starts overfitting. We can look further into that.
For the learning rate I would recommend starting with 1e-5 or 1e-4. Tesla's AI director also does it :)), it usually works; why? No one knows.
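To make both ideas concrete, here is a hedged Keras sketch combining a ReduceLROnPlateau callback (which lowers the learning rate when the validation loss plateaus), early stopping, and an Adam optimizer starting at 1e-4; the model, generator, and step variables are assumed to come from train.py, and older Keras versions spell learning_rate as lr:

```python
from keras.callbacks import EarlyStopping, ReduceLROnPlateau
from keras.optimizers import Adam

# Lower the learning rate when val_loss stops improving, and stop training
# before the model overfits badly.
callbacks = [
    ReduceLROnPlateau(monitor='val_loss', factor=0.5, patience=3, min_lr=1e-6),
    EarlyStopping(monitor='val_loss', patience=10, restore_best_weights=True),
]

model.compile(optimizer=Adam(learning_rate=1e-4), loss='mse')  # start around 1e-4 / 1e-5
history = model.fit(
    train_generator,                     # e.g. batchGen(...) from train.py
    steps_per_epoch=steps_per_epoch,
    epochs=epochs,
    validation_data=val_generator,
    validation_steps=validation_steps,
    callbacks=callbacks,
)
```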
Some inspiration sources: (This one uses attention modules; I think it performs rather well, but it would take a while to implement.) https://github.com/FangLintao/Self-Driving-Car
(This one is in TensorFlow and it's very similar to the one we are using.) I'm very confident this is the way to go; they even offer us a simple architecture and good augmentations. https://github.com/Tinker-Twins/Robust_Behavioral_Cloning https://arxiv.org/ftp/arxiv/papers/2010/2010.04767.pdf
Because the differences between CNN2 and CNN1 were small, I picked up CNN1 and made some changes:
Code refactoring: I merged the CNN1 code and its utils file and created a Jupyter Notebook for them. With the Jupyter Notebook we can get a better sense of the data we are manipulating, and it provides a better experience for data visualization, feature selection, and so on.
I changed the default visualization library (matplotlib) to a better one (in my opinion), Plotly, which allows interacting with the plots and makes it easier to see the data we are dealing with.
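As a small example of the kind of interactive plot this enables (the CSV name and column below are assumptions about the driving log, not the notebook's actual ones):

```python
import pandas as pd
import plotly.express as px

# Hypothetical example: adjust 'driving_log.csv' and the 'steering' column
# to the actual dataset schema.
df = pd.read_csv('driving_log.csv')
fig = px.histogram(df, x='steering', nbins=25, title='Steering angle distribution')
fig.show()
```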
Data undersampling: for the majority classes (steering-angle bins that appear very often / bins for which we have a lot of data), I think the current value of samplesPerBin is too high (25000). I reduced it to 200 in order to have better data balancing, but this is a parameter that we need to choose from dataset to dataset.
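A hedged sketch of per-bin undersampling along these lines; num_bins and samples_per_bin are illustrative values, and the steerings/image_paths NumPy arrays are assumed names:

```python
import numpy as np

num_bins = 25
samples_per_bin = 200

# Histogram the steering angles and keep at most samples_per_bin samples per bin.
hist, bin_edges = np.histogram(steerings, num_bins)
keep = []
for i in range(num_bins):
    in_bin = np.where((steerings >= bin_edges[i]) & (steerings < bin_edges[i + 1]))[0]
    np.random.shuffle(in_bin)
    keep.extend(in_bin[:samples_per_bin])

image_paths = image_paths[keep]
steerings = steerings[keep]
```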
Data oversampling: for the minority classes I duplicated some samples and created new images based on the original samples in order to balance the data even more. This, together with the data undersampling, helped to decrease the cost/loss of the model, but we need to explore better techniques to improve the data oversampling... One technique that is popular in classification problems is SMOTE; we can try its version for regression problems (SmoteR): https://bit.ly/3KCkQRa .
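Until we try SmoteR, here is a hedged sketch of the simple duplication-based oversampling described above (same assumed variable names as the undersampling sketch; the duplicated frames are expected to be re-augmented at training time so the copies are not pixel-identical):

```python
import numpy as np

num_bins = 25
min_per_bin = 200

# Duplicate random samples from under-represented bins until each non-empty
# bin holds at least min_per_bin samples.
hist, bin_edges = np.histogram(steerings, num_bins)
extra = []
for i in range(num_bins):
    in_bin = np.where((steerings >= bin_edges[i]) & (steerings < bin_edges[i + 1]))[0]
    if 0 < len(in_bin) < min_per_bin:
        extra.extend(np.random.choice(in_bin, min_per_bin - len(in_bin)))

if extra:
    image_paths = np.concatenate([image_paths, image_paths[extra]])
    steerings = np.concatenate([steerings, steerings[extra]])
```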
Beyond the solutions presented above, we can also try cost functions and performance metrics that penalize the less frequent classes more heavily, for better accuracy. It would also be nice to have a global repository for the generated datasets; for example, when we generate a new dataset it could be stacked onto a global dataset automatically.
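A hedged sketch of one way to do this: weight each sample inversely to the frequency of its steering-angle bin and pass the weights to Keras through sample_weight (which model.fit accepts when training on arrays). The steerings, model, x_train, y_train, and epochs names are assumed to come from the training script:

```python
import numpy as np

num_bins = 25

# Weight rare steering angles more than frequent ones.
hist, bin_edges = np.histogram(steerings, num_bins)
bin_index = np.clip(np.digitize(steerings, bin_edges) - 1, 0, num_bins - 1)
sample_weight = 1.0 / np.maximum(hist[bin_index], 1)
sample_weight *= len(sample_weight) / sample_weight.sum()  # normalise to mean 1

model.compile(optimizer='adam', loss='mse')
model.fit(x_train, y_train, sample_weight=sample_weight, epochs=epochs)
```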
I made some tweaks to the data augmentation and to the CNN: I added motion blur, contrast changes, shearing, noise, and random erasing / occlusion in order to prevent the CNN from overfitting; and I also changed the activation functions of the convolution layers to ReLU. Those changes allowed the model to converge faster and the model accuracy improved substantially: without ReLU the test error was around ~0.15/0.20; with ReLU the test error is now around ~0.06. The first (hidden) layer of the fully connected layers also has ReLU as its activation function, because it also helps the model to converge faster (the others don't!).
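For anyone reproducing these augmentations, here are hedged OpenCV/NumPy sketches of two of them (motion blur and random erasing); the kernel size and erase-box bounds are illustrative, not the values used in the notebook:

```python
import cv2
import numpy as np

def motion_blur(image, kernel_size=5):
    """Blur along a horizontal line to mimic camera motion."""
    kernel = np.zeros((kernel_size, kernel_size), dtype=np.float32)
    kernel[kernel_size // 2, :] = 1.0 / kernel_size
    return cv2.filter2D(image, -1, kernel)

def random_erase(image, max_fraction=0.2):
    """Zero out a random rectangle to simulate occlusion."""
    h, w = image.shape[:2]
    eh = np.random.randint(1, max(2, int(h * max_fraction)))
    ew = np.random.randint(1, max(2, int(w * max_fraction)))
    y, x = np.random.randint(0, h - eh), np.random.randint(0, w - ew)
    out = image.copy()
    out[y:y + eh, x:x + ew] = 0
    return out
```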
New changes added, closing issue
Review CNN1 and CNN2 code.