erodola / DLAI-s2-2021

Teaching material for the course of Deep Learning and Applied AI, 2nd semester 2021, Sapienza University of Rome

Accompanying issue for notebook 7 (Uncertainty, regularization and the deep learning toolset) #16

Open noranta4 opened 3 years ago

noranta4 commented 3 years ago

Comment on this issue with your answers to exercises A and B of notebook 7, "Uncertainty, regularization, and the deep learning toolset".

At the beginning of your response, put a big A or B to signal which exercise you are answering. Like this:

A

If you answer both A and B, please do so in the same comment and put the big A or B at the beginning of each answer.

fedeloper commented 3 years ago

A

  1. Can we mitigate the memorization effect by augmenting the training dataset with some transformations? Absolutely yes: with online augmentation the model sees a slightly different version of every sample at each epoch. We can use, for example, torchvision.transforms.RandomResizedCrop, torchvision.transforms.RandomHorizontalFlip, torchvision.transforms.RandomRotation, and torchvision.transforms.RandomErasing (which erases a random rectangle of the image, whose shape depends on the chosen parameters). We can also add some random noise or random blur. A sketch of such a pipeline is given after this list.
  2. What happens if we raise the dropout coefficient? And if we apply dropout also on the input layer? If we raise the coefficient too much, the model loses too much expressive power and underfits, so it no longer generalizes well. Applying dropout on the input layer amounts to injecting noise directly into the data, so only small rates are typically used there. According to this and that, reasonable values for dropout are 0.5-0.7 on fully connected layers and 0.1-0.2 on convolutional (and pooling) layers; see the model sketch after this list.
  3. What is the performance of a model with both batch normalization and dropout, and in which order should you place them? The right short answer is "it depends". In some cases BatchNormalization already has a regularization effect on the network, but we cannot parametrize and tune that effect, and it may not be enough. Of course, Dropout and BatchNormalization can be used together; referring to the value ranges written above, using both techniques leads us to choose lower values for Dropout. We must also pay attention to where we place BatchNormalization and Dropout, because "we would like to ensure that for any parameter values, the network always produces activations with the desired distribution" (link): it is in fact not a good idea to put Dropout right before an activation function or right before BatchNormalization. A LeNet-style sketch with one possible placement follows this list.
  4. Does the performance of MonteCarloDropoutLeNet increase if we raise the number of predictions from 20 to 50 or 100? The accuracy grows, but the inference time grows linearly with the number of predictions. We also note almost no difference between 50 and 100 predictions, although the 100-prediction run took twice as long. A minimal inference loop is sketched below.
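
A minimal sketch of the online augmentation mentioned in point 1, assuming CIFAR-10 and arbitrary parameter values; since the transforms are re-sampled every time an image is loaded, each epoch sees a different variant of every sample:

```python
import torch
from torchvision import datasets, transforms

# Hypothetical online-augmentation pipeline: transforms are re-applied with new
# random parameters every time a sample is loaded.
train_transform = transforms.Compose([
    transforms.RandomResizedCrop(32, scale=(0.8, 1.0)),  # random crop + resize back to 32x32
    transforms.RandomHorizontalFlip(p=0.5),              # mirror half of the images
    transforms.RandomRotation(degrees=15),               # small random rotations
    transforms.ToTensor(),
    transforms.RandomErasing(p=0.25),                    # erase a random rectangle (works on tensors)
])

train_set = datasets.CIFAR10(root="./data", train=True, download=True,
                             transform=train_transform)
train_loader = torch.utils.data.DataLoader(train_set, batch_size=128, shuffle=True)
```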
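For points 2 and 3, a hypothetical LeNet-style model illustrating the placement just described: BatchNorm right after each convolution (before the non-linearity), mild dropout on convolutional features, stronger dropout between fully connected layers. The exact rates and layer sizes are only indicative:

```python
import torch.nn as nn

class LeNetBNDropout(nn.Module):
    def __init__(self, p_conv=0.1, p_fc=0.5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 6, kernel_size=5),
            nn.BatchNorm2d(6),        # normalize pre-activations, right after the conv
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Dropout2d(p_conv),     # mild dropout on feature maps
            nn.Conv2d(6, 16, kernel_size=5),
            nn.BatchNorm2d(16),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Dropout(p_fc),         # stronger dropout between fully connected layers
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Dropout(p_fc),
            nn.Linear(84, 10),
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```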
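For point 4, a minimal Monte Carlo dropout inference loop; the function name and the choice of re-enabling only the dropout modules are my own sketch, not necessarily how MonteCarloDropoutLeNet is implemented in the notebook:

```python
import torch
import torch.nn as nn

def mc_dropout_predict(model, x, n_samples=50):
    # Keep only the dropout layers stochastic at test time and average
    # n_samples forward passes; cost grows linearly with n_samples.
    model.eval()
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()                 # re-enable dropout; BatchNorm stays in eval mode
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(x), dim=-1)
                             for _ in range(n_samples)])  # (n_samples, batch, classes)
    return probs.mean(dim=0), probs.std(dim=0)  # predictive mean and per-class spread
```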
elisabal commented 3 years ago

B

First I tried to test the net on the combination of two similar images with the same label (two dogs). I found that the network works very poorly in the classification of the second image: the results seem to indicate that the second image represents a deer. The two distributions are well separated from each other, except for the third and fourth images. (prediction-distribution plots omitted)

Then I combined the image of a ship with that of a flipped ship. In this case I found that the distribution of the first image has a large overlap with that of a car. (prediction-distribution plots omitted)

I think these graphs are useful for understanding the reliability of our model's predictions: they give us an idea of the width of the probability distribution and of its overlap with the distributions of the other possible predictions. A rough sketch of the blending experiment is given below.
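
A rough sketch of the blending experiment, assuming a trained model with dropout layers and two CIFAR-10 image tensors; the helper name, the 50/50 mixture, and the number of samples are arbitrary choices:

```python
import torch
import torch.nn as nn

def blended_prediction(model, img_a, img_b, alpha=0.5, n_samples=100):
    # img_a, img_b: tensors of shape (3, 32, 32); blend them pixel-wise.
    blended = alpha * img_a + (1 - alpha) * img_b
    model.eval()
    for m in model.modules():
        if isinstance(m, (nn.Dropout, nn.Dropout2d)):
            m.train()                                 # keep dropout stochastic at test time
    with torch.no_grad():
        probs = torch.stack([torch.softmax(model(blended.unsqueeze(0)), dim=-1)
                             for _ in range(n_samples)]).squeeze(1)  # (n_samples, classes)
    # A wide per-class spread (large std) signals an uncertain, unreliable prediction.
    return probs.mean(dim=0), probs.std(dim=0)
```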