erodola / DLAI-s2-2021

Teaching material for the course of Deep Learning and Applied AI, 2nd semester 2021, Sapienza University of Rome

CNN priors #15

Open noranta4 opened 3 years ago

noranta4 commented 3 years ago

People have the priors!

As pointed out in the last lecture and lab session, recognizing the prior knowledge you have about your learning problem is very important: it can make the difference between solving the problem effectively and struggling with huge models and poor results.

Below we list several learning problems and ask you whether the CNN priors (translational equivariance, compositionality, locality, self-similarity) apply to each of them.

For each learning problem below, choose one of the following:

  1. The priors apply and we can use standard CNNs.
  2. The priors apply but we need to define a new way of doing convolution on this kind of data.
  3. The priors do not apply.

Discuss, in a couple of lines, the ones that made you think the most.

Problems:

  1. Classify handwritten digits (MNIST)
  2. Classify handwritten digits from a random permutation of the pixels in the input image (think about a random permutation of the 784 entries in a vector encoding an MNIST image)
  3. Classify noisy handwritten digits (think about MNIST with a random 10% of the pixels set to white)
  4. Evaluate a chess position from the perspective of the white player (think about a dataset composed of games played by masters in the form of (s, z), where s is a proper representation of a game state (the chessboard) and z encodes the outcome with respect to White, e.g. z = +1, -1, 0 for a win, a loss, or a draw).
  5. Classify rigid 3D objects like planes or tables (think about the ShapeNet dataset)
  6. Classify the vertices of a human mesh as belonging to the arms or not (think about the FAUST dataset).
  7. Predict the political orientation of a user solely based on the network of their friends on a social network.
  8. Classify the musical instruments in a song.
  9. Evaluate the price of a house in California (think about a dataset with features like longitude, latitude, number of rooms, proximity to the ocean, ... like this one)
  10. Predict interactions between proteins and other biomolecules solely based on structure.
  11. Evaluate the richness of a territory solely based on satellite images.
  12. Predict the science field of a paper solely based on the citation network (think about the CORA dataset)
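
Problems 2 and 3 can be made concrete with a small sketch. It is only an illustration under stated assumptions: `toy_digit` is a hypothetical stand-in for an MNIST image (a white stroke on a black 28x28 grid), and all function names are made up for this example. It uses plain Python so that it runs without any dependency. The point it tries to show is that a fixed permutation keeps all the information (it can be inverted) but scatters neighbouring pixels, while noise changes some values and leaves every pixel where it was.

```python
import math
import random

SIDE = 28  # MNIST images are 28x28 = 784 pixels


def toy_digit():
    """Hypothetical stand-in for an MNIST digit: a white vertical stroke on black."""
    return [1.0 if 4 <= i // SIDE < 24 and 12 <= i % SIDE < 16 else 0.0
            for i in range(SIDE * SIDE)]


def permute(vec, perm):
    """Problem 2: output entry i takes the value of input entry perm[i]."""
    return [vec[j] for j in perm]


def invert(perm):
    """Inverse permutation: inv[old_position] = new_position."""
    inv = [0] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        inv[old_pos] = new_pos
    return inv


def add_white_noise(vec, fraction, rng):
    """Problem 3: set a random fraction of the pixels to white (1.0)."""
    out = list(vec)
    for j in rng.sample(range(len(out)), int(fraction * len(out))):
        out[j] = 1.0
    return out


def mean_neighbour_distance(inv):
    """Average distance, after the permutation, between pixels that
    were horizontal neighbours in the original image."""
    total, count = 0.0, 0
    for old in range(SIDE * SIDE):
        if old % SIDE == SIDE - 1:
            continue  # the last column has no right-hand neighbour
        a, b = inv[old], inv[old + 1]
        total += math.hypot(a // SIDE - b // SIDE, a % SIDE - b % SIDE)
        count += 1
    return total / count


rng = random.Random(0)
identity = list(range(SIDE * SIDE))
perm = list(identity)
rng.shuffle(perm)
inv = invert(perm)

digit = toy_digit()
scrambled = permute(digit, perm)
noisy = add_white_noise(digit, 0.10, rng)

print("neighbour distance, original :", mean_neighbour_distance(identity))
print("neighbour distance, scrambled:", round(mean_neighbour_distance(inv), 1))
print("pixels changed by the noise  :", sum(a != b for a, b in zip(digit, noisy)))
print("permutation is invertible    :", permute(scrambled, inv) == digit)
```

The neighbour distance is one possible proxy for the locality prior, chosen here for simplicity; other measures would serve equally well.
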
lorentzDFR commented 3 years ago

I'll try to answer with my ideas, even if they may be wrong or trivial:

  1. 1
  2. 3 I think there is no advantage in using a CNN if the images are scrambled. Even if the SAME random permutation is applied to the whole training and validation sets, every image has lost its structural information, so a CNN is not necessarily the best approach for classification.
  3. 1
  4. 1
  5. 1
  6. 1
  7. 3 I don't know whether there is a way to represent the network that takes advantage of the CNN priors, so I would say no.
  8. 2 I think it can be done by transforming the audio track, for example by extracting the spectrogram of the audio signal (a visual representation of its frequency spectrum), from which a CNN can extract features of the elements composing the song.
  9. 3
  10. 1
  11. 1 Yes, we can do it as long as we know what kind of information (elements in the landscape) makes the territory "rich".
  12. 2 I think it is possible if we use the text of the citations to extract features such as the recurrence of certain words or groups of words, which can be used to classify the domain of the paper. In general, I imagine the text would be treated as an array of characters to which we can apply 1D convolution.
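
The spectrogram idea for problem 8 can be sketched as follows. This is a minimal illustration, not a production audio pipeline: the test signal, the sample rate, the frame size and the function name `spectrogram` are all assumptions made for the example, and the transform is a naive short-time DFT written in plain Python (a real project would more likely use an FFT library). The result is a 2D array (time frames by frequency bins) that could be fed to a CNN like an image.

```python
import cmath
import math


def spectrogram(signal, frame_size=64, hop=32):
    """Magnitude of a short-time Fourier transform.

    Returns a list of frames (time axis); each frame is a list of
    magnitudes, one per frequency bin from 0 up to the Nyquist bin.
    """
    # A Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_size)
              for n in range(frame_size)]
    n_bins = frame_size // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_size + 1, hop):
        frame = [signal[start + n] * window[n] for n in range(frame_size)]
        frames.append([
            abs(sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                    for n in range(frame_size)))
            for k in range(n_bins)
        ])
    return frames


SAMPLE_RATE = 1024  # samples per second (assumed for this toy signal)
# Toy "song": a 64 Hz tone for the first half second, then a 192 Hz tone.
signal = [math.sin(2 * math.pi * (64 if t < SAMPLE_RATE // 2 else 192) * t / SAMPLE_RATE)
          for t in range(SAMPLE_RATE)]

spec = spectrogram(signal)
bin_width = SAMPLE_RATE / 64  # Hz covered by each frequency bin


def peak_hz(frame):
    """Frequency of the strongest bin in one frame."""
    return max(range(len(frame)), key=frame.__getitem__) * bin_width


print(len(spec), "frames x", len(spec[0]), "bins")
print("dominant frequency, first frame:", peak_hz(spec[0]), "Hz")
print("dominant frequency, last frame :", peak_hz(spec[-1]), "Hz")
```

Note that the two axes of a spectrogram are not equivalent: a shift along time leaves a sound unchanged, whereas a shift along frequency changes its pitch, so how far translational equivariance carries over to the frequency axis is worth discussing.
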

elisabal commented 3 years ago

I'll try to answer:

1) 1
2) 3
3) 1
4) 2 I think we can use a CNN, but I don't think it is the best model: it is hard to exploit some important priors that we have in chess. I think CNNs are adept at characterizing small, local objectives such as local piece movement.
5) 1 I think that in this case it could also be useful to use a model that ensures rotational invariance.
6) 1
7) 3 I have a hard time figuring out how to apply the priors to the dataset.
8) 2
9) 3 Here too, I have a hard time figuring out, for example, what applying "translational invariance" to this dataset would mean.
10) 1
11) 1
12) 2
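
To make the question about translation concrete, here is a minimal sketch of what the prior states for data that lives on a grid. It uses a 1D signal with wrap-around borders, and the names `circular_conv` and `shift` are made up for this example. Shifting the input and then convolving gives the same result as convolving and then shifting (equivariance), and taking a global maximum afterwards gives the same value either way (invariance).

```python
def circular_conv(signal, kernel):
    """Sliding dot product with wrap-around borders (the cross-correlation
    that CNN layers call convolution)."""
    n = len(signal)
    return [sum(kernel[k] * signal[(i + k) % n] for k in range(len(kernel)))
            for i in range(n)]


def shift(signal, s):
    """Cyclic shift to the right by s positions."""
    n = len(signal)
    return [signal[(i - s) % n] for i in range(n)]


signal = [0, 0, 1, 3, 1, 0, 0, 0]
kernel = [1, -2, 1]

shift_then_conv = circular_conv(shift(signal, 3), kernel)
conv_then_shift = shift(circular_conv(signal, kernel), 3)

# Equivariance: the feature map moves together with the input.
print(shift_then_conv == conv_then_shift)
# Invariance after global max pooling: the pooled value does not move at all.
print(max(shift_then_conv) == max(circular_conv(signal, kernel)))
```

For a table such as the housing one, the columns (longitude, latitude, number of rooms, ...) have no natural order, so there is no axis along which a shift like the one above would be meaningful. That may be one way to read the difficulty with problem 9.
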

fabradimitra commented 3 years ago

1) -> 1
2) The digit's shape would be randomly spread all over the image: the priors do not apply.
3) -> 1
4) -> 1
5) -> 1, as long as you can render the 3D object as a series of images that capture its different viewpoints.
6) -> 1
7) I would represent each instance of the dataset as a graph. Each user would be the root and would have their closest friends and family as direct connections. In this case, locality would mean that, in order to classify the political orientation of the user, you should look at the political orientation of the people closest to the user; however, this assumption does not always hold. Hence -> 3.
8) -> 2 Although we could not feed this kind of data to a CNN right away, the priors apply in this case:
   (i) (Translational equivariance) The same musical note played by the same instrument sounds the same regardless of its position in the song.
   (ii) (Compositionality) Recognizing the musical notes helps in recognizing melodies.
   (iii) (Locality) Notes that form a melody are close to each other.
   (iv) (Self-similarity) Two equal melodies can both be recognized once one of them has been recognized.
9) -> 3
10) -> 1
11) -> 1
12) -> 2
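
The notion of locality on a graph described in point 7 can be sketched as one round of neighbourhood averaging. This is a deliberately simplified stand-in for a graph convolution, not an actual one: there are no learned weights, and the users, the friendship graph, the leaning scores and the function name `mean_aggregate` are all hypothetical.

```python
def mean_aggregate(adjacency, features):
    """One round of neighbourhood averaging over a graph.

    adjacency: dict mapping each node to the list of its direct neighbours.
    features:  dict mapping each node to a number.
    Each node's new value is the mean over itself and its direct neighbours.
    """
    updated = {}
    for node, neighbours in adjacency.items():
        group = [node] + list(neighbours)  # include the node itself
        updated[node] = sum(features[m] for m in group) / len(group)
    return updated


# Hypothetical friendship graph (undirected, so every edge is listed twice).
friends = {
    "ann": ["bob", "cat"],
    "bob": ["ann"],
    "cat": ["ann", "dan"],
    "dan": ["cat"],
}
# Hypothetical leaning scores in [-1, 1]; 0.0 means unknown or neutral.
leaning = {"ann": 0.0, "bob": 1.0, "cat": 1.0, "dan": -1.0}

smoothed = mean_aggregate(friends, leaning)
for user in friends:
    print(user, round(smoothed[user], 3))
```

The same rule is applied at every node and only looks at direct neighbours, which is the graph analogue of weight sharing and locality. Whether friends' orientations are actually informative about a user is a separate question about the data, and is the reason given above for answering 3.
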

LeonardoEmili commented 3 years ago

  1. 1
  2. 3 As we discovered in a past notebook, by applying a random permutation we lose fundamental priors, such as locality.
  3. 1
  4. 1
  5. 1 It would work using convolution in 3D space.
  6. 2 I suppose in this case we would need a convolution that works with mesh data.
  7. 3 We would lose fundamental CNN priors using a graph-based representation of the available data.
  8. 1
  9. 3 In this case, I don't see the relation between this kind of data and the CNN priors; I would say they do not apply.
  10. 1
  11. 1 Assuming the task can actually be solved using satellite images, I think the priors apply here.
  12. 2