Output: a 3D tensor with dimensions (0: patches, 1: width, 2: height). We only have one channel, so no extra channel dimension is needed.
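A minimal sketch of building that tensor, assuming grayscale images stored as 2-D NumPy arrays; the function name and the random-sampling strategy are illustrative, not fixed by the plan:

```python
import numpy as np

def extract_random_patches(image, patch_size, n_patches, rng):
    """Sample n_patches square patches from a 2-D grayscale image.

    Returns a 3-D tensor of shape (n_patches, patch_size, patch_size):
    axis 0 = patches, axis 1 = width, axis 2 = height. There is only
    one channel, so no channel axis is added.
    """
    h, w = image.shape
    rows = rng.integers(0, h - patch_size + 1, size=n_patches)
    cols = rng.integers(0, w - patch_size + 1, size=n_patches)
    return np.stack([image[r:r + patch_size, c:c + patch_size]
                     for r, c in zip(rows, cols)])

rng = np.random.default_rng(0)
image = rng.random((64, 64))          # stand-in for one training image
patches = extract_random_patches(image, patch_size=8, n_patches=100, rng=rng)
```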
Using k-means (Luc + Tom)
Run k-means on these patches
Output: k-means centroids after i iterations
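A sketch of this step using scikit-learn's MiniBatchKMeans (an assumption; any k-means implementation would do). Patches are flattened to vectors for clustering, and the centroids are reshaped back to K x W x H for the common part later:

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

rng = np.random.default_rng(0)
patches = rng.random((500, 8, 8))             # toy (patches, width, height) tensor

# k-means operates on vectors, so flatten each 8x8 patch to length 64.
X = patches.reshape(len(patches), -1)

K = 16                                        # number of centroids (tunable)
kmeans = MiniBatchKMeans(n_clusters=K, max_iter=10, n_init=3, random_state=0)
kmeans.fit(X)

# Centroids after the iterations, reshaped to K x W x H.
centroids = kmeans.cluster_centers_.reshape(K, 8, 8)
```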
Using Restricted Boltzmann Machines or using Gaussian Processes (EM) (Robbert + Inez)
Feed patches into RBM/GP
Output: the RBM hidden weights or the GP joint probability (unsure how to extract feature representations from a GP)
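For the RBM branch, a sketch using scikit-learn's BernoulliRBM (an assumption; it expects inputs in [0, 1]). The learned hidden weights are reshaped to K x W x H so they plug into the common parts exactly like the k-means centroids; how to do the same for a GP is still open, as noted above:

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

rng = np.random.default_rng(0)
patches = rng.random((500, 8, 8))             # values in [0, 1] for BernoulliRBM

X = patches.reshape(len(patches), -1)         # flatten to (n_patches, 64)

K = 16                                        # number of hidden units
rbm = BernoulliRBM(n_components=K, n_iter=5, learning_rate=0.05, random_state=0)
rbm.fit(X)

# The hidden-unit weights act as the K feature "filters";
# reshape to K x W x H to match the k-means centroids.
features = rbm.components_.reshape(K, 8, 8)
```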
Common parts (Guido + rest)
Take the feature representations (either the k-means centroids, the RBM weights, or the GP output), which are of the form K x W x H, where K is the number of centroids/hidden neurons used.
Convolve the input training set (extract all possible patches in a systematic way) (+ optionally whiten) and compute the distance between all these patches and the K feature representations. For the distance metric, see Coates' paper. The output is of the form sqrt(P) x sqrt(P) x K, where P is the number of patches extracted and K is the number of centroids. We call the values in this tensor the 'activations'.
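A sketch of this extraction step, assuming Euclidean distance and the soft "triangle" activation from Coates' paper (f_k = max(0, mean(z) - z_k), where z_k is the distance to centroid k); the loop-based implementation is for clarity, not speed, and whitening is omitted:

```python
import numpy as np

def activations_for_image(image, centroids, patch_size, stride=1):
    """Extract every patch, compare it with the K centroids, and return
    an (n_rows, n_cols, K) activation map (the sqrt(P) x sqrt(P) x K tensor)."""
    K = len(centroids)
    D = centroids.reshape(K, -1)               # K x (patch_size^2)
    h, w = image.shape
    rows = range(0, h - patch_size + 1, stride)
    cols = range(0, w - patch_size + 1, stride)
    out = np.zeros((len(rows), len(cols), K))
    for i, r in enumerate(rows):
        for j, c in enumerate(cols):
            x = image[r:r + patch_size, c:c + patch_size].ravel()
            z = np.linalg.norm(D - x, axis=1)  # distance to each centroid
            # "Triangle" activation: zero out below-average similarities.
            out[i, j] = np.maximum(0.0, z.mean() - z)
    return out

rng = np.random.default_rng(0)
image = rng.random((16, 16))
centroids = rng.random((4, 5, 5))              # K = 4 toy centroids
acts = activations_for_image(image, centroids, patch_size=5)
```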
Pooling: sum the activations in each quadrant (north-west, north-east, south-west, south-east). This results in a 4 x K matrix, which we flatten into a vector of length 4K. The 4 x K matrix is obtained by summing out the patch indices: for each centroid we get four values, one per quadrant.
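The pooling step can be sketched as follows; splitting the activation map at the midpoints is an assumption for odd-sized maps:

```python
import numpy as np

def quadrant_pool(acts):
    """Sum activations in each quadrant (NW, NE, SW, SE).

    acts has shape (rows, cols, K); the result is a flat vector of
    length 4K: four quadrant sums per centroid."""
    rows, cols, K = acts.shape
    r2, c2 = rows // 2, cols // 2
    quads = [acts[:r2, :c2], acts[:r2, c2:],   # north-west, north-east
             acts[r2:, :c2], acts[r2:, c2:]]   # south-west, south-east
    pooled = np.stack([q.sum(axis=(0, 1)) for q in quads])   # 4 x K
    return pooled.ravel()                                    # length 4K

rng = np.random.default_rng(0)
acts = rng.random((12, 12, 4))                 # toy activation map, K = 4
feature_vector = quadrant_pool(acts)
```

Note that the four quadrant sums together account for every activation exactly once, so no information is double-counted.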
The vector of length 4K is used (together with the image label) as input to the classifier; any classifier can be used. The scikit-learn SGDClassifier (or SGDRegressor, in our case) is very easy and fast but has some extra hyperparameters. As an alternative, logistic regression can be used. Output of the classifier: a trained model. Model prediction output: a vector of length 121 whose values indicate the probability of the image belonging to each class.
Optionally: run a Random Forest after this. Not strictly necessary, but it may improve performance.
Normalize the output so that each probability lies between 0 and 1 and the distribution sums to 1.
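A minimal sketch of that normalization, assuming raw scores may fall outside [0, 1] (e.g. from a regressor); negatives are clipped before rescaling:

```python
import numpy as np

def normalize_scores(scores, eps=1e-12):
    """Clip scores to be non-negative and rescale so they sum to 1."""
    scores = np.clip(scores, 0.0, None)
    return scores / max(scores.sum(), eps)

raw = np.array([0.2, -0.1, 0.5, 0.4])          # toy raw model outputs
probs = normalize_scores(raw)
```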
Convolutional neural networks (deep learning) (Steven + Guido if there is time)
I think the second common part of Coates' method will take the most time to implement. However, once the first common part is done, the k-means group and the RBM/GP group can keep working, so the second common part can be done last.
I will have my thesis repo cleaned by tomorrow, probably tonight.