k-means is ready, except that saving doesn't work reliably yet. @Guardianofnature and @Lucivius have implemented minibatch k-means because it needs a lot less working memory. The iteration parameter doesn't do anything yet, so it has only run one iteration. It currently takes about half an hour, which is too long. We use 3000 centroids, which is quite a lot; try fewer (for galaxy this was fine, but the variance is much lower for this data set: it's black-and-white and the features look more alike). It uses about 300 MB of memory now, so making the batches a bit bigger is still doable.
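A minimal sketch of the minibatch k-means setup with scikit-learn's `MiniBatchKMeans` (the patch data and the cluster/batch counts here are illustrative and scaled down so it runs quickly; we actually use 3000 centroids):

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical data: one row per flattened patch (names and sizes are
# placeholders, not our actual pipeline).
rng = np.random.RandomState(0)
patches = rng.rand(5000, 36)

# Scaled down from our 3000 centroids so this sketch runs fast.
# A larger batch_size uses more memory per step but gives fewer,
# more stable centroid updates.
kmeans = MiniBatchKMeans(n_clusters=50, batch_size=500, max_iter=10,
                         n_init=3, random_state=0)
kmeans.fit(patches)

print(kmeans.cluster_centers_.shape)  # (50, 36)
```

The learned centroids live in `kmeans.cluster_centers_`, so persisting just that array (e.g. with `np.save`) is one way around the saving issue.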
The Boltzmann machine (RBM) is working. @robbertvdg tested it on part of the data, and it works. It returns a feature vector for each patch in the training set. We still need to compare the performance of training on just the training set versus training + test set. The latter will probably perform better, but keep an eye on the running time.
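A sketch of producing a per-patch feature vector with scikit-learn's `BernoulliRBM` (data, sizes, and hyperparameters are assumptions for illustration, not our actual model):

```python
import numpy as np
from sklearn.neural_network import BernoulliRBM

# Hypothetical patch data, one row per patch, values in [0, 1].
rng = np.random.RandomState(0)
patches = rng.rand(1000, 36)

rbm = BernoulliRBM(n_components=64, learning_rate=0.05, n_iter=5,
                   random_state=0)
rbm.fit(patches)

# transform() returns the hidden-unit activations: one 64-dim
# feature vector per patch, as described above.
features = rbm.transform(patches)
print(features.shape)  # (1000, 64)
```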
Preprocessing now fits in normal memory. It processes the images one-by-one instead of all at the same time, and takes about half a minute instead of a minute. It's all ready now. Here is a summary of what @Rahazan did.
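The one-by-one pattern could look like the sketch below: only a single raw image is held in memory at a time, and only the (smaller) processed result is kept. The loader callback and the normalization step are hypothetical stand-ins for the real preprocessing.

```python
import numpy as np

def preprocess(image):
    # Hypothetical per-image step: normalize to zero mean, unit variance.
    image = image.astype(np.float64)
    return (image - image.mean()) / (image.std() + 1e-8)

def preprocess_all(load_image, n_images):
    """Process images one-by-one so only a single raw image is in
    memory at a time (load_image is a hypothetical loader callback)."""
    results = []
    for i in range(n_images):
        results.append(preprocess(load_image(i)))
    return np.stack(results)

# Toy usage with synthetic 8x8 "images".
rng = np.random.RandomState(0)
images = [rng.rand(8, 8) for _ in range(4)]
out = preprocess_all(lambda i: images[i], 4)
print(out.shape)  # (4, 8, 8)
```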
Nice progress bar.
Steven has done code reviews, but didn't continue with the convolutional neural network this week.
We have had a meeting with the coach. Discussed: we are on track. Convolutional neural networks usually do not work well with this many categories and only 30,000 samples of data, so the coach's expectation is that unsupervised methods will work better. Idea: combine convolutional neural networks (to classify the 5 root categories) with Coates (for the rest). If this works, it's relatively new and perhaps we can write a paper about it.
LittleCreatures is gaining on us, they are 157th and we are down to 138th. FilterFeeders is 670th.
To do before Wednesday 25/2/2015:
[x] @Guardianofnature @Lucivius k-means make something to calculate distance between samples and centroids, for feature vector. (Make feature vector per patch first, the rest is general method for both k-means and RBM)
[x] @robbertvdg @Moorkopsoesje For RBM the same thing, but we already have a feature vector for each patch, throw each patch through RBM, pull this to make general feature vector (Coates paper).
[x] @Moorkopsoesje How can we look at the weights?
[x] @Rahazan Convolutional extracting patches, but not randomized. Save in a logical way (2d-array).
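The distance-to-centroid feature vector from the first item could be sketched with the soft "triangle" activation from the Coates paper, where each patch gets one feature per centroid (array names and sizes are illustrative):

```python
import numpy as np

def triangle_features(patches, centroids):
    """Map each patch to a feature vector of length n_centroids using
    the 'triangle' activation: f_k = max(0, mu - z_k), where z_k is the
    Euclidean distance to centroid k and mu is the mean distance."""
    # Pairwise distances: shape (n_patches, n_centroids).
    diff = patches[:, None, :] - centroids[None, :, :]
    z = np.sqrt((diff ** 2).sum(axis=2))
    mu = z.mean(axis=1, keepdims=True)
    return np.maximum(0.0, mu - z)

rng = np.random.RandomState(0)
patches = rng.rand(100, 36)
centroids = rng.rand(10, 36)
features = triangle_features(patches, centroids)
print(features.shape)  # (100, 10)
```

This zeroes out features for centroids that are farther than average, which is what makes the representation sparse; the same per-patch interface also works for the RBM features, so the pooling step stays generic for both.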
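The non-randomized patch extraction from the last item could look like this grid-based sketch, saving the patches as a 2-D array with one flattened patch per row (patch size and stride are assumptions):

```python
import numpy as np

def extract_grid_patches(image, patch_size=6, stride=6):
    """Extract patches on a regular grid (not randomized) and return
    them as a 2-D array: one flattened patch per row."""
    h, w = image.shape
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size].ravel())
    return np.array(patches)

image = np.arange(24 * 24).reshape(24, 24).astype(float)
patches = extract_grid_patches(image)
print(patches.shape)  # 4x4 grid of 6x6 patches -> (16, 36)
```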