3.1: K-means Clustering

3.1.1: Try to improve performance on the two-person training data (disjunct). Perform K-means clustering on each cipher individually for the training set, so that the training data is represented by a number of cluster centroids. Then train the k-NN on the centroids of these clusters. Try different numbers of clusters per cipher and compare the resulting performance.
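A minimal sketch of this pipeline, using only the Python standard library. It is not the course's reference solution. The synthetic 2-D blobs stand in for the cipher data, and all names (`kmeans`, `centroids_per_class`, `knn_predict`, `make_blobs`) as well as the values of `k`, `iters` and the seeds are assumptions chosen for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def kmeans(points, k, rng, iters=20):
    """Plain Lloyd's algorithm; returns k centroids as lists of floats."""
    centroids = [list(p) for p in rng.sample(points, k)]  # random starting points
    for _ in range(iters):
        buckets = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centroids[j]))
            buckets[nearest].append(p)
        for j, members in enumerate(buckets):
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def centroids_per_class(X, y, k, seed=0):
    """Run k-means separately on each label's samples; return (centroids, labels)."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for x, label in zip(X, y):
        by_label[label].append(x)
    cent_x, cent_y = [], []
    for label, pts in sorted(by_label.items()):
        for c in kmeans(pts, min(k, len(pts)), rng):
            cent_x.append(c)
            cent_y.append(label)
    return cent_x, cent_y


def knn_predict(train_x, train_y, x, n_neighbors=1):
    """Majority vote among the n_neighbors closest training points."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(x, train_x[i]))
    votes = Counter(train_y[i] for i in order[:n_neighbors])
    return votes.most_common(1)[0][0]


def make_blobs(rng, centers, n_per_class):
    """Hypothetical stand-in data: one Gaussian blob per class label."""
    X, y = [], []
    for label, (cx, cy) in enumerate(centers):
        for _ in range(n_per_class):
            X.append((rng.gauss(cx, 0.5), rng.gauss(cy, 0.5)))
            y.append(label)
    return X, y


rng = random.Random(42)
centers = [(0, 0), (5, 5), (0, 5)]
train_x, train_y = make_blobs(rng, centers, 60)
test_x, test_y = make_blobs(rng, centers, 20)

# Replace the raw training set by 4 centroids per class, then classify with k-NN.
cent_x, cent_y = centroids_per_class(train_x, train_y, k=4)
preds = [knn_predict(cent_x, cent_y, x) for x in test_x]
accuracy = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
print(f"{len(train_x)} raw samples -> {len(cent_x)} centroids, accuracy {accuracy:.2f}")
```

Changing `k` in the call to `centroids_per_class` is one way to try different numbers of clusters. The k-NN then searches only `k` points per class instead of every training sample.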

3.1.2: Compare the k-NN performance on the raw training data with its performance on the cluster centroids of the training data. The comparison should also consider the run time of each approach. Because the clusters are generated from random starting points, cross-validation should be performed.
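One possible shape for such a comparison is sketched below: a small k-fold cross-validation loop that records accuracy and wall-clock time per fold for any way of building the k-NN reference set. All names (`cross_validate`, `raw_reference`, `class_mean_reference`, `summary`) and the synthetic data are assumptions. To keep the sketch short, the reduced reference set uses one centroid per class (the class mean, which is what K-means yields with a single cluster). A K-means builder with random starting points would be passed in the same way.

```python
import math
import random
import time


def one_nn(train_x, train_y, x):
    """Label of the single nearest training point (Euclidean distance)."""
    i = min(range(len(train_x)), key=lambda j: math.dist(x, train_x[j]))
    return train_y[i]


def cross_validate(X, y, build_reference, folds=5, seed=0):
    """k-fold CV; returns one (accuracy, seconds) pair per fold.

    build_reference(train_x, train_y) -> (ref_x, ref_y) decides what the k-NN
    searches: the raw training data or a reduced set such as centroids.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    results = []
    for f in range(folds):
        held_out = set(idx[f::folds])
        train = [i for i in idx if i not in held_out]
        start = time.perf_counter()  # timing covers reduction and prediction
        ref_x, ref_y = build_reference([X[i] for i in train], [y[i] for i in train])
        hits = sum(one_nn(ref_x, ref_y, X[i]) == y[i] for i in held_out)
        results.append((hits / len(held_out), time.perf_counter() - start))
    return results


def raw_reference(train_x, train_y):
    """Baseline: the k-NN searches every raw training sample."""
    return train_x, train_y


def class_mean_reference(train_x, train_y):
    """One centroid per label: what K-means yields with a single cluster."""
    groups = {}
    for x, label in zip(train_x, train_y):
        groups.setdefault(label, []).append(x)
    labels = sorted(groups)
    means = [[sum(c) / len(groups[l]) for c in zip(*groups[l])] for l in labels]
    return means, labels


# Hypothetical stand-in data: three well-separated 2-D blobs, 50 samples each.
rng = random.Random(1)
X = [(rng.gauss(c, 0.5), rng.gauss(c, 0.5)) for c in (0, 4, 8) for _ in range(50)]
y = [label for label in range(3) for _ in range(50)]

summary = {}
for name, builder in [("raw", raw_reference), ("centroids", class_mean_reference)]:
    res = cross_validate(X, y, builder)
    acc = sum(a for a, _ in res) / len(res)
    secs = sum(t for _, t in res)
    summary[name] = (acc, secs, len(res))
    print(f"{name:9s} mean accuracy {acc:.3f}, total time {secs:.4f}s")
```

The timer deliberately includes the cost of building the reference set, so the clustering time counts against the centroid-based variant. With a randomly initialised K-means builder, repeating the run with several seeds would also show how much the result varies between initialisations.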

3.1.3: Perform K-means clustering on each cipher individually for the training data from the entire class (disjunct), or a large part of it, e.g. 30 persons. Represent the training data as a number of cluster centroids and compare the performance for several numbers of clusters.

