ghamerly / fast-kmeans

Code to speed up k-means clustering. Originally at BaylorCS/baylorml.
http://cs.baylor.edu/~hamerly/software/kmeans.php
MIT License

Fast K-means Clustering Toolkit
===============================


Version 0.1 (Sat May 17 17:41:11 CDT 2014)


WHAT: This software is a testbed for comparing variants of Lloyd's k-means clustering algorithm. It includes implementations of several algorithms that accelerate the algorithm by avoiding unnecessary distance calculations.
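To illustrate how a distance calculation can be proven unnecessary, here is a minimal Python sketch of one triangle-inequality test, the one described by Elkan (see REFERENCES). It is not the toolkit's code: the names `dist` and `assign_with_pruning` and the sample data are invented for this example, and it covers only the assignment step of a single pass.

```python
import math

def dist(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_with_pruning(points, centers):
    """Assign each point to its nearest center, skipping provably useless distances.

    Hypothetical helper for illustration only.
    Returns (labels, number of point-to-center distances actually computed).
    """
    k = len(centers)
    # Center-to-center distances: k*k values, independent of the number of points.
    cc = [[dist(centers[i], centers[j]) for j in range(k)] for i in range(k)]
    labels, computed = [], 0
    for x in points:
        best, best_d = 0, dist(x, centers[0])
        computed += 1
        for j in range(1, k):
            # Triangle inequality: d(x, c_j) >= d(c_best, c_j) - d(x, c_best).
            # So if d(c_best, c_j) >= 2 * d(x, c_best), c_j cannot be closer
            # than the current best, and d(x, c_j) never needs to be computed.
            if cc[best][j] >= 2.0 * best_d:
                continue
            d = dist(x, centers[j])
            computed += 1
            if d < best_d:
                best, best_d = j, d
        labels.append(best)
    return labels, computed

# Made-up sample data: three well-separated centers, one point near each.
centers = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
points = [(0.5, -0.3), (99.2, 0.8), (0.1, 100.4)]
labels, computed = assign_with_pruning(points, centers)
print(labels, computed, "of", len(points) * len(centers), "distances computed")
```

The assignments match what a brute-force search would give; only the number of distance evaluations changes. The algorithms in this toolkit go further by keeping bounds across iterations, which this sketch does not attempt.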


WHO: Greg Hamerly (hamerly@cs.baylor.edu, primary contact) and Jonathan Drake (drakej@hp.com).


HOW TO BUILD THE SOFTWARE: Type "make" (and hope for the best).


HOW TO RUN THE SOFTWARE: The driver is designed to take commands from standard input, usually a file that's been redirected as input:

./kmeans < commands.txt

You can read the source to find all the possible commands, but here is a summary:

Note that when a set of centers is initialized, that same set of centers is used from then on (until a new initialization occurs). So running a clustering algorithm multiple times will use the same initialization each time.

Here is an example of a simple set of commands:

dataset smallDataset.txt
initialize 10 kpp

annulus
hamerly
adaptive
heap
elkan
sort
compare
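If you want to generate command files like the one above from a script, a minimal Python sketch might look like this. The helper names `build_commands` and `run_driver` are hypothetical, and reading "10" as the number of centers and "kpp" as the initialization method is an assumption based on the example, not on the driver's source. The binary path `./kmeans` is taken from the run instructions.

```python
import subprocess

def build_commands(dataset, num_centers, init_method, commands_after_init):
    """Build the text of a command file shaped like the example above.

    Hypothetical helper; argument meanings are assumed from the example.
    """
    lines = [
        f"dataset {dataset}",
        f"initialize {num_centers} {init_method}",
        "",  # the example leaves a blank line here, so it is mirrored
    ]
    lines.extend(commands_after_init)
    return "\n".join(lines) + "\n"

def run_driver(command_text, binary="./kmeans"):
    """Feed the commands to the driver on standard input and return its output.

    Equivalent in spirit to: ./kmeans < commands.txt
    Assumes the binary has already been built with make.
    """
    result = subprocess.run(
        [binary], input=command_text, text=True, capture_output=True, check=True
    )
    return result.stdout

commands = build_commands(
    "smallDataset.txt", 10, "kpp",
    ["annulus", "hamerly", "adaptive", "heap", "elkan", "sort", "compare"],
)
print(commands, end="")
```

Calling `run_driver(commands)` would then run the whole sequence in one driver process, so every algorithm starts from the same initialized centers, as noted above.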

CAVEATS:


REFERENCES:

Phillips, Steven J. "Acceleration of k-means and related clustering algorithms." In Algorithm Engineering and Experiments, pp. 166-177. Springer Berlin Heidelberg, 2002.

Elkan, Charles. "Using the triangle inequality to accelerate k-means." In ICML, vol. 3, pp. 147-153. 2003.

Hamerly, Greg. "Making k-means Even Faster." In SDM, pp. 130-140. 2010.

Drake, Jonathan, and Greg Hamerly. "Accelerated k-means with adaptive distance bounds." In 5th NIPS Workshop on Optimization for Machine Learning. 2012.

Drake, Jonathan. "Faster k-means clustering." MS thesis, 2013.

Hamerly, Greg, and Jonathan Drake. "Accelerating Lloyd's algorithm for k-means clustering." To appear in Partitional Clustering Algorithms, Springer, 2014.