derrickburns / generalized-kmeans-clustering

Spark library for generalized K-Means clustering. Supports general Bregman divergences. Suitable for clustering probabilistic data, time series data, high-dimensional data, and very large data.
https://generalized-kmeans-clustering.massivedatascience.com/
Apache License 2.0
298 stars, 50 forks

Fixed bugs in ColumnTrackingKMeans implementation. #55

Closed: derrickburns closed this 9 years ago

derrickburns commented 9 years ago

Split previous and recent assignments into two separate RDDs.

Set storage levels properly. Added names to RDDs for debugging.
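
For readers unfamiliar with the Spark calls involved, here is a minimal sketch of naming and persisting two separate assignment RDDs. It is not the code from this pull request: the `Assignment` case class, the `track` helper, the RDD labels, and the choice of `StorageLevel.MEMORY_AND_DISK` are all assumptions made for illustration. Only `setName`, `persist`, `unpersist`, `name`, and `getStorageLevel` are standard Spark RDD methods.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.storage.StorageLevel

object NamedAssignmentsSketch {
  // Hypothetical stand-in for a per-point cluster assignment.
  final case class Assignment(cluster: Int, distance: Double)

  // Hypothetical helper: give the RDD a readable name (shown in the Spark UI)
  // and persist it with an explicit storage level (an assumed choice).
  def track(rdd: RDD[Assignment], label: String, round: Int): RDD[Assignment] =
    rdd
      .setName(s"$label assignments, round $round")
      .persist(StorageLevel.MEMORY_AND_DISK)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("named-assignments-sketch").setMaster("local[2]")
    val sc = new SparkContext(conf)
    try {
      // Previous and recent assignments are kept as two separate RDDs.
      val previous = track(
        sc.parallelize(Seq(Assignment(0, 1.5), Assignment(1, 0.2))), "previous", 0)
      val recent = track(
        previous.map(a => a.copy(distance = a.distance / 2)), "recent", 1)

      // Materialize the recent RDD before releasing the one it was derived from.
      recent.count()
      previous.unpersist(blocking = false)

      println(recent.name)
      println(recent.getStorageLevel.description)
    } finally {
      sc.stop()
    }
  }
}
```

Keeping the two RDDs separate lets the older one be unpersisted as soon as the newer one has been materialized, and the names make it easier to tell them apart on the Storage tab of the Spark UI.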