Spark library for generalized K-Means clustering. Supports general Bregman divergences. Suitable for clustering probabilistic data, time series data, high dimensional data, and very large data.
Hi,
When predicting a single Vector from an RDD[Vector] with a trained model, a StackOverflowError is thrown.
Predicting the whole RDD[Vector] at once works fine.
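println("clustering single vectors fails")
val singleVector = mymatrix.map { point =>
  try {
    val prediction = kModel.predict(point)
    (point.toString, prediction)
  } catch {
    // StackOverflowError extends java.lang.Error, not Exception.
    // Return a sentinel index (-1) so both branches have the same type
    // and singleVector stays an RDD[(String, Int)] instead of RDD[Any].
    case e: StackOverflowError =>
      println("unable to predict a single vector")
      (point.toString, -1)
  }
}
println(s"singleVector.count(): ${singleVector.count()}")

println("clustering using multiple vectors, this runs OK")
val predictions = kModel.predict(mymatrix)
// RDD.zip requires both RDDs to have the same number of partitions
// and the same number of elements in each partition.
val multipleVector = predictions.zip(mymatrix).map { case (cluster, point) => (point.toString, cluster) }
println(s"multipleVector.count(): ${multipleVector.count()}")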
This works with the MLlib KMeans implementation; however, switching to massive-kmeans gives the following StackOverflowError (you can switch between the MLlib and massivedatascience import statements in the Scala file of the example project to see the difference):
2015/06/18 11:10:03:300 [ERROR] [Executor task launch worker-5] org.apache.spark.Logging$class.logError:96 - Exception in task 0.0 in stage 63.0 (TID 31500)
java.lang.StackOverflowError
    at com.massivedatascience.divergence.SquaredEuclideanDistanceDivergence$.convexHomogeneous(BregmanDivergence.scala:144)
    at com.massivedatascience.clusterer.NonSmoothedPointCenterFactory$class.toPoint(BregmanPointOps.scala:209)
    at com.massivedatascience.clusterer.SquaredEuclideanPointOps$.toPoint(BregmanPointOps.scala:260)
    at com.massivedatascience.clusterer.KMeansPredictor$class.predictWeighted(KMeansModel.scala:66)
    at com.massivedatascience.clusterer.KMeansModel.predictWeighted(KMeansModel.scala:99)
I've put my code with data as an example here: https://github.com/bkersbergen/massive-kmeans-overflow.
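As a possible stopgap until the single-vector path is fixed, the nearest center for squared Euclidean distance (the divergence named in the stack trace) can be computed directly, without calling `kModel.predict` per point. This is only a minimal sketch under stated assumptions: it assumes the cluster centers have already been copied out of the trained model into an `Array[Array[Double]]` in cluster-index order (how to do that with this library is not shown here), and `NearestCenterSketch` / `nearestCenter` are hypothetical helpers, not part of the library's API.

```scala
object NearestCenterSketch {
  // Squared Euclidean distance between two dense vectors of equal length.
  def squaredDistance(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    var sum = 0.0
    var i = 0
    while (i < a.length) {
      val d = a(i) - b(i)
      sum += d * d
      i += 1
    }
    sum
  }

  // Hypothetical helper: index of the center closest to `point`.
  // Uses a plain loop, so it does not recurse.
  // Assumes `centers` is ordered by cluster index (an assumption, see lead-in).
  def nearestCenter(point: Array[Double], centers: Array[Array[Double]]): Int = {
    require(centers.nonEmpty, "at least one center is required")
    var best = 0
    var bestDist = squaredDistance(point, centers(0))
    var k = 1
    while (k < centers.length) {
      val d = squaredDistance(point, centers(k))
      if (d < bestDist) {
        best = k
        bestDist = d
      }
      k += 1
    }
    best
  }

  def main(args: Array[String]): Unit = {
    val centers = Array(Array(0.0, 0.0), Array(10.0, 10.0))
    println(nearestCenter(Array(1.0, 2.0), centers)) // prints 0
    println(nearestCenter(Array(9.0, 8.5), centers)) // prints 1
  }
}
```

In Spark, the centers array would typically be broadcast once (`val bc = sc.broadcast(centers)`) and the helper applied inside `mymatrix.map(p => (p.toString, NearestCenterSketch.nearestCenter(p.toArray, bc.value)))`. For squared Euclidean k-means the result should match the model's assignment, provided the extracted centers really are in cluster-index order; this has not been verified against the library.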