I'm playing with spark-bench. I was able to run kmeans on spark2. I'm using HDP-2.6.5.0 which comes with Spark2 - 2.3.0.
This is my conf file:
spark-bench = {
  spark-submit-config = [{
    spark-args = {
      master = "yarn" // FILL IN YOUR MASTER HERE
      num-executors = 4
      // executor-memory = "XXXXXXX" // FILL IN YOUR EXECUTOR MEMORY
    }
    conf = {
      // Any configuration you need for your setup goes here, like:
      "spark.executor.cores" = "4"
      "spark.executor.memory" = "5g"
      "spark.driver.memory" = "5g"
      // "spark.dynamicAllocation.enabled" = "false"
    }
    workload-suites = [
      {
        descr = "kmeans Workloads"
        benchmark-output = "console"
        workloads = [
          {
            name = "kmeans"
            input = "hdfs:///tmp/csv-vs-parquet/kmeans-data.csv"
          }
        ]
      }
    ]
  }]
}
My configuration file:
Exception info:
The org.apache.spark.mllib.clustering.KMeans class in Spark 2.1 has this method, but the KMeans class in Spark 2.0 does not!
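If it helps anyone hitting the same thing, here is a minimal sketch (not part of spark-bench) for checking which overloads of a method a class on your classpath actually exposes, using plain JVM reflection. The class name `MethodCheck` and the helper `overloads` are made up for illustration, and the defaults use a JDK class so the snippet runs without Spark installed.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Spark or spark-bench.
final class MethodCheck {

    /**
     * Returns the parameter lists of every public method called methodName
     * on className. Returns an empty list if the method is missing or the
     * class cannot be loaded from the current classpath.
     */
    static List<String> overloads(String className, String methodName) {
        List<String> found = new ArrayList<>();
        try {
            // Load without running static initializers; we only inspect it.
            Class<?> cls = Class.forName(
                className, false, MethodCheck.class.getClassLoader());
            for (Method m : cls.getMethods()) {
                if (m.getName().equals(methodName)) {
                    found.add(Arrays.toString(m.getParameterTypes()));
                }
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // Class is not on the classpath: report no overloads.
        }
        return found;
    }

    public static void main(String[] args) {
        // Defaults are placeholders (assumption); pass your own class and method.
        String className = args.length > 0 ? args[0] : "java.lang.String";
        String methodName = args.length > 1 ? args[1] : "isBlank";

        List<String> sigs = overloads(className, methodName);
        if (sigs.isEmpty()) {
            System.out.println(className + " has no public method '"
                + methodName + "' on this classpath");
        } else {
            sigs.forEach(s -> System.out.println(methodName + " " + s));
        }
    }
}
```

To check a cluster, run it with the Spark jars of that installation on the classpath and pass `org.apache.spark.mllib.clustering.KMeans` plus the method name from your exception as the two arguments. An empty result would suggest the Spark build that is actually loaded is older than the one the benchmark was compiled against.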