CODAIT / spark-bench

Benchmark Suite for Apache Spark
https://codait.github.io/spark-bench/
Apache License 2.0

How to get input data for the Logistic Regression workload? #183

Open Aalnafessah opened 5 years ago

Aalnafessah commented 5 years ago

Spark-Bench version (version number, tag, or git commit hash)

spark-bench-launch-2.1.1_0.2.2-RELEASE

Details of your cluster setup (Spark version, Standalone/Yarn/Local/Etc)

Spark 2.2.1

Scala version on your cluster

Scala version 2.11.8. Spark cluster: Spark Standalone (1 master and 2 slaves).

Description of your problem and any other relevant info

I am analyzing Spark performance using as many workloads as I can. So far, I have been able to run KMeans (both data generation and the workload). I would also like to run the Logistic Regression workload, but I cannot because there is no data generator for Logistic Regression. I tried running the "legacy version" of spark-bench, but ran into issues because my Spark version is 2.2.1. Is there any way to obtain or generate input data for the Logistic Regression workload?

Many thanks in advance for your help.