CODAIT / spark-bench

Benchmark Suite for Apache Spark
https://codait.github.io/spark-bench/
Apache License 2.0

How to generate the dataset for Logistic Regression test? #177

Status: Open. congxu2016 opened this issue 6 years ago

congxu2016 commented 6 years ago

Spark-Bench version (version number, tag, or git commit hash)

spark-bench_2.3.0_0.4.0-RELEASE_99

Details of your cluster setup (Spark version, Standalone/Yarn/Local/Etc)

spark-2.2, standalone

Scala version on your cluster

2.12.2

Your exact configuration file (with system details anonymized for security)

Relevant stacktrace

Description of your problem and any other relevant info

There is a test for Logistic Regression, but there is no data generator for Logistic Regression. How can I generate or download the dataset for this test?

ecurtin commented 6 years ago

Hi @congxu2016, sorry for the delay in answering.

The data generator in the legacy branch will generate a dataset appropriate for the LogisticRegression workload: https://github.com/CODAIT/spark-bench/blob/legacy/LogisticRegression/bin/gen_data.sh

lovengulu commented 5 years ago

I'm also having a hard time creating a dataset for LogisticRegression. I tried the gen_data.sh script from the legacy branch, as suggested above, but it fails with: Exception in thread "main" java.io.FileNotFoundException: File file:/opt/spark-bench/LogisticRegression/target/LogisticRegressionApp-1.0.jar does not exist

Is there an alternative way to create the dataset, possibly with something from the current RELEASE_99?
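One possible alternative, sketched under assumptions: generate a synthetic two-class dataset directly and point the workload's input at the resulting file. The Python below uses only the standard library and follows a scheme similar to Spark MLlib's LogisticRegressionDataGenerator, where each feature is Gaussian noise shifted by eps times the label. The function names, the output path, and the CSV layout (label in the first column, features after it, no header row) are hypothetical. Check which input format the RELEASE_99 LogisticRegression workload actually expects before relying on this.

```python
import csv
import random


def generate_logistic_rows(n_examples, n_features, eps=0.3, seed=42):
    """Yield (label, features) pairs for a two-class logistic regression test.

    Labels alternate 0.0 / 1.0 by row index. Every feature is standard
    Gaussian noise shifted by eps * label, so eps controls how separable
    the two classes are.
    """
    rng = random.Random(seed)  # fixed seed keeps the dataset reproducible
    for idx in range(n_examples):
        label = 0.0 if idx % 2 == 0 else 1.0
        features = [rng.gauss(0.0, 1.0) + label * eps for _ in range(n_features)]
        yield label, features


def write_csv(path, n_examples, n_features, eps=0.3, seed=42):
    """Write rows as 'label,f1,...,fn' with no header (assumed input layout)."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        for label, features in generate_logistic_rows(n_examples, n_features, eps, seed):
            writer.writerow([label, *features])


if __name__ == "__main__":
    # Hypothetical output path and sizes; adjust to your benchmark configuration.
    write_csv("logistic-regression-train.csv", n_examples=10000, n_features=20)
```

Generating on a single machine becomes the bottleneck for cluster-scale sizes. In that case, Spark MLlib's `org.apache.spark.mllib.util.LogisticRegressionDataGenerator.generateLogisticRDD` produces an RDD of `LabeledPoint` in parallel and may be a better fit.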