This is a performance testing framework for Apache Spark 1.0+.
For questions, bug reports, or feature requests, please open an issue on GitHub.
The `spark-perf` scripts require Python 2.7+. If you're using an earlier version of Python, you may need to install the `argparse` library using `easy_install argparse`.
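As an illustrative sketch (not code from spark-perf itself), the usual pattern for supporting pre-2.7 interpreters is a guarded import of `argparse` with a pointer to the `easy_install` fallback; the parser below is a made-up example, not spark-perf's actual CLI:

```python
import sys

# argparse ships with the standard library from Python 2.7 onward;
# on older interpreters it must be installed separately.
try:
    import argparse
except ImportError:
    sys.exit("argparse not found; on Python < 2.7 install it with "
             "`easy_install argparse`")

# Hypothetical parser, just to show argparse in action.
parser = argparse.ArgumentParser(description="demo runner (sketch)")
parser.add_argument("--config", default="config/config.py",
                    help="path to a custom configuration file")
args = parser.parse_args(["--config", "my_config.py"])
print(args.config)
```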
Support for automatically building Spark requires Maven. On `spark-ec2` clusters, this can be installed using the `./bin/spark-ec2/install-maven` script from this project.
To configure `spark-perf`, copy `config/config.py.template` to `config/config.py` and edit that file. See `config.py.template` for detailed configuration instructions. After editing `config.py`, execute `./bin/run` to run performance tests. You can pass the `--config` option to use a custom configuration file.
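Since the configuration file is itself a Python module, a config of this style can be loaded dynamically by path. The sketch below shows one way to do that; the `load_config` helper and `demo_config.py` file are illustrative assumptions, not spark-perf's actual loader:

```python
import importlib.util

def load_config(path):
    # Load a Python source file (e.g. a config.py) as a module object.
    spec = importlib.util.spec_from_file_location("config", path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

# Write a tiny stand-in config file, then load and read a setting from it.
with open("demo_config.py", "w") as f:
    f.write("SCALE_FACTOR = 0.05\n")

cfg = load_config("demo_config.py")
print(cfg.SCALE_FACTOR)  # -> 0.05
```

Because the config is executable Python, values like `SPARK_CLUSTER_URL` can be computed at load time rather than hard-coded.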
The following sections describe some additional settings to change for certain test environments:
### Running locally

1. Ensure that `ssh localhost` works on your machine without a password.
2. Set `config.py` options that are friendly for local execution:

   ```
   SPARK_HOME_DIR = /path/to/your/spark
   SPARK_CLUSTER_URL = "spark://%s:7077" % socket.gethostname()
   SCALE_FACTOR = .05
   SPARK_DRIVER_MEMORY = 512m
   spark.executor.memory = 2g
   ```

3. Uncomment at least one `SPARK_TESTS` entry.

### Running on an existing Spark cluster

1. Set `config.py` options:
   ```
   SPARK_HOME_DIR = /path/to/your/spark/install
   SPARK_CLUSTER_URL = "spark://<your-master-hostname>:7077"
   SCALE_FACTOR = <depends on your hardware>
   SPARK_DRIVER_MEMORY = <depends on your hardware>
   spark.executor.memory = <depends on your hardware>
   ```

2. Uncomment at least one `SPARK_TESTS` entry.

### Running on a spark-ec2 cluster with a custom Spark version

1. Set `config.py` options:
   ```
   USE_CLUSTER_SPARK = False
   SPARK_COMMIT_ID = <what you want to test>
   SCALE_FACTOR = <depends on your hardware>
   SPARK_DRIVER_MEMORY = <depends on your hardware>
   spark.executor.memory = <depends on your hardware>
   ```

2. Uncomment at least one `SPARK_TESTS` entry.

### License

This project is licensed under the Apache 2.0 License. See LICENSE for full license text.