databricks / spark-perf

Performance tests for Apache Spark
Apache License 2.0

scale_factor setting question #96

Open wenxuanguan opened 8 years ago

wenxuanguan commented 8 years ago

I'm confused about the scale_factor setting in config.py.template:

# The default values configured below are appropriate for approximately 20 m1.xlarge nodes,
# in which each node has 15 GB of memory. Use this variable to scale the values (e.g.
# number of records in a generated dataset) if you are running the tests with more
# or fewer nodes. When developing new test suites, you might want to set this to a small
# value suitable for a single machine, such as 0.001.
SCALE_FACTOR = 1.0

SCALE_FACTOR = 1.0 is for 20 m1.xlarge nodes (15 GB memory each), so why 0.001 for a single machine? And what should it be for c3.xlarge (7.5 GB memory) nodes or c3.2xlarge (4 vCPU) nodes?

Thanks!

emres commented 8 years ago

I'm also confused about this. I believe something like a formula based on CPU cores and RAM per node would make this easier to understand and use.
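
A formula like that is not documented in the project, but a minimal sketch of one possible reading is below: it assumes SCALE_FACTOR should scale dataset sizes in proportion to total cluster memory, taking the 20 x m1.xlarge (15 GB each) reference cluster from config.py.template as the baseline. The per-node memory table and the suggested_scale_factor helper are hypothetical, and this memory-only heuristic ignores CPU cores entirely.

REFERENCE_TOTAL_MEMORY_GB = 20 * 15.0  # 20 m1.xlarge nodes at roughly 15 GB each

# Hypothetical per-node memory figures (GB) for the instance types asked about.
NODE_MEMORY_GB = {
    "m1.xlarge": 15.0,
    "c3.xlarge": 7.5,
    "c3.2xlarge": 15.0,
}

def suggested_scale_factor(instance_type, num_nodes):
    """Scale in proportion to total cluster memory relative to the reference cluster."""
    total_memory_gb = num_nodes * NODE_MEMORY_GB[instance_type]
    return total_memory_gb / REFERENCE_TOTAL_MEMORY_GB

if __name__ == "__main__":
    print(suggested_scale_factor("c3.xlarge", 20))   # 20 * 7.5 / 300 = 0.5
    print(suggested_scale_factor("c3.2xlarge", 10))  # 10 * 15.0 / 300 = 0.5

Under this reading, a single development machine contributes only a tiny fraction of the reference cluster's memory, which is roughly consistent with the 0.001 value suggested in the template comment.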