databricks / spark-perf

Performance tests for Apache Spark
Apache License 2.0

Interpreting Spark-perf results #97

Closed: yodha12 closed this issue 8 years ago

yodha12 commented 8 years ago

I just started using spark-perf (master) and I am running only the pyspark tests. After the run it writes output to the results folder, but I don't clearly understand what those numbers mean. For example:

python-scheduling-throughput, SchedulerThroughputTest --num-tasks=5000 --num-trials=10 --inter-trial-wait=3, 2.505, 0.145, 2.383, 2.789, 2.460

python-agg-by-key, AggregateByKey --num-trials=10 --inter-trial-wait=3 --num-partitions=400 --reduce-tasks=400 --random-seed=5 --persistent-type=memory --num-records=200000000 --unique-keys=20000 --key-length=10 --unique-values=1000000 --value-length=10 , 28.7235, 0.203, 28.461, 29.106, 28.537

What do the numbers 2.505, 0.145, etc. mean for the first pyspark job, and 28.7235, 0.203, etc. for the second job?

JoshRosen commented 8 years ago

See https://github.com/databricks/spark-perf/blob/79f8cfa6494e99a63f7cd4502aea4660b72ff6da/lib/sparkperf/utils.py#L41; the five trailing numbers on each result line are the tuple returned there, in order: the median, standard deviation, minimum, first trial, and last trial of the per-trial times:

    return (result_med, result_std, result_min, result_first, result_last)
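
So for the AggregateByKey line above, the median trial time was 28.7235, with a standard deviation of 0.203, a minimum of 28.461, a first trial of 29.106, and a last trial of 28.537.

Below is a minimal sketch of how such a summary can be computed from a list of per-trial times. The function name summarize_trials is hypothetical; the tuple ordering matches utils.py, though the exact standard-deviation definition used there may differ.

    import statistics

    def summarize_trials(times):
        # Summarize per-trial results in the order spark-perf reports them:
        # (median, std dev, min, first trial, last trial).
        result_med = statistics.median(times)
        result_std = statistics.stdev(times)  # sample std dev; illustrative
        result_min = min(times)
        result_first = times[0]
        result_last = times[-1]
        return (result_med, result_std, result_min, result_first, result_last)

    # A run whose trials summarize to the SchedulerThroughputTest line above
    # would yield (2.505, 0.145, 2.383, 2.789, 2.460), i.e. median 2.505,
    # std dev 0.145, min 2.383, first trial 2.789, last trial 2.460.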