yodha12 opened this issue (closed 8 years ago)
I just started using spark-perf (master) and I am running only the pyspark tests. After the run it writes output to the results folder, but I don't clearly understand what those numbers mean. For example:
python-scheduling-throughput, SchedulerThroughputTest --num-tasks=5000 --num-trials=10 --inter-trial-wait=3, 2.505, 0.145, 2.383, 2.789, 2.460
python-agg-by-key, AggregateByKey --num-trials=10 --inter-trial-wait=3 --num-partitions=400 --reduce-tasks=400 --random-seed=5 --persistent-type=memory --num-records=200000000 --unique-keys=20000 --key-length=10 --unique-values=1000000 --value-length=10 , 28.7235, 0.203, 28.461, 29.106, 28.537
What do the numbers 2.505, 0.145, etc. mean for the first pyspark job, and 28.7235, 0.203, etc. for the second job?
See https://github.com/databricks/spark-perf/blob/79f8cfa6494e99a63f7cd4502aea4660b72ff6da/lib/sparkperf/utils.py#L41:
return (result_med, result_std, result_min, result_first, result_last)
So the five numbers after the option string are, in order: the median, the standard deviation, the minimum, the first trial's result, and the last trial's result of the per-trial runtimes (in seconds) over the --num-trials runs.
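For reference, here is a minimal sketch of how such a summary could be computed from the per-trial times. This is an illustrative reimplementation, not the exact code in lib/sparkperf/utils.py (which may, for example, use a different standard-deviation convention); the function name and the sample times are made up for the example.

```python
import statistics

def summarize_trials(times):
    """Summarize a list of per-trial runtimes (seconds) in the same
    order as the columns in the spark-perf results file:
    (median, std dev, min, first trial, last trial).

    Illustrative sketch only; see lib/sparkperf/utils.py for the
    actual implementation.
    """
    result_med = statistics.median(times)
    result_std = statistics.stdev(times)  # sample std dev; the real code may differ
    result_min = min(times)
    result_first = times[0]
    result_last = times[-1]
    return (result_med, result_std, result_min, result_first, result_last)

# Hypothetical example with three trial times:
print(summarize_trials([2.8, 2.5, 2.4]))
```

Applied to the python-scheduling-throughput row above: 2.505 is the median runtime over the 10 trials, 0.145 the standard deviation, 2.383 the fastest trial, 2.789 the first trial, and 2.460 the last trial.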