databricks / spark-perf

Performance tests for Apache Spark
Apache License 2.0

Memory Problems with Spark-perf #99

Open yodha12 opened 8 years ago

yodha12 commented 8 years ago

I am running the pyspark tests on a cluster with 12 nodes, 20 cores per node, and 60 GB of memory per node. I get output for the first few tests (sort, agg, count, etc.), but when it reaches the broadcast tests the job terminates. Judging from the .err file in the results folder, I assume it is due to a lack of memory: ensureFreeSpace(4194304) called with curMem=610484012, maxMem=611642769. How can I increase the maxMem value? This is the content of my config/config.py file.
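As a quick sanity check (my own arithmetic on the numbers from that log line, not part of the spark-perf output), the storage region is essentially full when the failure happens:

# Rough arithmetic on the values in the .err log line (sketch, not spark-perf code).
request = 4194304        # ensureFreeSpace(4194304): block being stored, 4 MB
cur_mem = 610484012      # curMem: bytes already used by cached blocks, ~582 MiB
max_mem = 611642769      # maxMem: storage memory cap, ~583 MiB

print(max_mem - cur_mem)              # 1158757 bytes (~1.1 MiB) left
print((max_mem - cur_mem) < request)  # True: less free space than the 4 MB requested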

COMMON_JAVA_OPTS = [
    # Fraction of JVM memory used for caching RDDs.
    JavaOptionSet("spark.storage.memoryFraction", [0.66]),
    JavaOptionSet("spark.serializer", ["org.apache.spark.serializer.JavaSerializer"]),
    JavaOptionSet("spark.executor.memory", ["9g"]),

and

# Set driver memory here
SPARK_DRIVER_MEMORY = "20g"
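For context, my understanding (an assumption based on the SPARK_SUBMIT_OPTS line in the log below, not something I verified in the spark-perf source) is that each JavaOptionSet entry simply gets flattened into a -Dkey=value system property, roughly like this:

# Hypothetical sketch of how the config.py entries seem to end up in
# SPARK_SUBMIT_OPTS (assumption; not spark-perf's actual code).
common_java_opts = [
    ("spark.storage.memoryFraction", 0.66),
    ("spark.serializer", "org.apache.spark.serializer.JavaSerializer"),
    ("spark.executor.memory", "9g"),
]
spark_submit_opts = " ".join("-D%s=%s" % (key, value) for key, value in common_java_opts)
print(spark_submit_opts)
# -Dspark.storage.memoryFraction=0.66 -Dspark.serializer=org.apache.spark.serializer.JavaSerializer -Dspark.executor.memory=9g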

It shows the running command as follows.

Setting env var SPARK_SUBMIT_OPTS: -Dspark.storage.memoryFraction=0.66 -Dspark.serializer=org.apache.spark.serializer.JavaSerializer -Dspark.executor.memory=9g -Dspark.locality.wait=60000000 -Dsparkperf.commitSHA=unknown

Running command: /nfs/15/soottikkal/local/spark-1.5.2-bin-hadoop2.6//bin/spark-submit --master spark://r0111.ten.osc.edu:7077 pyspark-tests/core_tests.py BroadcastWithBytes --num-trials=10 --inter-trial-wait=3 --num-partitions=400 --reduce-tasks=400 --random-seed=5 --persistent-type=memory --num-records=200000000 --unique-keys=20000 --key-length=10 --unique-values=1000000 --value-length=10 --broadcast-size=209715200 1>> results/python_perf_output2016-01-28_23-35-54_logs/python-broadcast-w-bytes.out 2>> results/python_perf_output2016-01-28_23-35-54_logs/python-broadcast-w-bytes.err

Is the spark-submit command picking up the memory settings from config.py here? maxMem is only about 611 MB, which looks like 0.66 × Spark's default 1 GB memory setting. Changing spark.executor.memory or SPARK_DRIVER_MEMORY in config/config.py has no effect on maxMem, but changing spark.storage.memoryFraction from 0.66 to 0.88 does increase maxMem. How can I control maxMem so the tests can use the large amount of memory already available on the cluster?
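For what it's worth, my back-of-the-envelope check (my own assumption about how Spark 1.5's static memory manager computes the storage cap, maxMem ≈ JVM heap × spark.storage.memoryFraction × spark.storage.safetyFraction, the latter defaulting to 0.9) suggests the JVM behind that maxMem value is still running with roughly the default ~1 GB heap rather than 9g:

# Working backwards from the observed maxMem (sketch, under the assumption above).
max_mem = 611642769
memory_fraction = 0.66   # value set in config/config.py
safety_fraction = 0.9    # assumed Spark 1.5 default for spark.storage.safetyFraction

implied_heap = max_mem / (memory_fraction * safety_fraction)
print(implied_heap / (1024 ** 3))   # ~0.96, i.e. roughly a 1 GB heap, not 9g

If that is right, it would be consistent with the -Dspark.executor.memory=9g setting never reaching the JVM that prints this log line.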