databricks / spark-sql-perf


executor_per_core is fixed to 1 vCores in spark-sql-perf on EMR #205

Open · Rastogii opened this issue 3 years ago

Rastogii commented 3 years ago

I have installed spark-sql-perf using:

  1. sudo yum install -y gcc make flex bison byacc git
  2. cd /tmp/
  3. git clone https://github.com/databricks/tpcds-kit.git
  4. cd tpcds-kit/tools
  5. make OS=LINUX
  6. curl https://bintray.com/sbt/rpm/rpm | sudo tee /etc/yum.repos.d/bintray-sbt-rpm.repo
  7. sudo yum install sbt
  8. cd /home/hadoop/
  9. git clone https://github.com/databricks/spark-sql-perf
  10. mkdir -p /home/hadoop/.sbt/preloaded/org/spark-packages/sbt-spark-package_2.10_0.13/0.1.1/
  11. cd /home/hadoop/.sbt/preloaded/org/spark-packages/sbt-spark-package_2.10_0.13/0.1.1/
  12. wget https://repos.spark-packages.org/org/spark-packages/sbt-spark-package/0.1.1/sbt-spark-package-0.1.1.pom
  13. wget https://repos.spark-packages.org/org/spark-packages/sbt-spark-package/0.1.1/sbt-spark-package-0.1.1.jar
  14. cd /home/hadoop/spark-sql-perf
  15. sbt +package
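The jar from step 15 ends up under sbt's `target` directory; a minimal sketch of picking it up for a Spark session (the Scala-version directory and jar file name below are illustrative only and will vary with the build):

```
# See what `sbt +package` produced (directory and jar names vary by Scala version)
ls /home/hadoop/spark-sql-perf/target/scala-*/

# Illustrative path only: start spark-shell with the packaged benchmark jar on the classpath
spark-shell --jars /home/hadoop/spark-sql-perf/target/scala-2.11/spark-sql-perf_2.11-0.5.1-SNAPSHOT.jar
```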

The Spark configuration is set in /usr/lib/spark/conf/spark-defaults.conf with spark.executor.memory=19650M, spark.executor.cores=5, and spark.executor.memoryOverhead=2184.
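For reference, these are the relevant lines as they would appear in /usr/lib/spark/conf/spark-defaults.conf (values copied from above):

```
spark.executor.memory            19650M
spark.executor.cores             5
spark.executor.memoryOverhead    2184
```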

In another case, I tried to set executor cores at run time by passing --executor-cores to spark-submit...
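Roughly, the submit command looked like the sketch below; the application class and jar names are placeholders, and only the resource flags are the relevant part:

```
# Placeholder class/jar names; --executor-cores is the flag that does not seem to take effect
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --executor-cores 5 \
  --executor-memory 19650M \
  --conf spark.executor.memoryOverhead=2184 \
  --class com.example.Benchmark \
  /home/hadoop/benchmark.jar
```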

Yet, in the YARN UI, I see this:

    Container State: COMPLETE   Mon Jun 21 06:12:54 +0000 2021
    Elapsed Time: 7mins, 16sec
    Resource: 21856 Memory, 1 VCores

And there are 5 executors on each node, even though each node has 32 vCores.
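Could this just be YARN's memory-only DefaultResourceCalculator reporting 1 VCores per container in the UI, rather than Spark actually limiting executors to one core? For reference, this is the capacity-scheduler.xml switch that makes YARN account for (and display) requested vCores; I have not confirmed this cluster's scheduler settings:

```
<!-- capacity-scheduler.xml (assumption: the cluster runs the CapacityScheduler) -->
<property>
  <name>yarn.scheduler.capacity.resource-calculator</name>
  <value>org.apache.hadoop.yarn.util.resource.DominantResourceCalculator</value>
</property>
```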