citiususc / SparkBWA

SparkBWA is a tool that exploits the capabilities of a Big Data technology, Apache Spark, to boost the performance of one of the most widely adopted sequence aligners, the Burrows-Wheeler Aligner (BWA).
GNU General Public License v3.0

Can't run SparkBWA on Amazon EMR Yarn cluster #55

Open Maryom opened 6 years ago

Maryom commented 6 years ago

Hi,

Thanks for this repo.

I'm trying to run SparkBWA on an Amazon EMR YARN cluster, but I'm getting many errors.

I used --master yarn instead of the old yarn-cluster master, and I also passed --deploy-mode cluster, as sketched below.
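For context, this is just the relevant part of the change; newer Spark versions (the 2.x releases shipped with EMR) replace the old yarn-cluster master with the yarn/cluster pair, and the rest of the options are unchanged from my full command further down:

# old style (Spark 1.x)
spark-submit --class com.github.sparkbwa.SparkBWA --master yarn-cluster ...

# what I am running now (Spark 2.x on EMR)
spark-submit --class com.github.sparkbwa.SparkBWA --master yarn --deploy-mode cluster ...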

Then, I got the following error:

[hadoop@ip-172-31-14-100 ~]$ spark-submit --class com.github.sparkbwa.SparkBWA --master yarn --deploy-mode cluster --driver-memory 1500m --executor-memory 10g --executor-cores 1 --verbose --num-executors 16 sparkbwa-1.0.jar -m -r -p --index /Data/HumanBase/hg38 -n 16 -w "-R @RG\tID:foo\tLB:bar\tPL:illumina\tPU:illumina\tSM:ERR000589" ERR000589_1.filt.fastq ERR000589_2.filt.fastq Output_ERR000589
Using properties file: /usr/lib/spark/conf/spark-defaults.conf
Adding default property: spark.sql.warehouse.dir=*********(redacted)
Adding default property: spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
Adding default property: spark.history.fs.logDirectory=hdfs:///var/log/spark/apps
Adding default property: spark.eventLog.enabled=true
Adding default property: spark.shuffle.service.enabled=true
Adding default property: spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
Adding default property: spark.yarn.historyServer.address=ip-172-31-14-100.eu-west-2.compute.internal:18080
Adding default property: spark.stage.attempt.ignoreOnDecommissionFetchFailure=true
Adding default property: spark.driver.memory=11171M
Adding default property: spark.executor.instances=16
Adding default property: spark.default.parallelism=256
Adding default property: spark.resourceManager.cleanupExpiredHost=true
Adding default property: spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS=$(hostname -f)
Adding default property: spark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
Adding default property: spark.master=yarn
Adding default property: spark.blacklist.decommissioning.timeout=1h
Adding default property: spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
Adding default property: spark.sql.hive.metastore.sharedPrefixes=com.amazonaws.services.dynamodbv2
Adding default property: spark.executor.memory=10356M
Adding default property: spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar
Adding default property: spark.eventLog.dir=hdfs:///var/log/spark/apps
Adding default property: spark.dynamicAllocation.enabled=true
Adding default property: spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar
Adding default property: spark.executor.cores=8
Adding default property: spark.history.ui.port=18080
Adding default property: spark.blacklist.decommissioning.enabled=true
Adding default property: spark.decommissioning.timeout.threshold=20
Adding default property: spark.hadoop.yarn.timeline-service.enabled=false
Parsed arguments:
  master                  yarn
  deployMode              cluster
  executorMemory          10g
  executorCores           1
  totalExecutorCores      null
  propertiesFile          /usr/lib/spark/conf/spark-defaults.conf
  driverMemory            1500m
  driverCores             null
  driverExtraClassPath    /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar
  driverExtraLibraryPath  /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native
  driverExtraJavaOptions  -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p'
  supervise               false
  queue                   null
  numExecutors            16
  files                   null
  pyFiles                 null
  archives                null
  mainClass               com.github.sparkbwa.SparkBWA
  primaryResource         file:/home/hadoop/sparkbwa-1.0.jar
  name                    com.github.sparkbwa.SparkBWA
  childArgs               [-m -r -p --index /Data/HumanBase/hg38 -n 16 -w -R @RG\tID:foo\tLB:bar\tPL:illumina\tPU:illumina\tSM:ERR000589 ERR000589_1.filt.fastq ERR000589_2.filt.fastq Output_ERR000589]
  jars                    null
  packages                null
  packagesExclusions      null
  repositories            null
  verbose                 true

Spark properties used, including those specified through
 --conf and those from the properties file /usr/lib/spark/conf/spark-defaults.conf:
  (spark.blacklist.decommissioning.timeout,1h)
  (spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
  (spark.default.parallelism,256)
  (spark.blacklist.decommissioning.enabled,true)
  (spark.hadoop.yarn.timeline-service.enabled,false)
  (spark.driver.memory,1500m)
  (spark.executor.memory,10356M)
  (spark.executor.instances,16)
  (spark.sql.warehouse.dir,*********(redacted))
  (spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
  (spark.yarn.historyServer.address,ip-172-31-14-100.eu-west-2.compute.internal:18080)
  (spark.eventLog.enabled,true)
  (spark.stage.attempt.ignoreOnDecommissionFetchFailure,true)
  (spark.history.ui.port,18080)
  (spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f))
  (spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p')
  (spark.resourceManager.cleanupExpiredHost,true)
  (spark.shuffle.service.enabled,true)
  (spark.history.fs.logDirectory,hdfs:///var/log/spark/apps)
  (spark.driver.extraJavaOptions,-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p')
  (spark.executor.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar)
  (spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2)
  (spark.eventLog.dir,hdfs:///var/log/spark/apps)
  (spark.master,yarn)
  (spark.dynamicAllocation.enabled,true)
  (spark.executor.cores,8)
  (spark.decommissioning.timeout.threshold,20)
  (spark.driver.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar)

Main class:
org.apache.spark.deploy.yarn.Client
Arguments:
--jar
file:/home/hadoop/sparkbwa-1.0.jar
--class
com.github.sparkbwa.SparkBWA
--arg
-m
--arg
-r
--arg
-p
--arg
--index
--arg
/Data/HumanBase/hg38
--arg
-n
--arg
16
--arg
-w
--arg
-R @RG\tID:foo\tLB:bar\tPL:illumina\tPU:illumina\tSM:ERR000589
--arg
ERR000589_1.filt.fastq
--arg
ERR000589_2.filt.fastq
--arg
Output_ERR000589
System properties:
(spark.blacklist.decommissioning.timeout,1h)
(spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
(spark.default.parallelism,256)
(spark.blacklist.decommissioning.enabled,true)
(spark.hadoop.yarn.timeline-service.enabled,false)
(spark.driver.memory,1500m)
(spark.executor.memory,10g)
(spark.executor.instances,16)
(spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native)
(spark.sql.warehouse.dir,*********(redacted))
(spark.yarn.historyServer.address,ip-172-31-14-100.eu-west-2.compute.internal:18080)
(spark.eventLog.enabled,true)
(spark.stage.attempt.ignoreOnDecommissionFetchFailure,true)
(spark.history.ui.port,18080)
(spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f))
(SPARK_SUBMIT,true)
(spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p')
(spark.app.name,com.github.sparkbwa.SparkBWA)
(spark.resourceManager.cleanupExpiredHost,true)
(spark.history.fs.logDirectory,hdfs:///var/log/spark/apps)
(spark.shuffle.service.enabled,true)
(spark.driver.extraJavaOptions,-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p')
(spark.submit.deployMode,cluster)
(spark.executor.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar)
(spark.eventLog.dir,hdfs:///var/log/spark/apps)
(spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2)
(spark.master,yarn)
(spark.dynamicAllocation.enabled,true)
(spark.decommissioning.timeout.threshold,20)
(spark.executor.cores,1)
(spark.driver.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar)
Classpath elements:
file:/home/hadoop/sparkbwa-1.0.jar

18/01/20 15:53:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
18/01/20 15:53:13 INFO RMProxy: Connecting to ResourceManager at ip-172-31-14-100.eu-west-2.compute.internal/172.31.14.100:8032
18/01/20 15:53:13 INFO Client: Requesting a new application from cluster with 16 NodeManagers
18/01/20 15:53:13 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container)
18/01/20 15:53:13 INFO Client: Will allocate AM container, with 1884 MB memory including 384 MB overhead
18/01/20 15:53:13 INFO Client: Setting up container launch context for our AM
18/01/20 15:53:13 INFO Client: Setting up the launch environment for our AM container
18/01/20 15:53:13 INFO Client: Preparing resources for our AM container
18/01/20 15:53:14 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
18/01/20 15:53:16 INFO Client: Uploading resource file:/mnt/tmp/spark-8adea679-22d7-4945-9708-d61ef96b2c2a/__spark_libs__3181673287761365885.zip -> hdfs://ip-172-31-14-100.eu-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1516463115359_0001/__spark_libs__3181673287761365885.zip
18/01/20 15:53:17 INFO Client: Uploading resource file:/home/hadoop/sparkbwa-1.0.jar -> hdfs://ip-172-31-14-100.eu-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1516463115359_0001/sparkbwa-1.0.jar
18/01/20 15:53:17 INFO Client: Uploading resource file:/mnt/tmp/spark-8adea679-22d7-4945-9708-d61ef96b2c2a/__spark_conf__4991143839440201874.zip -> hdfs://ip-172-31-14-100.eu-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1516463115359_0001/__spark_conf__.zip
18/01/20 15:53:17 INFO SecurityManager: Changing view acls to: hadoop
18/01/20 15:53:17 INFO SecurityManager: Changing modify acls to: hadoop
18/01/20 15:53:17 INFO SecurityManager: Changing view acls groups to: 
18/01/20 15:53:17 INFO SecurityManager: Changing modify acls groups to: 
18/01/20 15:53:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(hadoop); groups with view permissions: Set(); users  with modify permissions: Set(hadoop); groups with modify permissions: Set()
18/01/20 15:53:17 INFO Client: Submitting application application_1516463115359_0001 to ResourceManager
18/01/20 15:53:18 INFO YarnClientImpl: Submitted application application_1516463115359_0001
18/01/20 15:53:19 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:19 INFO Client: 
     client token: N/A
     diagnostics: N/A
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1516463597765
     final status: UNDEFINED
     tracking URL: http://ip-172-31-14-100.eu-west-2.compute.internal:20888/proxy/application_1516463115359_0001/
     user: hadoop
18/01/20 15:53:20 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:21 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:22 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:23 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:24 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:25 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:26 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:27 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:28 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED)
18/01/20 15:53:29 INFO Client: Application report for application_1516463115359_0001 (state: FAILED)
18/01/20 15:53:29 INFO Client: 
     client token: N/A
     diagnostics: Application application_1516463115359_0001 failed 2 times due to AM Container for appattempt_1516463115359_0001_000002 exited with  exitCode: 1
For more detailed output, check application tracking page:http://ip-172-31-14-100.eu-west-2.compute.internal:8088/cluster/app/application_1516463115359_0001Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1516463115359_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1: 
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:582)
    at org.apache.hadoop.util.Shell.run(Shell.java:479)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
     ApplicationMaster host: N/A
     ApplicationMaster RPC port: -1
     queue: default
     start time: 1516463597765
     final status: FAILED
     tracking URL: http://ip-172-31-14-100.eu-west-2.compute.internal:8088/cluster/app/application_1516463115359_0001
     user: hadoop
Exception in thread "main" org.apache.spark.SparkException: Application application_1516463115359_0001 finished with failed status
    at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122)
    at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1168)
    at org.apache.spark.deploy.yarn.Client.main(Client.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
18/01/20 15:53:29 INFO ShutdownHookManager: Shutdown hook called
18/01/20 15:53:29 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-8adea679-22d7-4945-9708-d61ef96b2c2a
[hadoop@ip-172-31-14-100 ~]$ 
Broadcast message from root@ip-172-31-14-100
    (unknown) at 15:54 ...

The system is going down for power off NOW!
Connection to ec2-35-177-163-135.eu-west-2.compute.amazonaws.com closed by remote host.
Connection to ec2-35-177-163-135.eu-west-2.compute.amazonaws.com closed.
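The diagnostics above only report a generic exit code 1 for the AM container, not the underlying exception. For reference, the actual container logs, which should contain the real stack trace, can usually be retrieved with the YARN CLI once the application has failed (assuming log aggregation is enabled on the EMR cluster):

yarn logs -applicationId application_1516463115359_0001

As shown above, the instance was powered off right after the failure, so I lost the session before I could dig further.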

Any help would be appreciated.

Thank you 🙏

malimohub commented 3 years ago

Any word on this?