Maryom opened this issue 6 years ago
Hi,
Thanks for this repo.
I’m trying to run SparkBWA on an Amazon EMR YARN cluster, but I keep running into errors.
I used `--master yarn` instead of `--master yarn-cluster`, and I also passed `--deploy-mode cluster`.
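For readability, this is the exact `spark-submit` invocation from the log below, just broken onto multiple lines (the index path and FASTQ files are the ones on my cluster):

```
spark-submit \
  --class com.github.sparkbwa.SparkBWA \
  --master yarn \
  --deploy-mode cluster \
  --driver-memory 1500m \
  --executor-memory 10g \
  --executor-cores 1 \
  --verbose \
  --num-executors 16 \
  sparkbwa-1.0.jar \
  -m -r -p --index /Data/HumanBase/hg38 -n 16 \
  -w "-R @RG\tID:foo\tLB:bar\tPL:illumina\tPU:illumina\tSM:ERR000589" \
  ERR000589_1.filt.fastq ERR000589_2.filt.fastq Output_ERR000589
```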
Then, I got the following error:

```
[hadoop@ip-172-31-14-100 ~]$ spark-submit --class com.github.sparkbwa.SparkBWA --master yarn --deploy-mode cluster --driver-memory 1500m --executor-memory 10g --executor-cores 1 --verbose --num-executors 16 sparkbwa-1.0.jar -m -r -p --index /Data/HumanBase/hg38 -n 16 -w "-R @RG\tID:foo\tLB:bar\tPL:illumina\tPU:illumina\tSM:ERR000589" ERR000589_1.filt.fastq ERR000589_2.filt.fastq Output_ERR000589 Using properties file: /usr/lib/spark/conf/spark-defaults.conf Adding default property: spark.sql.warehouse.dir=*********(redacted) Adding default property: spark.executor.extraJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p' Adding default property: spark.history.fs.logDirectory=hdfs:///var/log/spark/apps Adding default property: spark.eventLog.enabled=true Adding default property: spark.shuffle.service.enabled=true Adding default property: spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native Adding default property: spark.yarn.historyServer.address=ip-172-31-14-100.eu-west-2.compute.internal:18080 Adding default property: spark.stage.attempt.ignoreOnDecommissionFetchFailure=true Adding default property: spark.driver.memory=11171M Adding default property: spark.executor.instances=16 Adding default property: spark.default.parallelism=256 Adding default property: spark.resourceManager.cleanupExpiredHost=true Adding default property: spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS=$(hostname -f) Adding default property: spark.driver.extraJavaOptions=-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p' Adding default property: spark.master=yarn Adding default property: spark.blacklist.decommissioning.timeout=1h Adding default property: spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native Adding default property: spark.sql.hive.metastore.sharedPrefixes=com.amazonaws.services.dynamodbv2 Adding default property: spark.executor.memory=10356M Adding default property: spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar Adding default property: spark.eventLog.dir=hdfs:///var/log/spark/apps Adding default property: spark.dynamicAllocation.enabled=true Adding default property: spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar Adding default property: spark.executor.cores=8 Adding default property: spark.history.ui.port=18080 Adding default property: spark.blacklist.decommissioning.enabled=true Adding default property: spark.decommissioning.timeout.threshold=20 Adding default 
property: spark.hadoop.yarn.timeline-service.enabled=false Parsed arguments: master yarn deployMode cluster executorMemory 10g executorCores 1 totalExecutorCores null propertiesFile /usr/lib/spark/conf/spark-defaults.conf driverMemory 1500m driverCores null driverExtraClassPath /usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar driverExtraLibraryPath /usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native driverExtraJavaOptions -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p' supervise false queue null numExecutors 16 files null pyFiles null archives null mainClass com.github.sparkbwa.SparkBWA primaryResource file:/home/hadoop/sparkbwa-1.0.jar name com.github.sparkbwa.SparkBWA childArgs [-m -r -p --index /Data/HumanBase/hg38 -n 16 -w -R @RG\tID:foo\tLB:bar\tPL:illumina\tPU:illumina\tSM:ERR000589 ERR000589_1.filt.fastq ERR000589_2.filt.fastq Output_ERR000589] jars null packages null packagesExclusions null repositories null verbose true Spark properties used, including those specified through --conf and those from the properties file /usr/lib/spark/conf/spark-defaults.conf: (spark.blacklist.decommissioning.timeout,1h) (spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native) (spark.default.parallelism,256) (spark.blacklist.decommissioning.enabled,true) (spark.hadoop.yarn.timeline-service.enabled,false) (spark.driver.memory,1500m) (spark.executor.memory,10356M) (spark.executor.instances,16) (spark.sql.warehouse.dir,*********(redacted)) (spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native) (spark.yarn.historyServer.address,ip-172-31-14-100.eu-west-2.compute.internal:18080) (spark.eventLog.enabled,true) (spark.stage.attempt.ignoreOnDecommissionFetchFailure,true) (spark.history.ui.port,18080) (spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f)) (spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p') (spark.resourceManager.cleanupExpiredHost,true) (spark.shuffle.service.enabled,true) (spark.history.fs.logDirectory,hdfs:///var/log/spark/apps) (spark.driver.extraJavaOptions,-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p') (spark.executor.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar) (spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2) (spark.eventLog.dir,hdfs:///var/log/spark/apps) (spark.master,yarn) (spark.dynamicAllocation.enabled,true) 
(spark.executor.cores,8) (spark.decommissioning.timeout.threshold,20) (spark.driver.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar) Main class: org.apache.spark.deploy.yarn.Client Arguments: --jar file:/home/hadoop/sparkbwa-1.0.jar --class com.github.sparkbwa.SparkBWA --arg -m --arg -r --arg -p --arg --index --arg /Data/HumanBase/hg38 --arg -n --arg 16 --arg -w --arg -R @RG\tID:foo\tLB:bar\tPL:illumina\tPU:illumina\tSM:ERR000589 --arg ERR000589_1.filt.fastq --arg ERR000589_2.filt.fastq --arg Output_ERR000589 System properties: (spark.blacklist.decommissioning.timeout,1h) (spark.executor.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native) (spark.default.parallelism,256) (spark.blacklist.decommissioning.enabled,true) (spark.hadoop.yarn.timeline-service.enabled,false) (spark.driver.memory,1500m) (spark.executor.memory,10g) (spark.executor.instances,16) (spark.driver.extraLibraryPath,/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native) (spark.sql.warehouse.dir,*********(redacted)) (spark.yarn.historyServer.address,ip-172-31-14-100.eu-west-2.compute.internal:18080) (spark.eventLog.enabled,true) (spark.stage.attempt.ignoreOnDecommissionFetchFailure,true) (spark.history.ui.port,18080) (spark.yarn.appMasterEnv.SPARK_PUBLIC_DNS,$(hostname -f)) (SPARK_SUBMIT,true) (spark.executor.extraJavaOptions,-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p') (spark.app.name,com.github.sparkbwa.SparkBWA) (spark.resourceManager.cleanupExpiredHost,true) (spark.history.fs.logDirectory,hdfs:///var/log/spark/apps) (spark.shuffle.service.enabled,true) (spark.driver.extraJavaOptions,-XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -XX:MaxHeapFreeRatio=70 -XX:+CMSClassUnloadingEnabled -XX:OnOutOfMemoryError='kill -9 %p') (spark.submit.deployMode,cluster) (spark.executor.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar) (spark.eventLog.dir,hdfs:///var/log/spark/apps) (spark.sql.hive.metastore.sharedPrefixes,com.amazonaws.services.dynamodbv2) (spark.master,yarn) (spark.dynamicAllocation.enabled,true) (spark.decommissioning.timeout.threshold,20) (spark.executor.cores,1) (spark.driver.extraClassPath,/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar) Classpath elements: 
file:/home/hadoop/sparkbwa-1.0.jar 18/01/20 15:53:12 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 18/01/20 15:53:13 INFO RMProxy: Connecting to ResourceManager at ip-172-31-14-100.eu-west-2.compute.internal/172.31.14.100:8032 18/01/20 15:53:13 INFO Client: Requesting a new application from cluster with 16 NodeManagers 18/01/20 15:53:13 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (12288 MB per container) 18/01/20 15:53:13 INFO Client: Will allocate AM container, with 1884 MB memory including 384 MB overhead 18/01/20 15:53:13 INFO Client: Setting up container launch context for our AM 18/01/20 15:53:13 INFO Client: Setting up the launch environment for our AM container 18/01/20 15:53:13 INFO Client: Preparing resources for our AM container 18/01/20 15:53:14 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME. 18/01/20 15:53:16 INFO Client: Uploading resource file:/mnt/tmp/spark-8adea679-22d7-4945-9708-d61ef96b2c2a/__spark_libs__3181673287761365885.zip -> hdfs://ip-172-31-14-100.eu-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1516463115359_0001/__spark_libs__3181673287761365885.zip 18/01/20 15:53:17 INFO Client: Uploading resource file:/home/hadoop/sparkbwa-1.0.jar -> hdfs://ip-172-31-14-100.eu-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1516463115359_0001/sparkbwa-1.0.jar 18/01/20 15:53:17 INFO Client: Uploading resource file:/mnt/tmp/spark-8adea679-22d7-4945-9708-d61ef96b2c2a/__spark_conf__4991143839440201874.zip -> hdfs://ip-172-31-14-100.eu-west-2.compute.internal:8020/user/hadoop/.sparkStaging/application_1516463115359_0001/__spark_conf__.zip 18/01/20 15:53:17 INFO SecurityManager: Changing view acls to: hadoop 18/01/20 15:53:17 INFO SecurityManager: Changing modify acls to: hadoop 18/01/20 15:53:17 INFO SecurityManager: Changing view acls groups to: 18/01/20 15:53:17 INFO SecurityManager: Changing modify acls groups to: 18/01/20 15:53:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hadoop); groups with view permissions: Set(); users with modify permissions: Set(hadoop); groups with modify permissions: Set() 18/01/20 15:53:17 INFO Client: Submitting application application_1516463115359_0001 to ResourceManager 18/01/20 15:53:18 INFO YarnClientImpl: Submitted application application_1516463115359_0001 18/01/20 15:53:19 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED) 18/01/20 15:53:19 INFO Client: client token: N/A diagnostics: N/A ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1516463597765 final status: UNDEFINED tracking URL: http://ip-172-31-14-100.eu-west-2.compute.internal:20888/proxy/application_1516463115359_0001/ user: hadoop 18/01/20 15:53:20 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED) 18/01/20 15:53:21 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED) 18/01/20 15:53:22 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED) 18/01/20 15:53:23 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED) 18/01/20 15:53:24 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED) 18/01/20 15:53:25 INFO Client: Application report 
for application_1516463115359_0001 (state: ACCEPTED) 18/01/20 15:53:26 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED) 18/01/20 15:53:27 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED) 18/01/20 15:53:28 INFO Client: Application report for application_1516463115359_0001 (state: ACCEPTED) 18/01/20 15:53:29 INFO Client: Application report for application_1516463115359_0001 (state: FAILED) 18/01/20 15:53:29 INFO Client: client token: N/A diagnostics: Application application_1516463115359_0001 failed 2 times due to AM Container for appattempt_1516463115359_0001_000002 exited with exitCode: 1 For more detailed output, check application tracking page:http://ip-172-31-14-100.eu-west-2.compute.internal:8088/cluster/app/application_1516463115359_0001Then, click on links to logs of each attempt. Diagnostics: Exception from container-launch. Container id: container_1516463115359_0001_02_000001 Exit code: 1 Stack trace: ExitCodeException exitCode=1: at org.apache.hadoop.util.Shell.runCommand(Shell.java:582) at org.apache.hadoop.util.Shell.run(Shell.java:479) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:773) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Container exited with a non-zero exit code 1 Failing this attempt. Failing the application. ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: default start time: 1516463597765 final status: FAILED tracking URL: http://ip-172-31-14-100.eu-west-2.compute.internal:8088/cluster/app/application_1516463115359_0001 user: hadoop Exception in thread "main" org.apache.spark.SparkException: Application application_1516463115359_0001 finished with failed status at org.apache.spark.deploy.yarn.Client.run(Client.scala:1122) at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1168) at org.apache.spark.deploy.yarn.Client.main(Client.scala) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:775) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 18/01/20 15:53:29 INFO ShutdownHookManager: Shutdown hook called 18/01/20 15:53:29 INFO ShutdownHookManager: Deleting directory /mnt/tmp/spark-8adea679-22d7-4945-9708-d61ef96b2c2a [hadoop@ip-172-31-14-100 ~]$ Broadcast message from root@ip-172-31-14-100 (unknown) at 15:54 ... The system is going down for power off NOW! 
Connection to ec2-35-177-163-135.eu-west-2.compute.amazonaws.com closed by remote host. Connection to ec2-35-177-163-135.eu-west-2.compute.amazonaws.com closed.
```
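The AM container just exits with code 1 and the cluster then powers off, so I haven’t been able to open the attempt logs through the tracking URL. If it helps, I can try to pull the aggregated container logs with the YARN CLI (assuming log aggregation is enabled on the EMR cluster), e.g.:

```
# Dump the aggregated logs for the failed application, including the AM container's
# stderr/stdout, so the actual exception behind "exitCode: 1" is visible.
yarn logs -applicationId application_1516463115359_0001 > application_1516463115359_0001.log
```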
Any help would be appreciated.
Thank you 🙏
Any word on this?