broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

Container exited with a non-zero exit code 50 #3686

Closed Sun-shan closed 6 years ago

Sun-shan commented 7 years ago

Hi, when I tested the GATK4 command I ran into some issues. How can I fix them?

bash-4.2$ ./gatk-launch PrintReadsSpark -I /gatk4/output.bam -O /gatk4/output_2.bam -- --sparkRunner SPARK --sparkMaster yarn-client Using GATK jar /opt/Software/gatk/build/libs/gatk-package-4.beta.5-70-gdc3237e-SNAPSHOT-spark.jar Running: spark-submit --master yarn-client --conf spark.driver.userClassPathFirst=true --conf spark.io.compression.codec=lzf --conf spark.driver.maxResultSize=0 --conf spark.executor.extraJavaOptions=-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=false -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 --conf spark.driver.extraJavaOptions=-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=false -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 --conf spark.kryoserializer.buffer.max=512m --conf spark.yarn.executor.memoryOverhead=600 /opt/Software/gatk/build/libs/gatk-package-4.beta.5-70-gdc3237e-SNAPSHOT-spark.jar PrintReadsSpark -I /gatk4/output.bam -O /gatk4/output_2.bam --sparkMaster yarn-client 14:19:09.870 WARN SparkContextFactory - Environment variables HELLBENDER_TEST_PROJECT and HELLBENDER_JSON_SERVICE_ACCOUNT_KEY must be set or the GCS hadoop connector will not be configured properly 14:19:10.155 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/Software/gatk/build/libs/gatk-package-4.beta.5-70-gdc3237e-SNAPSHOT-spark.jar!/com/intel/gkl/native/libgkl_compression.so [October 11, 2017 2:19:10 PM CST] PrintReadsSpark --output /gatk4/output_2.bam --input /gatk4/output.bam --sparkMaster yarn-client --readValidationStringency SILENT --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --interval_merging_rule ALL --bamPartitionSize 0 --disableSequenceDictionaryValidation false --shardedOutput false --numReducers 0 --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --gcs_max_retries 20 --disableToolDefaultReadFilters false [October 11, 2017 2:19:10 PM CST] Executing as hdfs@mg on Linux 3.10.0-514.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_91-b14; Version: 4.beta.5-70-gdc3237e-SNAPSHOT 14:19:10.289 INFO PrintReadsSpark - HTSJDK Defaults.COMPRESSION_LEVEL : 1 14:19:10.290 INFO PrintReadsSpark - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false 14:19:10.290 INFO PrintReadsSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : false 14:19:10.290 INFO PrintReadsSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false 14:19:10.290 INFO PrintReadsSpark - Deflater: IntelDeflater 14:19:10.290 INFO PrintReadsSpark - Inflater: IntelInflater 14:19:10.290 INFO PrintReadsSpark - GCS max retries/reopens: 20 14:19:10.290 INFO PrintReadsSpark - Using google-cloud-java patch c035098b5e62cb4fe9155eff07ce88449a361f5d from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes 14:19:10.290 INFO PrintReadsSpark - Initializing engine 14:19:10.290 INFO PrintReadsSpark - Done initializing engine 17/10/11 14:19:10 INFO spark.SparkContext: Running Spark version 1.6.0 17/10/11 14:19:10 INFO spark.SecurityManager: Changing view acls to: hdfs 17/10/11 14:19:10 INFO spark.SecurityManager: Changing modify acls to: hdfs 17/10/11 14:19:10 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs) 17/10/11 14:19:10 INFO util.Utils: 
Successfully started service 'sparkDriver' on port 43567. 17/10/11 14:19:11 INFO slf4j.Slf4jLogger: Slf4jLogger started 17/10/11 14:19:11 INFO Remoting: Starting remoting 17/10/11 14:19:11 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriverActorSystem@10.131.101.159:45501] 17/10/11 14:19:11 INFO Remoting: Remoting now listens on addresses: [akka.tcp://sparkDriverActorSystem@10.131.101.159:45501] 17/10/11 14:19:11 INFO util.Utils: Successfully started service 'sparkDriverActorSystem' on port 45501. 17/10/11 14:19:11 INFO spark.SparkEnv: Registering MapOutputTracker 17/10/11 14:19:11 INFO spark.SparkEnv: Registering BlockManagerMaster 17/10/11 14:19:11 INFO storage.DiskBlockManager: Created local directory at /tmp/hdfs/blockmgr-3fe99005-cdde-437f-9ca5-cdc7b1b9c057 17/10/11 14:19:11 INFO storage.MemoryStore: MemoryStore started with capacity 530.0 MB 17/10/11 14:19:11 INFO spark.SparkEnv: Registering OutputCommitCoordinator 17/10/11 14:19:11 INFO util.Utils: Successfully started service 'SparkUI' on port 4040. 17/10/11 14:19:11 INFO ui.SparkUI: Started SparkUI at http://10.131.101.159:4040 17/10/11 14:19:11 INFO spark.SparkContext: Added JAR file:/opt/Software/gatk/build/libs/gatk-package-4.beta.5-70-gdc3237e-SNAPSHOT-spark.jar at spark://10.131.101.159:43567/jars/gatk-package-4.beta.5-70-gdc3237e-SNAPSHOT-spark.jar with timestamp 1507702751615 17/10/11 14:19:11 INFO client.RMProxy: Connecting to ResourceManager at mg/10.131.101.159:8032 17/10/11 14:19:11 INFO yarn.Client: Requesting a new application from cluster with 2 NodeManagers 17/10/11 14:19:12 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (164726 MB per container) 17/10/11 14:19:12 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead 17/10/11 14:19:12 INFO yarn.Client: Setting up container launch context for our AM 17/10/11 14:19:12 INFO yarn.Client: Setting up the launch environment for our AM container 17/10/11 14:19:12 INFO yarn.Client: Preparing resources for our AM container 17/10/11 14:19:12 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.6.1-hadoop2 17/10/11 14:19:12 INFO yarn.Client: Uploading resource file:/tmp/hdfs/spark-8c88439f-dcb0-48b2-86f3-fc82cef4c438/spark_conf8945422067005652415.zip -> hdfs://mg:8020/user/hdfs/.sparkStaging/application_1507683879816_0006/spark_conf8945422067005652415.zip 17/10/11 14:19:13 INFO spark.SecurityManager: Changing view acls to: hdfs 17/10/11 14:19:13 INFO spark.SecurityManager: Changing modify acls to: hdfs 17/10/11 14:19:13 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); users with modify permissions: Set(hdfs) 17/10/11 14:19:13 INFO yarn.Client: Submitting application 6 to ResourceManager 17/10/11 14:19:13 INFO impl.YarnClientImpl: Submitted application application_1507683879816_0006 17/10/11 14:19:14 INFO yarn.Client: Application report for application_1507683879816_0006 (state: ACCEPTED) 17/10/11 14:19:14 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: root.users.hdfs start time: 1507702753100 final status: UNDEFINED tracking URL: http://mg:8088/proxy/application_1507683879816_0006/ user: hdfs 17/10/11 14:19:15 INFO yarn.Client: Application report for application_1507683879816_0006 (state: ACCEPTED) 17/10/11 14:19:15 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster 
registered as NettyRpcEndpointRef(null) 17/10/11 14:19:15 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> mg, PROXY_URI_BASES -> http://mg:8088/proxy/application_1507683879816_0006), /proxy/application_1507683879816_0006 17/10/11 14:19:15 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 17/10/11 14:19:16 INFO yarn.Client: Application report for application_1507683879816_0006 (state: ACCEPTED) 17/10/11 14:19:17 INFO yarn.Client: Application report for application_1507683879816_0006 (state: RUNNING) 17/10/11 14:19:17 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: 10.131.101.159 ApplicationMaster RPC port: 0 queue: root.users.hdfs start time: 1507702753100 final status: UNDEFINED tracking URL: http://mg:8088/proxy/application_1507683879816_0006/ user: hdfs 17/10/11 14:19:17 INFO cluster.YarnClientSchedulerBackend: Application application_1507683879816_0006 has started running. 17/10/11 14:19:17 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 34044. 17/10/11 14:19:17 INFO netty.NettyBlockTransferService: Server created on 34044 17/10/11 14:19:17 INFO storage.BlockManager: external shuffle service port = 7337 17/10/11 14:19:17 INFO storage.BlockManagerMaster: Trying to register BlockManager 17/10/11 14:19:17 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.131.101.159:34044 with 530.0 MB RAM, BlockManagerId(driver, 10.131.101.159, 34044) 17/10/11 14:19:17 INFO storage.BlockManagerMaster: Registered BlockManager 17/10/11 14:19:17 INFO scheduler.EventLoggingListener: Logging events to hdfs://mg:8020/user/spark/applicationHistory/application_1507683879816_0006 17/10/11 14:19:17 INFO spark.SparkContext: Registered listener com.cloudera.spark.lineage.ClouderaNavigatorListener 17/10/11 14:19:17 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8 17/10/11 14:19:17 INFO storage.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 285.6 KB, free 529.7 MB) 17/10/11 14:19:18 INFO storage.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 26.1 KB, free 529.7 MB) 17/10/11 14:19:18 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.131.101.159:34044 (size: 26.1 KB, free: 530.0 MB) 17/10/11 14:19:18 INFO spark.SparkContext: Created broadcast 0 from newAPIHadoopFile at ReadsSparkSource.java:112 17/10/11 14:19:18 INFO storage.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 14.5 KB, free 529.7 MB) 17/10/11 14:19:18 INFO storage.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.1 KB, free 529.7 MB) 17/10/11 14:19:18 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.131.101.159:34044 (size: 2.1 KB, free: 530.0 MB) 17/10/11 14:19:18 INFO spark.SparkContext: Created broadcast 1 from broadcast at ReadsSparkSink.java:195 17/10/11 14:19:18 INFO Configuration.deprecation: mapred.output.dir is deprecated. 
Instead, use mapreduce.output.fileoutputformat.outputdir 17/10/11 14:19:18 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1 17/10/11 14:19:18 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false 17/10/11 14:19:18 INFO spark.SparkContext: Starting job: saveAsNewAPIHadoopFile at ReadsSparkSink.java:203 17/10/11 14:19:18 INFO input.FileInputFormat: Total input paths to process : 1 17/10/11 14:19:18 INFO scheduler.DAGScheduler: Registering RDD 5 (mapToPair at SparkUtils.java:157) 17/10/11 14:19:18 INFO scheduler.DAGScheduler: Got job 0 (saveAsNewAPIHadoopFile at ReadsSparkSink.java:203) with 1 output partitions 17/10/11 14:19:18 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (saveAsNewAPIHadoopFile at ReadsSparkSink.java:203) 17/10/11 14:19:18 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0) 17/10/11 14:19:18 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0) 17/10/11 14:19:18 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[5] at mapToPair at SparkUtils.java:157), which has no missing parents 17/10/11 14:19:18 INFO storage.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 15.2 KB, free 529.7 MB) 17/10/11 14:19:18 INFO storage.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 6.9 KB, free 529.7 MB) 17/10/11 14:19:18 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.131.101.159:34044 (size: 6.9 KB, free: 530.0 MB) 17/10/11 14:19:18 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1004 17/10/11 14:19:18 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[5] at mapToPair at SparkUtils.java:157) (first 15 tasks are for partitions Vector(0)) 17/10/11 14:19:18 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks 17/10/11 14:19:19 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1) 17/10/11 14:19:23 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (com2:35572) with ID 1 17/10/11 14:19:23 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 1) 17/10/11 14:19:23 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, com2, executor 1, partition 0, NODE_LOCAL, 2235 bytes) 17/10/11 14:19:23 INFO storage.BlockManagerMasterEndpoint: Registering block manager com2:38568 with 530.0 MB RAM, BlockManagerId(1, com2, 38568) 17/10/11 14:19:25 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on com2:38568 (size: 6.9 KB, free: 530.0 MB) 17/10/11 14:19:26 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on com2:38568 (size: 26.1 KB, free: 530.0 MB) 17/10/11 14:19:27 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 4180 ms on com2 (executor 1) (1/1) 17/10/11 14:19:27 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 17/10/11 14:19:27 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (mapToPair at SparkUtils.java:157) finished in 8.951 s 17/10/11 14:19:27 INFO scheduler.DAGScheduler: looking for newly runnable stages 17/10/11 14:19:27 INFO scheduler.DAGScheduler: running: Set() 17/10/11 14:19:27 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1) 17/10/11 14:19:27 INFO scheduler.DAGScheduler: failed: Set() 
17/10/11 14:19:27 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[9] at mapToPair at ReadsSparkSink.java:244), which has no missing parents 17/10/11 14:19:27 INFO storage.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 86.1 KB, free 529.6 MB) 17/10/11 14:19:27 INFO storage.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 32.3 KB, free 529.6 MB) 17/10/11 14:19:27 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.131.101.159:34044 (size: 32.3 KB, free: 529.9 MB) 17/10/11 14:19:27 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1004 17/10/11 14:19:27 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[9] at mapToPair at ReadsSparkSink.java:244) (first 15 tasks are for partitions Vector(0)) 17/10/11 14:19:27 INFO cluster.YarnScheduler: Adding task set 1.0 with 1 tasks 17/10/11 14:19:27 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, com2, executor 1, partition 0, NODE_LOCAL, 1990 bytes) 17/10/11 14:19:27 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on com2:38568 (size: 32.3 KB, free: 529.9 MB) 17/10/11 14:19:27 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to com2:35572 17/10/11 14:19:27 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 134 bytes 17/10/11 14:19:28 WARN scheduler.TaskSetManager: Lost task 0.0 in stage 1.0 (TID 1, com2, executor 1): java.lang.AbstractMethodError: org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink$$Lambda$26/353370312.call(Ljava/lang/Object;)Ljava/lang/Iterable; at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:159) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:159) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

17/10/11 14:19:28 INFO scheduler.TaskSetManager: Starting task 0.1 in stage 1.0 (TID 2, com2, executor 1, partition 0, NODE_LOCAL, 1990 bytes) 17/10/11 14:19:28 INFO cluster.YarnClientSchedulerBackend: Disabling executor 1. 17/10/11 14:19:28 INFO scheduler.DAGScheduler: Executor lost: 1 (epoch 1) 17/10/11 14:19:28 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster. 17/10/11 14:19:28 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(1, com2, 38568) 17/10/11 14:19:28 INFO storage.BlockManagerMaster: Removed 1 successfully in removeExecutor 17/10/11 14:19:28 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1507683879816_0006_01_000002 on host: com2. Exit status: 50. Diagnostics: Exception from container-launch. Container id: container_1507683879816_0006_01_000002 Exit code: 50 Stack trace: ExitCodeException exitCode=50: at org.apache.hadoop.util.Shell.runCommand(Shell.java:601) at org.apache.hadoop.util.Shell.run(Shell.java:504) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 50

17/10/11 14:19:28 ERROR cluster.YarnScheduler: Lost executor 1 on com2: Container marked as failed: container_1507683879816_0006_01_000002 on host: com2. Exit status: 50. Diagnostics: Exception from container-launch. Container id: container_1507683879816_0006_01_000002 Exit code: 50 Stack trace: ExitCodeException exitCode=50: at org.apache.hadoop.util.Shell.runCommand(Shell.java:601) at org.apache.hadoop.util.Shell.run(Shell.java:504) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 50

17/10/11 14:19:28 WARN scheduler.TaskSetManager: Lost task 0.1 in stage 1.0 (TID 2, com2, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Container marked as failed: container_1507683879816_0006_01_000002 on host: com2. Exit status: 50. Diagnostics: Exception from container-launch. Container id: container_1507683879816_0006_01_000002 Exit code: 50 Stack trace: ExitCodeException exitCode=50: at org.apache.hadoop.util.Shell.runCommand(Shell.java:601) at org.apache.hadoop.util.Shell.run(Shell.java:504) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 50

17/10/11 14:19:28 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 1 from BlockManagerMaster. 17/10/11 14:19:28 INFO storage.BlockManagerMaster: Removal of executor 1 requested 17/10/11 14:19:28 INFO cluster.YarnClientSchedulerBackend: Asked to remove non-existent executor 1 17/10/11 14:19:28 INFO spark.ExecutorAllocationManager: Existing executor 1 has been removed (new total is 0) 17/10/11 14:19:35 INFO cluster.YarnClientSchedulerBackend: Registered executor NettyRpcEndpointRef(null) (com2:35590) with ID 2 17/10/11 14:19:35 INFO scheduler.TaskSetManager: Starting task 0.2 in stage 1.0 (TID 3, com2, executor 2, partition 0, NODE_LOCAL, 1990 bytes) 17/10/11 14:19:35 INFO spark.ExecutorAllocationManager: New executor 2 has registered (new total is 1) 17/10/11 14:19:35 INFO storage.BlockManagerMasterEndpoint: Registering block manager com2:46254 with 530.0 MB RAM, BlockManagerId(2, com2, 46254) 17/10/11 14:19:36 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on com2:46254 (size: 32.3 KB, free: 530.0 MB) 17/10/11 14:19:37 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to com2:35590 17/10/11 14:19:37 WARN scheduler.TaskSetManager: Lost task 0.2 in stage 1.0 (TID 3, com2, executor 2): java.lang.AbstractMethodError: org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink$$Lambda$14/1380582544.call(Ljava/lang/Object;)Ljava/lang/Iterable; at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:159) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:159) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$20.apply(RDD.scala:710) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306) at org.apache.spark.rdd.RDD.iterator(RDD.scala:270) at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66) at org.apache.spark.scheduler.Task.run(Task.scala:89) at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:242) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

17/10/11 14:19:37 INFO scheduler.TaskSetManager: Starting task 0.3 in stage 1.0 (TID 4, com2, executor 2, partition 0, NODE_LOCAL, 1990 bytes) 17/10/11 14:19:38 INFO cluster.YarnClientSchedulerBackend: Disabling executor 2. 17/10/11 14:19:38 INFO scheduler.DAGScheduler: Executor lost: 2 (epoch 1) 17/10/11 14:19:38 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster. 17/10/11 14:19:38 INFO storage.BlockManagerMasterEndpoint: Removing block manager BlockManagerId(2, com2, 46254) 17/10/11 14:19:38 INFO storage.BlockManagerMaster: Removed 2 successfully in removeExecutor 17/10/11 14:19:38 ERROR cluster.YarnScheduler: Lost executor 2 on com2: Container marked as failed: container_1507683879816_0006_01_000003 on host: com2. Exit status: 50. Diagnostics: Exception from container-launch. Container id: container_1507683879816_0006_01_000003 Exit code: 50 Stack trace: ExitCodeException exitCode=50: at org.apache.hadoop.util.Shell.runCommand(Shell.java:601) at org.apache.hadoop.util.Shell.run(Shell.java:504) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 50

17/10/11 14:19:38 WARN cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: Container marked as failed: container_1507683879816_0006_01_000003 on host: com2. Exit status: 50. Diagnostics: Exception from container-launch. Container id: container_1507683879816_0006_01_000003 Exit code: 50 Stack trace: ExitCodeException exitCode=50: at org.apache.hadoop.util.Shell.runCommand(Shell.java:601) at org.apache.hadoop.util.Shell.run(Shell.java:504) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 50

17/10/11 14:19:38 WARN scheduler.TaskSetManager: Lost task 0.3 in stage 1.0 (TID 4, com2, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_1507683879816_0006_01_000003 on host: com2. Exit status: 50. Diagnostics: Exception from container-launch. Container id: container_1507683879816_0006_01_000003 Exit code: 50 Stack trace: ExitCodeException exitCode=50: at org.apache.hadoop.util.Shell.runCommand(Shell.java:601) at org.apache.hadoop.util.Shell.run(Shell.java:504) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 50

17/10/11 14:19:38 ERROR scheduler.TaskSetManager: Task 0 in stage 1.0 failed 4 times; aborting job 17/10/11 14:19:38 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 17/10/11 14:19:38 INFO storage.BlockManagerMasterEndpoint: Trying to remove executor 2 from BlockManagerMaster. 17/10/11 14:19:38 INFO storage.BlockManagerMaster: Removal of executor 2 requested 17/10/11 14:19:38 INFO cluster.YarnClientSchedulerBackend: Asked to remove non-existent executor 2 17/10/11 14:19:38 INFO cluster.YarnScheduler: Cancelling stage 1 17/10/11 14:19:38 INFO scheduler.DAGScheduler: ResultStage 1 (saveAsNewAPIHadoopFile at ReadsSparkSink.java:203) failed in 10.702 s due to Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, com2, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_1507683879816_0006_01_000003 on host: com2. Exit status: 50. Diagnostics: Exception from container-launch. Container id: container_1507683879816_0006_01_000003 Exit code: 50 Stack trace: ExitCodeException exitCode=50: at org.apache.hadoop.util.Shell.runCommand(Shell.java:601) at org.apache.hadoop.util.Shell.run(Shell.java:504) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 50

Driver stacktrace: 17/10/11 14:19:38 INFO spark.ExecutorAllocationManager: Existing executor 2 has been removed (new total is 0) 17/10/11 14:19:38 INFO scheduler.DAGScheduler: Job 0 failed: saveAsNewAPIHadoopFile at ReadsSparkSink.java:203, took 19.909238 s 17/10/11 14:19:38 INFO ui.SparkUI: Stopped Spark web UI at http://10.131.101.159:4040 17/10/11 14:19:38 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread 17/10/11 14:19:38 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors 17/10/11 14:19:38 INFO cluster.YarnClientSchedulerBackend: Asking each executor to shut down 17/10/11 14:19:38 INFO cluster.YarnClientSchedulerBackend: Stopped 17/10/11 14:19:38 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 17/10/11 14:19:38 INFO storage.MemoryStore: MemoryStore cleared 17/10/11 14:19:38 INFO storage.BlockManager: BlockManager stopped 17/10/11 14:19:38 INFO storage.BlockManagerMaster: BlockManagerMaster stopped 17/10/11 14:19:38 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 17/10/11 14:19:38 INFO spark.SparkContext: Successfully stopped SparkContext 14:19:38.600 INFO PrintReadsSpark - Shutting down engine [October 11, 2017 2:19:38 PM CST] org.broadinstitute.hellbender.tools.spark.pipelines.PrintReadsSpark done. Elapsed time: 0.48 minutes. Runtime.totalMemory()=986185728 org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 1.0 failed 4 times, most recent failure: Lost task 0.3 in stage 1.0 (TID 4, com2, executor 2): ExecutorLostFailure (executor 2 exited caused by one of the running tasks) Reason: Container marked as failed: container_1507683879816_0006_01_000003 on host: com2. Exit status: 50. Diagnostics: Exception from container-launch. Container id: container_1507683879816_0006_01_000003 Exit code: 50 Stack trace: ExitCodeException exitCode=50: at org.apache.hadoop.util.Shell.runCommand(Shell.java:601) at org.apache.hadoop.util.Shell.run(Shell.java:504) at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:786) at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:213) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302) at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748)

Container exited with a non-zero exit code 50

Driver stacktrace: at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1457) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1445) at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1444) at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1444) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:799) at scala.Option.foreach(Option.scala:236) at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:799) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1668) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1627) at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1616) at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48) at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:620) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1862) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1875) at org.apache.spark.SparkContext.runJob(SparkContext.scala:1952) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply$mcV$sp(PairRDDFunctions.scala:1144) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1074) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopDataset$1.apply(PairRDDFunctions.scala:1074) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopDataset(PairRDDFunctions.scala:1074) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply$mcV$sp(PairRDDFunctions.scala:994) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:985) at org.apache.spark.rdd.PairRDDFunctions$$anonfun$saveAsNewAPIHadoopFile$2.apply(PairRDDFunctions.scala:985) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:150) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:111) at org.apache.spark.rdd.RDD.withScope(RDD.scala:316) at org.apache.spark.rdd.PairRDDFunctions.saveAsNewAPIHadoopFile(PairRDDFunctions.scala:985) at org.apache.spark.api.java.JavaPairRDD.saveAsNewAPIHadoopFile(JavaPairRDD.scala:800) at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.saveAsShardedHadoopFiles(ReadsSparkSink.java:203) at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReadsSingle(ReadsSparkSink.java:230) at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReads(ReadsSparkSink.java:153) at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.writeReads(GATKSparkTool.java:259) at org.broadinstitute.hellbender.tools.spark.pipelines.PrintReadsSpark.runTool(PrintReadsSpark.java:39) at 
org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:362) at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:38) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:119) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:176) at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:195) at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:137) at org.broadinstitute.hellbender.Main.mainEntry(Main.java:158) at org.broadinstitute.hellbender.Main.main(Main.java:239) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:730) at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:181) at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:206) at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:121) at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) 17/10/11 14:19:38 INFO remote.RemoteActorRefProvider$RemotingTerminator: Shutting down remote daemon. 17/10/11 14:19:38 INFO util.ShutdownHookManager: Shutdown hook called 17/10/11 14:19:38 INFO remote.RemoteActorRefProvider$RemotingTerminator: Remote daemon shut down; proceeding with flushing remote transports. 17/10/11 14:19:38 INFO util.ShutdownHookManager: Deleting directory /tmp/hdfs/spark-8c88439f-dcb0-48b2-86f3-fc82cef4c438

Sun-shan commented 7 years ago

The input file:

bash-4.2$ hdfs dfs -ls /gatk4
Found 2 items
-rw-r--r--   3 hdfs supergroup      62934 2017-10-11 13:38 /gatk4/output.bam
drwxr-xr-x   - hdfs supergroup          0 2017-10-11 14:19 /gatk4/output_2.bam.parts

The spark-submit:

bash-4.2$ spark-submit
Usage: spark-submit [options] <app jar | python file> [app arguments]
Usage: spark-submit --kill [submission ID] --master [spark://...]
Usage: spark-submit --status [submission ID] --master [spark://...]

Options:
  --master MASTER_URL         spark://host:port, mesos://host:port, yarn, or local.
  --deploy-mode DEPLOY_MODE   Whether to launch the driver program locally ("client") or
                              on one of the worker machines inside the cluster ("cluster")
                              (Default: client).
  --class CLASS_NAME          Your application's main class (for Java / Scala apps).
  --name NAME                 A name of your application.
  --jars JARS                 Comma-separated list of local jars to include on the driver
                              and executor classpaths.
  .......

The spark-shell:

bash-4.2$ spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
Failed to created SparkJLineReader: java.io.IOException: Permission denied
Falling back to SimpleReader.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 1.6.0
      /_/

Using Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_91)
Type in expressions to have them evaluated.
Type :help for more information.
Spark context available as sc (master = yarn-client, app id = application_1507683879816_0007).
Wed Oct 11 14:25:24 CST 2017 Thread[main,5,main] java.io.FileNotFoundException: derby.log (Permission denied)

Wed Oct 11 14:25:24 CST 2017: Booting Derby version The Apache Software Foundation - Apache Derby - 10.11.1.1 - (1616546): instance a816c00e-015f-0a1b-f1bd-00002ce33928 on database directory /tmp/spark-98953d35-8594-4907-b4a5-0870f1d17b3e/metastore with class loader sun.misc.Launcher$AppClassLoader@5c647e05 Loaded from file:/opt/cloudera/parcels/CDH-5.12.1-1.cdh5.12.1.p0.3/jars/derby-10.11.1.1.jar java.vendor=Oracle Corporation java.runtime.version=1.8.0_91-b14 user.dir=/opt/Software/gatk os.name=Linux os.arch=amd64 os.version=3.10.0-514.el7.x86_64 derby.system.home=null Database Class Loader started - derby.database.classpath='' 17/10/11 14:25:33 WARN metastore.ObjectStore: Version information not found in metastore. hive.metastore.schema.verification is not enabled so recording the schema version 1.1.0-cdh5.12.1 17/10/11 14:25:33 WARN metastore.ObjectStore: Failed to get database default, returning NoSuchObjectException SQL context available as sqlContext.

When I executed the command ./gradlew bundle, an error appeared at the end of the output. Does this matter?

[root@com1 gatk]# ./gradlew bundle

....... [loading ZipFileIndexFileObject[/root/.gradle/caches/modules-2/files-2.1/com.fasterxml.jackson.core/jackson-databind/2.6.5/d50be1723a09be903887099ff2014ea9020333/jackson-databind-2.6.5.jar(com/fasterxml/jackson/databind/annotation/JsonSerialize$Inclusion.class)]] [loading ZipFileIndexFileObject[/root/.gradle/caches/modules-2/files-2.1/org.apache.logging.log4j/log4j-core/2.5/7ed845de1dfe070d43511fab1784e6c4118398/log4j-core-2.5.jar(org/apache/logging/log4j/core/config/plugins/PluginVisitorStrategy.class)]] [done in 5759 ms] 1 error :gatkTabComplete FAILED

FAILURE: Build failed with an exception.

BUILD FAILED

Total time: 7.431 secs

Sun-shan commented 7 years ago

@tomwhite Do you have any suggestions?

tomwhite commented 7 years ago

It looks like you are using Spark 1.6 - you should be using Spark 2.0 or 2.2.
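
As a quick sanity check (the launcher names below are the typical CDH ones and may differ on your cluster), you can confirm which Spark version each launcher actually resolves to before re-running:

# default launcher (Spark 1.6 on this cluster, per the log above)
which spark-submit
spark-submit --version

# Spark 2 launcher provided by the CDH Spark 2 parcel
which spark2-submit
spark2-submit --version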

Sun-shan commented 7 years ago

I am using Cloudera Hadoop with Spark 2.2.0_cloudera1. Does GATK4 support the Cloudera version of Spark? If so, how should I execute the command?

Sun-shan commented 7 years ago

On the GATK4 website, it says: "Running a Spark tool on a cluster requires Spark to have been installed from http://spark.apache.org/, since gatk-launch invokes the spark-submit tool behind-the-scenes."

When I use spark-submit, it invokes Spark 1.6.0, so I usually use spark2-submit to invoke Spark 2.2.0. With the GATK4 command, how can I invoke Spark 2.2.0?

@tomwhite

tomwhite commented 7 years ago

You need to include the argument --sparkSubmitCommand spark2-submit in your gatk-launch command. E.g.

./gatk-launch PrintReadsSpark -I /gatk4/output.bam -O /gatk4/output_2.bam \
  -- \
  --sparkRunner SPARK --sparkMaster yarn-client --sparkSubmitCommand spark2-submit \
  --driver-memory 4G \
  --num-executors 1 \
  --executor-cores 1 \
  --executor-memory 4G \
  --conf spark.dynamicAllocation.enabled=false
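
With --sparkSubmitCommand set, gatk-launch invokes spark2-submit rather than spark-submit behind the scenes; the "Running:" line at the start of the tool's output (as in the logs earlier and later in this thread) shows which launcher was actually used.
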
Sun-shan commented 7 years ago

Yes, when I add the argument --sparkSubmitCommand spark2-submit, the former errors disappear, but there seems to be a new error:

A USER ERROR has occurred: Couldn't write file /gatk4/output_3.bam because writing failed with exception /gatk4/output_3.bam.parts/_SUCCESS: Unable to find _SUCCESS file
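
A quick check (using the same HDFS paths as above; this is only a diagnostic sketch, not a confirmed fix) is to look at what the job actually wrote and to clear leftovers from earlier failed runs:

# did the run produce the sharded part files and the _SUCCESS marker?
hdfs dfs -ls /gatk4/output_3.bam.parts

# a .parts directory left behind by an earlier failed run (e.g. the
# /gatk4/output_2.bam.parts listed above) can be removed before retrying
hdfs dfs -rm -r /gatk4/output_2.bam.parts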

-------------------------error

bash-4.2$ ./gatk-launch PrintReadsSpark -I /gatk4/output.bam -O /gatk4/output_3.bam -- --sparkRunner SPARK --sparkMaster yarn-client --sparkSubmitCommand spark2-submit Using GATK jar /opt/Software/gatk/build/libs/gatk-package-4.beta.5-70-gdc3237e-SNAPSHOT-spark.jar Running: spark2-submit --master yarn-client --conf spark.driver.userClassPathFirst=true --conf spark.io.compression.codec=lzf --conf spark.driver.maxResultSize=0 --conf spark.executor.extraJavaOptions=-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=false -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 --conf spark.driver.extraJavaOptions=-DGATK_STACKTRACE_ON_USER_EXCEPTION=true -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=false -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=1 --conf spark.kryoserializer.buffer.max=512m --conf spark.yarn.executor.memoryOverhead=600 /opt/Software/gatk/build/libs/gatk-package-4.beta.5-70-gdc3237e-SNAPSHOT-spark.jar PrintReadsSpark -I /gatk4/output.bam -O /gatk4/output_3.bam --sparkMaster yarn-client Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead. 18:11:33.604 WARN SparkContextFactory - Environment variables HELLBENDER_TEST_PROJECT and HELLBENDER_JSON_SERVICE_ACCOUNT_KEY must be set or the GCS hadoop connector will not be configured properly 18:11:33.737 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/opt/Software/gatk/build/libs/gatk-package-4.beta.5-70-gdc3237e-SNAPSHOT-spark.jar!/com/intel/gkl/native/libgkl_compression.so [October 13, 2017 6:11:33 PM CST] PrintReadsSpark --output /gatk4/output_3.bam --input /gatk4/output.bam --sparkMaster yarn-client --readValidationStringency SILENT --interval_set_rule UNION --interval_padding 0 --interval_exclusion_padding 0 --interval_merging_rule ALL --bamPartitionSize 0 --disableSequenceDictionaryValidation false --shardedOutput false --numReducers 0 --help false --version false --showHidden false --verbosity INFO --QUIET false --use_jdk_deflater false --use_jdk_inflater false --gcs_max_retries 20 --disableToolDefaultReadFilters false [October 13, 2017 6:11:33 PM CST] Executing as hdfs@mg on Linux 3.10.0-514.el7.x86_64 amd64; Java HotSpot(TM) 64-Bit Server VM 1.8.0_91-b14; Version: 4.beta.5-70-gdc3237e-SNAPSHOT 18:11:33.870 INFO PrintReadsSpark - HTSJDK Defaults.COMPRESSION_LEVEL : 1 18:11:33.871 INFO PrintReadsSpark - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false 18:11:33.871 INFO PrintReadsSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : false 18:11:33.871 INFO PrintReadsSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false 18:11:33.871 INFO PrintReadsSpark - Deflater: IntelDeflater 18:11:33.871 INFO PrintReadsSpark - Inflater: IntelInflater 18:11:33.871 INFO PrintReadsSpark - GCS max retries/reopens: 20 18:11:33.871 INFO PrintReadsSpark - Using google-cloud-java patch c035098b5e62cb4fe9155eff07ce88449a361f5d from https://github.com/droazen/google-cloud-java/tree/dr_all_nio_fixes 18:11:33.871 INFO PrintReadsSpark - Initializing engine 18:11:33.871 INFO PrintReadsSpark - Done initializing engine 17/10/13 18:11:33 INFO spark.SparkContext: Running Spark version 2.2.0.cloudera1 17/10/13 18:11:34 WARN spark.SparkConf: spark.master yarn-client is deprecated in Spark 2.0+, please instead use "yarn" with specified deploy mode. 
17/10/13 18:11:34 INFO spark.SparkContext: Submitted application: PrintReadsSpark 17/10/13 18:11:34 INFO spark.SecurityManager: Changing view acls to: hdfs 17/10/13 18:11:34 INFO spark.SecurityManager: Changing modify acls to: hdfs 17/10/13 18:11:34 INFO spark.SecurityManager: Changing view acls groups to: 17/10/13 18:11:34 INFO spark.SecurityManager: Changing modify acls groups to: 17/10/13 18:11:34 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); groups with view permissions: Set(); users with modify permissions: Set(hdfs); groups with modify permissions: Set() 17/10/13 18:11:34 INFO util.Utils: Successfully started service 'sparkDriver' on port 45754. 17/10/13 18:11:34 INFO spark.SparkEnv: Registering MapOutputTracker 17/10/13 18:11:34 INFO spark.SparkEnv: Registering BlockManagerMaster 17/10/13 18:11:34 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 17/10/13 18:11:34 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 17/10/13 18:11:34 INFO storage.DiskBlockManager: Created local directory at /tmp/hdfs/blockmgr-ea0e0669-2981-4277-80a0-a67eddf1001d 17/10/13 18:11:34 INFO memory.MemoryStore: MemoryStore started with capacity 366.3 MB 17/10/13 18:11:34 INFO spark.SparkEnv: Registering OutputCommitCoordinator 17/10/13 18:11:34 INFO util.log: Logging initialized @3816ms 17/10/13 18:11:34 INFO server.Server: jetty-9.3.z-SNAPSHOT 17/10/13 18:11:34 INFO server.Server: Started @3902ms 17/10/13 18:11:34 INFO server.AbstractConnector: Started ServerConnector@131ba51c{HTTP/1.1,[http/1.1]}{0.0.0.0:4040} 17/10/13 18:11:34 INFO util.Utils: Successfully started service 'SparkUI' on port 4040. 
17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@710ae6a7{/jobs,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@7b211077{/jobs/json,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@62b0bf85{/jobs/job,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6f07d414{/jobs/job/json,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@40faff12{/stages,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@223967ea{/stages/json,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5d7f1e59{/stages/stage,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@68e47e7{/stages/stage/json,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@16ac4d3d{/stages/pool,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@719c1faf{/stages/pool/json,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@1f172892{/storage,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@45f9d394{/storage/json,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3a588b5f{/storage/rdd,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2bdb5e0f{/storage/rdd/json,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2262f0d8{/environment,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@59e082f8{/environment/json,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@44d43cc9{/executors,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@656ec00d{/executors/json,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5ed25612{/executors/threadDump,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@56e5c8fb{/executors/threadDump/json,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@9e33a6a{/static,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@b3fc6d8{/,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5ed31735{/api,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@351e89fc{/jobs/job/kill,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@15586843{/stages/stage/kill,null,AVAILABLE,@Spark} 17/10/13 18:11:34 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://10.131.101.159:4040 17/10/13 18:11:34 INFO spark.SparkContext: Added JAR file:/opt/Software/gatk/build/libs/gatk-package-4.beta.5-70-gdc3237e-SNAPSHOT-spark.jar at 
spark://10.131.101.159:45754/jars/gatk-package-4.beta.5-70-gdc3237e-SNAPSHOT-spark.jar with timestamp 1507889494965 17/10/13 18:11:35 INFO util.Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances 17/10/13 18:11:35 INFO gcs.GoogleHadoopFileSystemBase: GHFS version: 1.6.1-hadoop2 17/10/13 18:11:36 INFO client.RMProxy: Connecting to ResourceManager at mg/10.131.101.159:8032 17/10/13 18:11:36 INFO yarn.Client: Requesting a new application from cluster with 3 NodeManagers 17/10/13 18:11:36 INFO yarn.Client: Verifying our application has not requested more than the maximum memory capability of the cluster (164726 MB per container) 17/10/13 18:11:36 INFO yarn.Client: Will allocate AM container, with 896 MB memory including 384 MB overhead 17/10/13 18:11:36 INFO yarn.Client: Setting up container launch context for our AM 17/10/13 18:11:36 INFO yarn.Client: Setting up the launch environment for our AM container 17/10/13 18:11:36 INFO yarn.Client: Preparing resources for our AM container 17/10/13 18:11:37 INFO yarn.Client: Uploading resource file:/tmp/hdfs/spark-c7e5eece-205e-4bce-a69b-4168c9b79045/spark_conf2918234914787361986.zip -> hdfs://mg:8020/user/hdfs/.sparkStaging/application_1507856833944_0003/spark_conf.zip 17/10/13 18:11:37 INFO spark.SecurityManager: Changing view acls to: hdfs 17/10/13 18:11:37 INFO spark.SecurityManager: Changing modify acls to: hdfs 17/10/13 18:11:37 INFO spark.SecurityManager: Changing view acls groups to: 17/10/13 18:11:37 INFO spark.SecurityManager: Changing modify acls groups to: 17/10/13 18:11:37 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(hdfs); groups with view permissions: Set(); users with modify permissions: Set(hdfs); groups with modify permissions: Set() 17/10/13 18:11:37 INFO yarn.Client: Submitting application application_1507856833944_0003 to ResourceManager 17/10/13 18:11:37 INFO impl.YarnClientImpl: Submitted application application_1507856833944_0003 17/10/13 18:11:37 INFO cluster.SchedulerExtensionServices: Starting Yarn extension services with app application_1507856833944_0003 and attemptId None 17/10/13 18:11:38 INFO yarn.Client: Application report for application_1507856833944_0003 (state: ACCEPTED) 17/10/13 18:11:38 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: N/A ApplicationMaster RPC port: -1 queue: root.users.hdfs start time: 1507889497661 final status: UNDEFINED tracking URL: http://mg:8088/proxy/application_1507856833944_0003/ user: hdfs 17/10/13 18:11:39 INFO yarn.Client: Application report for application_1507856833944_0003 (state: ACCEPTED) 17/10/13 18:11:40 INFO yarn.Client: Application report for application_1507856833944_0003 (state: ACCEPTED) 17/10/13 18:11:41 INFO cluster.YarnSchedulerBackend$YarnSchedulerEndpoint: ApplicationMaster registered as NettyRpcEndpointRef(spark-client://YarnAM) 17/10/13 18:11:41 INFO cluster.YarnClientSchedulerBackend: Add WebUI Filter. 
org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter, Map(PROXY_HOSTS -> mg, PROXY_URI_BASES -> http://mg:8088/proxy/application_1507856833944_0003), /proxy/application_1507856833944_0003 17/10/13 18:11:41 INFO ui.JettyUtils: Adding filter: org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter 17/10/13 18:11:41 INFO yarn.Client: Application report for application_1507856833944_0003 (state: ACCEPTED) 17/10/13 18:11:42 INFO yarn.Client: Application report for application_1507856833944_0003 (state: RUNNING) 17/10/13 18:11:42 INFO yarn.Client: client token: N/A diagnostics: N/A ApplicationMaster host: 10.131.101.145 ApplicationMaster RPC port: 0 queue: root.users.hdfs start time: 1507889497661 final status: UNDEFINED tracking URL: http://mg:8088/proxy/application_1507856833944_0003/ user: hdfs 17/10/13 18:11:42 INFO cluster.YarnClientSchedulerBackend: Application application_1507856833944_0003 has started running. 17/10/13 18:11:42 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 44818. 17/10/13 18:11:42 INFO netty.NettyBlockTransferService: Server created on 10.131.101.159:44818 17/10/13 18:11:42 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 17/10/13 18:11:42 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, 10.131.101.159, 44818, None) 17/10/13 18:11:42 INFO storage.BlockManagerMasterEndpoint: Registering block manager 10.131.101.159:44818 with 366.3 MB RAM, BlockManagerId(driver, 10.131.101.159, 44818, None) 17/10/13 18:11:42 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, 10.131.101.159, 44818, None) 17/10/13 18:11:42 INFO storage.BlockManager: external shuffle service port = 7337 17/10/13 18:11:42 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, 10.131.101.159, 44818, None) 17/10/13 18:11:42 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@544300a6{/metrics/json,null,AVAILABLE,@Spark} 17/10/13 18:11:42 INFO scheduler.EventLoggingListener: Logging events to hdfs://mg:8020/user/spark/spark2ApplicationHistory/application_1507856833944_0003 17/10/13 18:11:42 INFO util.Utils: Using initial executors = 0, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances 17/10/13 18:11:43 INFO cluster.YarnClientSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8 17/10/13 18:11:43 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 286.0 KB, free 366.0 MB) 17/10/13 18:11:44 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 26.0 KB, free 366.0 MB) 17/10/13 18:11:44 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on 10.131.101.159:44818 (size: 26.0 KB, free: 366.3 MB) 17/10/13 18:11:44 INFO spark.SparkContext: Created broadcast 0 from newAPIHadoopFile at ReadsSparkSource.java:112 17/10/13 18:11:44 INFO memory.MemoryStore: Block broadcast_1 stored as values in memory (estimated size 14.5 KB, free 366.0 MB) 17/10/13 18:11:44 INFO memory.MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 2.1 KB, free 366.0 MB) 17/10/13 18:11:44 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on 10.131.101.159:44818 (size: 2.1 KB, free: 366.3 MB) 17/10/13 18:11:44 INFO spark.SparkContext: Created broadcast 1 from 
broadcast at ReadsSparkSink.java:195 17/10/13 18:11:44 INFO output.FileOutputCommitter: File Output Committer Algorithm version is 1 17/10/13 18:11:44 INFO output.FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false 17/10/13 18:11:44 INFO spark.SparkContext: Starting job: runJob at SparkHadoopMapReduceWriter.scala:88 17/10/13 18:11:44 INFO input.FileInputFormat: Total input paths to process : 1 17/10/13 18:11:44 INFO scheduler.DAGScheduler: Registering RDD 5 (mapToPair at SparkUtils.java:157) 17/10/13 18:11:44 INFO scheduler.DAGScheduler: Got job 0 (runJob at SparkHadoopMapReduceWriter.scala:88) with 1 output partitions 17/10/13 18:11:44 INFO scheduler.DAGScheduler: Final stage: ResultStage 1 (runJob at SparkHadoopMapReduceWriter.scala:88) 17/10/13 18:11:44 INFO scheduler.DAGScheduler: Parents of final stage: List(ShuffleMapStage 0) 17/10/13 18:11:44 INFO scheduler.DAGScheduler: Missing parents: List(ShuffleMapStage 0) 17/10/13 18:11:44 INFO scheduler.DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[5] at mapToPair at SparkUtils.java:157), which has no missing parents 17/10/13 18:11:44 INFO memory.MemoryStore: Block broadcast_2 stored as values in memory (estimated size 15.9 KB, free 366.0 MB) 17/10/13 18:11:44 INFO memory.MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 7.3 KB, free 366.0 MB) 17/10/13 18:11:44 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on 10.131.101.159:44818 (size: 7.3 KB, free: 366.3 MB) 17/10/13 18:11:44 INFO spark.SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1006 17/10/13 18:11:44 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[5] at mapToPair at SparkUtils.java:157) (first 15 tasks are for partitions Vector(0)) 17/10/13 18:11:44 INFO cluster.YarnScheduler: Adding task set 0.0 with 1 tasks 17/10/13 18:11:45 INFO spark.ExecutorAllocationManager: Requesting 1 new executor because tasks are backlogged (new desired total will be 1) 17/10/13 18:11:48 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.131.101.145:54024) with ID 1 17/10/13 18:11:48 INFO spark.ExecutorAllocationManager: New executor 1 has registered (new total is 1) 17/10/13 18:11:48 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, com2, executor 1, partition 0, NODE_LOCAL, 4877 bytes) 17/10/13 18:11:48 INFO storage.BlockManagerMasterEndpoint: Registering block manager com2:45501 with 366.3 MB RAM, BlockManagerId(1, com2, 45501, None) 17/10/13 18:11:50 INFO storage.BlockManagerInfo: Added broadcast_2_piece0 in memory on com2:45501 (size: 7.3 KB, free: 366.3 MB) 17/10/13 18:11:51 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on com2:45501 (size: 26.0 KB, free: 366.3 MB) 17/10/13 18:11:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 4638 ms on com2 (executor 1) (1/1) 17/10/13 18:11:53 INFO cluster.YarnScheduler: Removed TaskSet 0.0, whose tasks have all completed, from pool 17/10/13 18:11:53 INFO scheduler.DAGScheduler: ShuffleMapStage 0 (mapToPair at SparkUtils.java:157) finished in 8.668 s 17/10/13 18:11:53 INFO scheduler.DAGScheduler: looking for newly runnable stages 17/10/13 18:11:53 INFO scheduler.DAGScheduler: running: Set() 17/10/13 18:11:53 INFO scheduler.DAGScheduler: waiting: Set(ResultStage 1) 17/10/13 18:11:53 INFO scheduler.DAGScheduler: 
failed: Set() 17/10/13 18:11:53 INFO scheduler.DAGScheduler: Submitting ResultStage 1 (MapPartitionsRDD[9] at mapToPair at ReadsSparkSink.java:244), which has no missing parents 17/10/13 18:11:53 INFO memory.MemoryStore: Block broadcast_3 stored as values in memory (estimated size 86.8 KB, free 365.9 MB) 17/10/13 18:11:53 INFO memory.MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 32.6 KB, free 365.8 MB) 17/10/13 18:11:53 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on 10.131.101.159:44818 (size: 32.6 KB, free: 366.2 MB) 17/10/13 18:11:53 INFO spark.SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1006 17/10/13 18:11:53 INFO scheduler.DAGScheduler: Submitting 1 missing tasks from ResultStage 1 (MapPartitionsRDD[9] at mapToPair at ReadsSparkSink.java:244) (first 15 tasks are for partitions Vector(0)) 17/10/13 18:11:53 INFO cluster.YarnScheduler: Adding task set 1.0 with 1 tasks 17/10/13 18:11:53 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 1.0 (TID 1, com2, executor 1, partition 0, NODE_LOCAL, 4632 bytes) 17/10/13 18:11:53 INFO storage.BlockManagerInfo: Added broadcast_3_piece0 in memory on com2:45501 (size: 32.6 KB, free: 366.2 MB) 17/10/13 18:11:53 INFO spark.MapOutputTrackerMasterEndpoint: Asked to send map output locations for shuffle 0 to 10.131.101.145:54024 17/10/13 18:11:53 INFO spark.MapOutputTrackerMaster: Size of output statuses for shuffle 0 is 135 bytes 17/10/13 18:11:53 INFO storage.BlockManagerInfo: Added broadcast_1_piece0 in memory on com2:45501 (size: 2.1 KB, free: 366.2 MB) 17/10/13 18:11:53 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 1.0 (TID 1) in 565 ms on com2 (executor 1) (1/1) 17/10/13 18:11:53 INFO cluster.YarnScheduler: Removed TaskSet 1.0, whose tasks have all completed, from pool 17/10/13 18:11:53 INFO scheduler.DAGScheduler: ResultStage 1 (runJob at SparkHadoopMapReduceWriter.scala:88) finished in 0.566 s 17/10/13 18:11:53 INFO scheduler.DAGScheduler: Job 0 finished: runJob at SparkHadoopMapReduceWriter.scala:88, took 9.524571 s 17/10/13 18:11:53 INFO io.SparkHadoopMapReduceWriter: Job job_20171013181144_0009 committed. 17/10/13 18:11:53 INFO server.AbstractConnector: Stopped Spark@131ba51c{HTTP/1.1,[http/1.1]}{0.0.0.0:4040} 17/10/13 18:11:53 INFO ui.SparkUI: Stopped Spark web UI at http://10.131.101.159:4040 17/10/13 18:11:54 INFO cluster.YarnClientSchedulerBackend: Interrupting monitor thread 17/10/13 18:11:54 INFO cluster.YarnClientSchedulerBackend: Shutting down all executors 17/10/13 18:11:54 INFO cluster.YarnSchedulerBackend$YarnDriverEndpoint: Asking each executor to shut down 17/10/13 18:11:54 INFO cluster.SchedulerExtensionServices: Stopping SchedulerExtensionServices (serviceOption=None, services=List(), started=false) 17/10/13 18:11:54 INFO cluster.YarnClientSchedulerBackend: Stopped 17/10/13 18:11:54 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 17/10/13 18:11:54 INFO memory.MemoryStore: MemoryStore cleared 17/10/13 18:11:54 INFO storage.BlockManager: BlockManager stopped 17/10/13 18:11:54 INFO storage.BlockManagerMaster: BlockManagerMaster stopped 17/10/13 18:11:54 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 17/10/13 18:11:54 INFO spark.SparkContext: Successfully stopped SparkContext 18:11:54.552 INFO PrintReadsSpark - Shutting down engine [October 13, 2017 6:11:54 PM CST] org.broadinstitute.hellbender.tools.spark.pipelines.PrintReadsSpark done. 
Elapsed time: 0.35 minutes. Runtime.totalMemory()=806354944


A USER ERROR has occurred: Couldn't write file /gatk4/output_3.bam because writing failed with exception /gatk4/output_3.bam.parts/_SUCCESS: Unable to find _SUCCESS file


org.broadinstitute.hellbender.exceptions.UserException$CouldNotCreateOutputFile: Couldn't write file /gatk4/output_3.bam because writing failed with exception /gatk4/output_3.bam.parts/_SUCCESS: Unable to find _SUCCESS file
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.writeReads(GATKSparkTool.java:264)
    at org.broadinstitute.hellbender.tools.spark.pipelines.PrintReadsSpark.runTool(PrintReadsSpark.java:39)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:362)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:38)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:119)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:176)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:195)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:137)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:158)
    at org.broadinstitute.hellbender.Main.main(Main.java:239)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:755)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.nio.file.NoSuchFileException: /gatk4/output_3.bam.parts/_SUCCESS: Unable to find _SUCCESS file
    at org.seqdoop.hadoop_bam.util.SAMFileMerger.mergeParts(SAMFileMerger.java:53)
    at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReadsSingle(ReadsSparkSink.java:231)
    at org.broadinstitute.hellbender.engine.spark.datasources.ReadsSparkSink.writeReads(ReadsSparkSink.java:153)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.writeReads(GATKSparkTool.java:259)
    ... 18 more
17/10/13 18:11:54 INFO util.ShutdownHookManager: Shutdown hook called
17/10/13 18:11:54 INFO util.ShutdownHookManager: Deleting directory /tmp/hdfs/spark-c7e5eece-205e-4bce-a69b-4168c9b79045

tomwhite commented 7 years ago

Does the _SUCCESS file exist?

You may have to specify the full output path as hdfs:///gatk4/output_3.bam, or hdfs://<namenode>:8020/gatk4/output_3.bam.
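For example, a minimal sketch of re-running the original command with a fully qualified HDFS output path (assuming the NameNode is mg:8020, as it appears in the log above; adjust the host, port, and paths to your cluster):

```bash
# Hypothetical re-run using fully qualified hdfs:// URIs for input and output;
# the namenode host/port (mg:8020) is taken from the log above.
./gatk-launch PrintReadsSpark \
    -I hdfs://mg:8020/gatk4/output.bam \
    -O hdfs://mg:8020/gatk4/output_2.bam \
    -- --sparkRunner SPARK --sparkMaster yarn-client
```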

Sun-shan commented 7 years ago

Thank you! When I changed the file path, it seems to work.

Sun-shan commented 7 years ago

Hi @tomwhite, when I ran the following command, I encountered an error: /gatk-launch BQSRPipelineSpark --knownSites /data/NfsDir/PublicDir/1000g/1000G_phase1.indels.hg19.vcf --knownSites /data/NfsDir/PublicDir/1000g/Mills_and_1000G_gold_standard.indels.hg19.sites.vcf --knownSites /data/NfsDir/PublicDir/dbsnp/dbsnp_138.hg19.vcf -I 1983.align.reorder.sorted.makrdup.bam -O 1983.align.reorder.sorted.makrdup.bqsr.bam -R ~/Tools/hg19.2bit


17/10/18 17:35:58 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-0,5,main] java.lang.OutOfMemoryError: Java heap space at htsjdk.samtools.SAMUtils.compressedBasesToBytes(SAMUtils.java:146) at htsjdk.samtools.BAMRecord.decodeReadBases(BAMRecord.java:346) at htsjdk.samtools.BAMRecord.getReadBases(BAMRecord.java:275) at org.broadinstitute.hellbender.utils.read.SAMRecordToGATKReadAdapter.getLength(SAMRecordToGATKReadAdapter.java:222) at org.broadinstitute.hellbender.engine.filters.ReadFilterLibrary$MatchingBasesAndQualsReadFilter.test(ReadFilterLibrary.java:64) at org.broadinstitute.hellbender.engine.filters.ReadFilter$ReadFilterAnd.test(ReadFilter.java:70) at org.broadinstitute.hellbender.engine.filters.ReadFilter$ReadFilterAnd.test(ReadFilter.java:70) at org.broadinstitute.hellbender.engine.filters.ReadFilter$ReadFilterAnd.test(ReadFilter.java:70) at org.broadinstitute.hellbender.engine.filters.ReadFilter$ReadFilterAnd.test(ReadFilter.java:70) at org.broadinstitute.hellbender.engine.filters.WellformedReadFilter.test(WellformedReadFilter.java:77) at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.lambda$getReads$e4b35a40$1(GATKSparkTool.java:213) at org.broadinstitute.hellbender.engine.spark.GATKSparkTool$$Lambda$93/2063469002.call(Unknown Source) at org.apache.spark.api.java.JavaRDD$$anonfun$filter$1.apply(JavaRDD.scala:76) at org.apache.spark.api.java.JavaRDD$$anonfun$filter$1.apply(JavaRDD.scala:76) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:463) at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) at java.util.Iterator.forEachRemaining(Iterator.java:115) at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) at org.broadinstitute.hellbender.tools.spark.transforms.ApplyBQSRSparkFn.lambda$apply$5412c5cb$1(ApplyBQSRSparkFn.java:22) at org.broadinstitute.hellbender.tools.spark.transforms.ApplyBQSRSparkFn$$Lambda$214/1243271334.call(Unknown Source) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:152) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:152) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) 17/10/18 17:35:58 ERROR LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(driver, 10.131.101.159, 35676),broadcast_4_piece167,StorageLevel(1 replicas),0,0)) 17/10/18 17:35:58 ERROR LiveListenerBus: SparkListenerBus has already stopped! 
Dropping event SparkListenerBlockUpdated(BlockUpdatedInfo(BlockManagerId(driver, 10.131.101.159, 35676),broadcast_4_piece173,StorageLevel(1 replicas),0,0)) 17/10/18 17:35:58 WARN Executor: Issue communicating with driver in heartbeater java.lang.NullPointerException at org.apache.spark.storage.BlockManagerMaster.updateBlockInfo(BlockManagerMaster.scala:67) at org.apache.spark.storage.BlockManager.org$apache$spark$storage$BlockManager$$tryToReportBlockStatus(BlockManager.scala:363) at org.apache.spark.storage.BlockManager$$anonfun$reportAllBlocks$3.apply(BlockManager.scala:219) at org.apache.spark.storage.BlockManager$$anonfun$reportAllBlocks$3.apply(BlockManager.scala:217) at scala.collection.Iterator$class.foreach(Iterator.scala:893) at scala.collection.AbstractIterator.foreach(Iterator.scala:1336) at org.apache.spark.storage.BlockManager.reportAllBlocks(BlockManager.scala:217) at org.apache.spark.storage.BlockManager.reregister(BlockManager.scala:236) at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$reportHeartBeat(Executor.scala:522) at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply$mcV$sp(Executor.scala:547) at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) at org.apache.spark.executor.Executor$$anon$1$$anonfun$run$1.apply(Executor.scala:547) at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1953) at org.apache.spark.executor.Executor$$anon$1.run(Executor.scala:547) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) at java.lang.Thread.run(Thread.java:745) 17/10/18 17:35:58 INFO BlockManagerMaster: BlockManagerMaster stopped 17/10/18 17:35:58 ERROR SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[Executor task launch worker-1,5,main] java.lang.OutOfMemoryError: Java heap space at htsjdk.samtools.BAMRecordCodec.decode(BAMRecordCodec.java:208) at htsjdk.samtools.BAMFileReader$BAMFileIterator.getNextRecord(BAMFileReader.java:829) at htsjdk.samtools.BAMFileReader$BAMFileIndexIterator.getNextRecord(BAMFileReader.java:981) at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:803) at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:797) at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:765) at org.seqdoop.hadoop_bam.BAMRecordReader.nextKeyValue(BAMRecordReader.java:225) at org.apache.spark.rdd.NewHadoopRDD$$anon$1.hasNext(NewHadoopRDD.scala:182) at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39) at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461) at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:461) at scala.collection.convert.Wrappers$IteratorWrapper.hasNext(Wrappers.scala:30) at java.util.Iterator.forEachRemaining(Iterator.java:115) at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801) at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481) at 
java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471) at java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708) at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234) at java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499) at org.broadinstitute.hellbender.tools.spark.transforms.ApplyBQSRSparkFn.lambda$apply$5412c5cb$1(ApplyBQSRSparkFn.java:22) at org.broadinstitute.hellbender.tools.spark.transforms.ApplyBQSRSparkFn$$Lambda$214/1243271334.call(Unknown Source) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:152) at org.apache.spark.api.java.JavaRDDLike$$anonfun$fn$4$1.apply(JavaRDDLike.scala:152) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785) at org.apache.spark.rdd.RDD$$anonfun$mapPartitions$1$$anonfun$apply$23.apply(RDD.scala:785) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38) at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:319) at org.apache.spark.rdd.RDD.iterator(RDD.scala:283) 17/10/18 17:35:58 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 17/10/18 17:35:58 INFO ShutdownHookManager: Shutdown hook called 17/10/18 17:35:58 INFO SparkContext: Successfully stopped SparkContext 17/10/18 17:35:58 INFO ShutdownHookManager: Deleting directory /tmp/sunfl/spark-475438b6-0b2e-46eb-924b-8bd2e00614a1/userFiles-eb98192b-71ff-4f2d-aa54-be4d5a9e2e94 17/10/18 17:35:58 INFO ShutdownHookManager: Deleting directory /tmp/sunfl/spark-475438b6-0b2e-46eb-924b-8bd2e00614a1

The full error log is in the attached file 1983_info.txt.

How can I fix it?

Sun-shan commented 7 years ago

Is there anything I can do to find out what's wrong? @tomwhite

lbergelson commented 7 years ago

@Sun-shan It sounds like you're running out of heap space on the driver. What are you setting --driver-memory as?
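For illustration, a hedged sketch of how larger memory settings could be passed along with the Spark runner options after the `--` separator (the flag forwarding and the 8g/4g sizes are assumptions, not recommendations; see the troubleshooting page linked below for the suggested settings):

```bash
# Hypothetical example: forwarding spark-submit memory options after the "--"
# separator. The 8g driver / 4g executor sizes are illustrative only.
./gatk-launch BQSRPipelineSpark \
    --knownSites /data/NfsDir/PublicDir/dbsnp/dbsnp_138.hg19.vcf \
    -I 1983.align.reorder.sorted.makrdup.bam \
    -O 1983.align.reorder.sorted.makrdup.bqsr.bam \
    -R ~/Tools/hg19.2bit \
    -- --sparkRunner SPARK --sparkMaster yarn-client \
       --driver-memory 8g --executor-memory 4g
```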

tomwhite commented 7 years ago

I wrote up some suggestions for memory settings here: https://github.com/broadinstitute/gatk/wiki/Troubleshooting-Spark

Sun-shan commented 7 years ago

Thank you very much! @tomwhite @lbergelson

lbergelson commented 6 years ago

I'm closing this because I think it's resolved. @Sun-shan Feel free to reopen if it's not.