FredHutch / Galeano-Nino-Bullman-Intratumoral-Microbiota_2022

Analysis code used in Galeano Nino et al., Impact of Intratumoral Microbiota on Spatial and Cellular Heterogeneity in human cancer. 2022
MIT License

Running for Part1.10x Visium spatial transcriptomic data #13

Closed: februaryfang closed this issue 1 year ago

februaryfang commented 1 year ago

Hello, I followed the Visium_pipeline.sh script in Part 1 to analyze the CRC_16 10x Visium spatial transcriptomic data. I did not download the reference database from the official GATK website. Instead, I built the database myself following the tutorial [https://gatk.broadinstitute.org/hc/en-us/articles/360035889911--How-to-Run-the-Pathseq-pipeline]. The run finishes but produces no results, and I cannot tell why. The command and its log output are below.
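A pasted log of this size is hard to read by eye, so a small helper may make it easier to see which entries stand out. The following is only a sketch: the function names `split_entries` and `triage` are hypothetical, and the regular expression assumes the two timestamp styles visible in the log below (GATK's `13:19:32.136 WARN ...` and Spark's `23/05/23 13:19:32 INFO ...`). It lists WARN/ERROR entries and counts the `Getting 0 non-empty blocks` messages; whether either of those actually explains the empty output is not established here.

```python
import re

# Assumed entry prefix: an optional Spark-style date, a time with optional
# milliseconds, then a log level. This matches the two styles seen in the log.
ENTRY_START = re.compile(
    r"(?:\d{2}/\d{2}/\d{2} )?\d{2}:\d{2}:\d{2}(?:\.\d{3})? (?P<level>INFO|WARN|ERROR) "
)


def split_entries(text):
    """Split a run-together log into (level, entry_text) pairs."""
    matches = list(ENTRY_START.finditer(text))
    entries = []
    for i, match in enumerate(matches):
        # Each entry runs until the start of the next timestamped entry.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        entries.append((match.group("level"), text[match.start():end].strip()))
    return entries


def triage(text):
    """Collect WARN/ERROR entries and count empty shuffle fetches."""
    entries = split_entries(text)
    return {
        "warnings": [e for level, e in entries if level in ("WARN", "ERROR")],
        "empty_shuffle_fetches": sum(
            "Getting 0 non-empty blocks" in e for _, e in entries
        ),
        "total_entries": len(entries),
    }
```

To use it, read the saved console output into a string (the file name is up to you) and pass it to `triage`, then print the `warnings` list first, since those entries are the quickest to review.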

Using GATK jar /mnt/icfs/work/singlecelldevelopment/software/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar
Running:
java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx750g -jar /mnt/icfs/work/singlecelldevelopment/software/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar PathSeqPipelineSpark --input CRC_16/outs/possorted_genome_bam.bam --filter-bwa-image hsa_GRCh38/genome.fa.img --kmer-file hsa_GRCh38/genome.hss --min-clipped-read-length 60 --microbe-dict 16SrRNA/bacteria.16SrRNA.dict --microbe-bwa-image 16SrRNA/bacteria.16SrRNA.fa.img --taxonomy-file 16SrRNA/16SrRNA.db --output pathseq/CRC_16.pathseq.complete.bam --scores-output pathseq/CRC_16.pathseq.complete.csv --is-host-aligned false --filter-duplicates false --min-score-identity .7 --tmp-dir pathseq/tmp
13:19:23.776 INFO NativeLibraryLoader - Loading libgkl_compression.so from jar:file:/mnt/icfs/work/singlecelldevelopment/software/gatk-4.3.0.0/gatk-package-4.3.0.0-local.jar!/com/intel/gkl/native/libgkl_compression.so
13:19:28.982 INFO PathSeqPipelineSpark - ------------------------------------------------------------
13:19:28.982 INFO PathSeqPipelineSpark - The Genome Analysis Toolkit (GATK) v4.3.0.0
13:19:28.982 INFO PathSeqPipelineSpark - For support and documentation go to https://software.broadinstitute.org/gatk/
13:19:28.983 INFO PathSeqPipelineSpark - Executing as singlecellproject@d01.capitalbiotech.local on Linux v3.10.0-514.16.1.el7.x86_64 amd64
13:19:28.983 INFO PathSeqPipelineSpark - Java runtime: OpenJDK 64-Bit Server VM v1.8.0_151-b12
13:19:28.983 INFO PathSeqPipelineSpark - Start Date/Time: May 23, 2023 1:19:23 PM CST
13:19:28.983 INFO PathSeqPipelineSpark - ------------------------------------------------------------
13:19:28.983 INFO PathSeqPipelineSpark - ------------------------------------------------------------
13:19:28.984 INFO PathSeqPipelineSpark - HTSJDK Version: 3.0.1
13:19:28.984 INFO PathSeqPipelineSpark - Picard Version: 2.27.5
13:19:28.984 INFO PathSeqPipelineSpark - Built for Spark Version: 2.4.5
13:19:28.984 INFO PathSeqPipelineSpark - HTSJDK Defaults.COMPRESSION_LEVEL : 2
13:19:28.984 INFO PathSeqPipelineSpark - HTSJDK Defaults.USE_ASYNC_IO_READ_FOR_SAMTOOLS : false
13:19:28.984 INFO PathSeqPipelineSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_SAMTOOLS : true
13:19:28.984 INFO PathSeqPipelineSpark - HTSJDK Defaults.USE_ASYNC_IO_WRITE_FOR_TRIBBLE : false
13:19:28.985 INFO PathSeqPipelineSpark - Deflater: IntelDeflater
13:19:28.985 INFO PathSeqPipelineSpark - Inflater: IntelInflater
13:19:28.985 INFO PathSeqPipelineSpark - GCS max retries/reopens: 20
13:19:28.985 INFO PathSeqPipelineSpark - Requester pays: disabled
13:19:28.985 INFO PathSeqPipelineSpark - Initializing engine
13:19:28.985 INFO PathSeqPipelineSpark - Done initializing engine
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
23/05/23 13:19:29 INFO SparkContext: Running Spark version 2.4.5
23/05/23 13:19:29 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/05/23 13:19:29 INFO SparkContext: Submitted application: PathSeqPipelineSpark
23/05/23 13:19:29 INFO SecurityManager: Changing view acls to: singlecellproject
23/05/23 13:19:29 INFO SecurityManager: Changing modify acls to: singlecellproject
23/05/23 13:19:29 INFO SecurityManager: Changing view acls groups to:
23/05/23 13:19:29 INFO SecurityManager: Changing modify acls groups to:
23/05/23 13:19:29 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(singlecellproject); groups with view permissions: Set(); users with modify permissions: Set(singlecellproject); groups with modify permissions: Set()
23/05/23 13:19:29 INFO Utils: Successfully started service 'sparkDriver' on port 40471.
23/05/23 13:19:29 INFO SparkEnv: Registering MapOutputTracker
23/05/23 13:19:29 INFO SparkEnv: Registering BlockManagerMaster
23/05/23 13:19:29 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/05/23 13:19:29 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/05/23 13:19:29 INFO DiskBlockManager: Created local directory at pathseq/tmp/blockmgr-11fec4b1-0808-4f7e-9ab9-a87799853aee
23/05/23 13:19:29 INFO MemoryStore: MemoryStore started with capacity 399.8 GB
23/05/23 13:19:29 INFO SparkEnv: Registering OutputCommitCoordinator
23/05/23 13:19:30 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/05/23 13:19:30 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://d01.capitalbiotech.local:4040
23/05/23 13:19:30 INFO Executor: Starting executor ID driver on host localhost
23/05/23 13:19:30 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 41352.
23/05/23 13:19:30 INFO NettyBlockTransferService: Server created on d01.capitalbiotech.local:41352 23/05/23 13:19:30 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy 23/05/23 13:19:30 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, d01.capitalbiotech.local, 41352, None) 23/05/23 13:19:30 INFO BlockManagerMasterEndpoint: Registering block manager d01.capitalbiotech.local:41352 with 399.8 GB RAM, BlockManagerId(driver, d01.capitalbiotech.local, 41352, None) 23/05/23 13:19:30 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, d01.capitalbiotech.local, 41352, None) 23/05/23 13:19:30 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, d01.capitalbiotech.local, 41352, None) 13:19:30.590 INFO PathSeqPipelineSpark - Spark verbosity set to INFO (see --spark-verbosity argument) 23/05/23 13:19:30 INFO GoogleHadoopFileSystemBase: GHFS version: 1.9.4-hadoop3 23/05/23 13:19:31 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 392.4 KB, free 399.8 GB) 23/05/23 13:19:31 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 35.5 KB, free 399.8 GB) 23/05/23 13:19:31 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on d01.capitalbiotech.local:41352 (size: 35.5 KB, free: 399.8 GB) 23/05/23 13:19:31 INFO SparkContext: Created broadcast 0 from newAPIHadoopFile at PathSplitSource.java:96 13:19:32.136 WARN PathSeqPipelineSpark - --is-host-aligned is false but there are one or more sequences in the BAM header 23/05/23 13:19:32 INFO MemoryStore: Block broadcast_1 stored as values in memory (estimated size 392.4 KB, free 399.8 GB) 23/05/23 13:19:32 INFO MemoryStore: Block broadcast_1_piece0 stored as bytes in memory (estimated size 35.5 KB, free 399.8 GB) 23/05/23 13:19:32 INFO BlockManagerInfo: Added broadcast_1_piece0 in memory on d01.capitalbiotech.local:41352 (size: 35.5 KB, free: 399.8 GB) 23/05/23 
13:19:32 INFO SparkContext: Created broadcast 1 from newAPIHadoopFile at PathSplitSource.java:96 23/05/23 13:19:32 INFO FileInputFormat: Total input files to process : 1 23/05/23 13:19:32 INFO SparkContext: Starting job: count at PathSeqPipelineSpark.java:244 23/05/23 13:19:32 INFO DAGScheduler: Registering RDD 25 (mapToPair at PSFilter.java:128) as input to shuffle 2 23/05/23 13:19:32 INFO DAGScheduler: Registering RDD 29 (mapToPair at PSFilter.java:128) as input to shuffle 1 23/05/23 13:19:32 INFO DAGScheduler: Registering RDD 34 (mapToPair at PSFilter.java:128) as input to shuffle 0 23/05/23 13:19:32 INFO DAGScheduler: Got job 0 (count at PathSeqPipelineSpark.java:244) with 244 output partitions 23/05/23 13:19:32 INFO DAGScheduler: Final stage: ResultStage 3 (count at PathSeqPipelineSpark.java:244) 23/05/23 13:19:32 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 2) 23/05/23 13:19:32 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 2) 23/05/23 13:19:32 INFO DAGScheduler: Submitting ShuffleMapStage 0 (MapPartitionsRDD[25] at mapToPair at PSFilter.java:128), which has no missing parents 23/05/23 13:19:32 INFO MemoryStore: Block broadcast_2 stored as values in memory (estimated size 276.6 KB, free 399.8 GB) 23/05/23 13:19:32 INFO MemoryStore: Block broadcast_2_piece0 stored as bytes in memory (estimated size 120.5 KB, free 399.8 GB) 23/05/23 13:19:32 INFO BlockManagerInfo: Added broadcast_2_piece0 in memory on d01.capitalbiotech.local:41352 (size: 120.5 KB, free: 399.8 GB) 23/05/23 13:19:32 INFO SparkContext: Created broadcast 2 from broadcast at DAGScheduler.scala:1163 23/05/23 13:19:32 INFO DAGScheduler: Submitting 244 missing tasks from ShuffleMapStage 0 (MapPartitionsRDD[25] at mapToPair at PSFilter.java:128) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) 23/05/23 13:19:32 INFO TaskSchedulerImpl: Adding task set 0.0 with 244 tasks 23/05/23 13:19:32 INFO TaskSetManager: Starting task 0.0 in 
stage 0.0 (TID 0, localhost, executor driver, partition 0, PROCESS_LOCAL, 8030 bytes) 23/05/23 13:19:32 INFO TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, localhost, executor driver, partition 1, PROCESS_LOCAL, 8030 bytes) ... 23/05/23 13:19:32 INFO TaskSetManager: Starting task 126.0 in stage 0.0 (TID 126, localhost, executor driver, partition 126, PROCESS_LOCAL, 8030 bytes) 23/05/23 13:19:32 INFO TaskSetManager: Starting task 127.0 in stage 0.0 (TID 127, localhost, executor driver, partition 127, PROCESS_LOCAL, 8030 bytes) 23/05/23 13:19:32 INFO Executor: Running task 5.0 in stage 0.0 (TID 5) ... 23/05/23 13:19:33 INFO Executor: Running task 127.0 in stage 0.0 (TID 127) 23/05/23 13:19:33 INFO BlockManagerInfo: Removed broadcast_0_piece0 on d01.capitalbiotech.local:41352 in memory (size: 35.5 KB, free: 399.8 GB) 23/05/23 13:19:34 INFO NewHadoopRDD: Input split: file:spaceranger_count/CRC_16/outs/possorted_genome_bam.bam:2080374784+33554432 ... 23/05/23 13:19:51 INFO Executor: Finished task 112.0 in stage 0.0 (TID 112). 1128 bytes result sent to driver 23/05/23 13:19:51 INFO TaskSetManager: Starting task 132.0 in stage 0.0 (TID 132, localhost, executor driver, partition 132, PROCESS_LOCAL, 8030 bytes) 23/05/23 13:19:51 INFO Executor: Running task 132.0 in stage 0.0 (TID 132) 23/05/23 13:19:51 INFO TaskSetManager: Finished task 123.0 in stage 0.0 (TID 123) in 18852 ms on localhost (executor driver) (1/244) ... 23/05/23 13:20:06 INFO Executor: Finished task 239.0 in stage 0.0 (TID 239). 
1128 bytes result sent to driver 23/05/23 13:20:06 INFO TaskSetManager: Finished task 239.0 in stage 0.0 (TID 239) in 8347 ms on localhost (executor driver) (244/244) 23/05/23 13:20:06 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 23/05/23 13:20:06 INFO DAGScheduler: ShuffleMapStage 0 (mapToPair at PSFilter.java:128) finished in 33.663 s 23/05/23 13:20:06 INFO DAGScheduler: looking for newly runnable stages 23/05/23 13:20:06 INFO DAGScheduler: running: Set() 23/05/23 13:20:06 INFO DAGScheduler: waiting: Set(ShuffleMapStage 1, ShuffleMapStage 2, ResultStage 3) 23/05/23 13:20:06 INFO DAGScheduler: failed: Set() 23/05/23 13:20:06 INFO DAGScheduler: Submitting ShuffleMapStage 1 (MapPartitionsRDD[29] at mapToPair at PSFilter.java:128), which has no missing parents 23/05/23 13:20:06 INFO MemoryStore: Block broadcast_3 stored as values in memory (estimated size 5.6 KB, free 399.8 GB) 23/05/23 13:20:06 INFO MemoryStore: Block broadcast_3_piece0 stored as bytes in memory (estimated size 3.3 KB, free 399.8 GB) 23/05/23 13:20:06 INFO BlockManagerInfo: Added broadcast_3_piece0 in memory on d01.capitalbiotech.local:41352 (size: 3.3 KB, free: 399.8 GB) 23/05/23 13:20:06 INFO SparkContext: Created broadcast 3 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:06 INFO DAGScheduler: Submitting 244 missing tasks from ShuffleMapStage 1 (MapPartitionsRDD[29] at mapToPair at PSFilter.java:128) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) 23/05/23 13:20:06 INFO TaskSchedulerImpl: Adding task set 1.0 with 244 tasks 23/05/23 13:20:06 INFO TaskSetManager: Starting task 0.0 in stage 1.0 (TID 244, localhost, executor driver, partition 0, PROCESS_LOCAL, 7651 bytes) ... 23/05/23 13:20:06 INFO TaskSetManager: Starting task 127.0 in stage 1.0 (TID 371, localhost, executor driver, partition 127, PROCESS_LOCAL, 7651 bytes) 23/05/23 13:20:06 INFO Executor: Running task 0.0 in stage 1.0 (TID 244) ... 
23/05/23 13:20:06 INFO Executor: Running task 108.0 in stage 1.0 (TID 352) 23/05/23 13:20:06 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks ... 23/05/23 13:20:06 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks 23/05/23 13:20:06 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms 23/05/23 13:20:06 INFO Executor: Finished task 111.0 in stage 1.0 (TID 355). 1300 bytes result sent to driver 23/05/23 13:20:06 INFO TaskSetManager: Starting task 128.0 in stage 1.0 (TID 372, localhost, executor driver, partition 128, PROCESS_LOCAL, 7651 bytes) 23/05/23 13:20:06 INFO TaskSetManager: Finished task 111.0 in stage 1.0 (TID 355) in 315 ms on localhost (executor driver) (1/244) 23/05/23 13:20:06 INFO Executor: Running task 128.0 in stage 1.0 (TID 372) ... 23/05/23 13:20:09 INFO TaskSetManager: Finished task 128.0 in stage 1.0 (TID 372) in 2495 ms on localhost (executor driver) (244/244) 23/05/23 13:20:09 INFO TaskSchedulerImpl: Removed TaskSet 1.0, whose tasks have all completed, from pool 23/05/23 13:20:09 INFO DAGScheduler: ShuffleMapStage 1 (mapToPair at PSFilter.java:128) finished in 2.853 s 23/05/23 13:20:09 INFO DAGScheduler: looking for newly runnable stages 23/05/23 13:20:09 INFO DAGScheduler: running: Set() 23/05/23 13:20:09 INFO DAGScheduler: waiting: Set(ShuffleMapStage 2, ResultStage 3) 23/05/23 13:20:09 INFO DAGScheduler: failed: Set() 23/05/23 13:20:09 INFO DAGScheduler: Submitting ShuffleMapStage 2 (MapPartitionsRDD[34] at mapToPair at PSFilter.java:128), which has no missing parents 23/05/23 13:20:09 INFO MemoryStore: Block broadcast_4 stored as values in memory (estimated size 8.1 KB, free 399.8 GB) 23/05/23 13:20:09 INFO MemoryStore: Block broadcast_4_piece0 stored as bytes in memory (estimated size 4.5 KB, free 399.8 GB) 23/05/23 13:20:09 INFO BlockManagerInfo: Added broadcast_4_piece0 in memory on d01.capitalbiotech.local:41352 
(size: 4.5 KB, free: 399.8 GB) 23/05/23 13:20:09 INFO SparkContext: Created broadcast 4 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:09 INFO DAGScheduler: Submitting 244 missing tasks from ShuffleMapStage 2 (MapPartitionsRDD[34] at mapToPair at PSFilter.java:128) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) 23/05/23 13:20:09 INFO TaskSchedulerImpl: Adding task set 2.0 with 244 tasks 23/05/23 13:20:09 INFO TaskSetManager: Starting task 0.0 in stage 2.0 (TID 488, localhost, executor driver, partition 0, PROCESS_LOCAL, 7651 bytes) ... 23/05/23 13:20:09 INFO TaskSetManager: Starting task 127.0 in stage 2.0 (TID 615, localhost, executor driver, partition 127, PROCESS_LOCAL, 7651 bytes) 23/05/23 13:20:09 INFO Executor: Running task 0.0 in stage 2.0 (TID 488) ... 23/05/23 13:20:09 INFO Executor: Running task 21.0 in stage 2.0 (TID 509) 23/05/23 13:20:09 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks 23/05/23 13:20:09 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms ... 23/05/23 13:20:09 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 23/05/23 13:20:09 INFO Executor: Finished task 100.0 in stage 2.0 (TID 588). 1343 bytes result sent to driver 23/05/23 13:20:09 INFO TaskSetManager: Starting task 128.0 in stage 2.0 (TID 616, localhost, executor driver, partition 128, PROCESS_LOCAL, 7651 bytes) 23/05/23 13:20:09 INFO Executor: Running task 128.0 in stage 2.0 (TID 616) 23/05/23 13:20:09 INFO TaskSetManager: Finished task 100.0 in stage 2.0 (TID 588) in 84 ms on localhost (executor driver) (1/244) 23/05/23 13:20:09 INFO Executor: Finished task 105.0 in stage 2.0 (TID 593). 1300 bytes result sent to driver ... 
23/05/23 13:20:11 INFO TaskSetManager: Finished task 187.0 in stage 2.0 (TID 675) in 2115 ms on localhost (executor driver) (244/244) 23/05/23 13:20:11 INFO TaskSchedulerImpl: Removed TaskSet 2.0, whose tasks have all completed, from pool 23/05/23 13:20:11 INFO DAGScheduler: ShuffleMapStage 2 (mapToPair at PSFilter.java:128) finished in 2.752 s 23/05/23 13:20:11 INFO DAGScheduler: looking for newly runnable stages 23/05/23 13:20:11 INFO DAGScheduler: running: Set() 23/05/23 13:20:11 INFO DAGScheduler: waiting: Set(ResultStage 3) 23/05/23 13:20:11 INFO DAGScheduler: failed: Set() 23/05/23 13:20:11 INFO DAGScheduler: Submitting ResultStage 3 (MapPartitionsRDD[38] at flatMap at PSPairedUnpairedSplitterSpark.java:50), which has no missing parents 23/05/23 13:20:11 INFO MemoryStore: Block broadcast_5 stored as values in memory (estimated size 5.6 KB, free 399.8 GB) 23/05/23 13:20:11 INFO MemoryStore: Block broadcast_5_piece0 stored as bytes in memory (estimated size 3.2 KB, free 399.8 GB) 23/05/23 13:20:11 INFO BlockManagerInfo: Added broadcast_5_piece0 in memory on d01.capitalbiotech.local:41352 (size: 3.2 KB, free: 399.8 GB) 23/05/23 13:20:11 INFO SparkContext: Created broadcast 5 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:11 INFO DAGScheduler: Submitting 244 missing tasks from ResultStage 3 (MapPartitionsRDD[38] at flatMap at PSPairedUnpairedSplitterSpark.java:50) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) 23/05/23 13:20:11 INFO TaskSchedulerImpl: Adding task set 3.0 with 244 tasks 23/05/23 13:20:11 INFO TaskSetManager: Starting task 0.0 in stage 3.0 (TID 732, localhost, executor driver, partition 0, PROCESS_LOCAL, 7662 bytes) ... 23/05/23 13:20:11 INFO TaskSetManager: Starting task 127.0 in stage 3.0 (TID 859, localhost, executor driver, partition 127, PROCESS_LOCAL, 7662 bytes) 23/05/23 13:20:11 INFO Executor: Running task 0.0 in stage 3.0 (TID 732) ... 
23/05/23 13:20:11 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks 23/05/23 13:20:11 INFO Executor: Finished task 37.0 in stage 3.0 (TID 769). 967 bytes result sent to driver 23/05/23 13:20:11 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 12 ms 23/05/23 13:20:11 INFO Executor: Finished task 48.0 in stage 3.0 (TID 780). 1010 bytes result sent to driver 23/05/23 13:20:11 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms 23/05/23 13:20:11 INFO TaskSetManager: Finished task 60.0 in stage 3.0 (TID 792) in 42 ms on localhost (executor driver) (2/244) 23/05/23 13:20:11 INFO Executor: Finished task 45.0 in stage 3.0 (TID 777). 967 bytes result sent to driver 23/05/23 13:20:11 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 12 ms ... 23/05/23 13:20:12 INFO TaskSetManager: Finished task 235.0 in stage 3.0 (TID 967) in 22 ms on localhost (executor driver) (244/244) 23/05/23 13:20:12 INFO TaskSchedulerImpl: Removed TaskSet 3.0, whose tasks have all completed, from pool 23/05/23 13:20:12 INFO DAGScheduler: ResultStage 3 (count at PathSeqPipelineSpark.java:244) finished in 0.156 s 23/05/23 13:20:12 INFO DAGScheduler: Job 0 finished: count at PathSeqPipelineSpark.java:244, took 39.619893 s 23/05/23 13:20:12 INFO SparkContext: Starting job: count at PathSeqPipelineSpark.java:245 23/05/23 13:20:12 INFO DAGScheduler: Got job 1 (count at PathSeqPipelineSpark.java:245) with 244 output partitions 23/05/23 13:20:12 INFO DAGScheduler: Final stage: ResultStage 7 (count at PathSeqPipelineSpark.java:245) 23/05/23 13:20:12 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 6) 23/05/23 13:20:12 INFO DAGScheduler: Missing parents: List() 23/05/23 13:20:12 INFO DAGScheduler: Submitting ResultStage 7 (MapPartitionsRDD[39] at flatMap at PSPairedUnpairedSplitterSpark.java:57), which has no missing parents 23/05/23 13:20:12 INFO MemoryStore: Block broadcast_6 stored as values in 
memory (estimated size 5.6 KB, free 399.8 GB) 23/05/23 13:20:12 INFO MemoryStore: Block broadcast_6_piece0 stored as bytes in memory (estimated size 3.2 KB, free 399.8 GB) 23/05/23 13:20:12 INFO BlockManagerInfo: Added broadcast_6_piece0 in memory on d01.capitalbiotech.local:41352 (size: 3.2 KB, free: 399.8 GB) 23/05/23 13:20:12 INFO SparkContext: Created broadcast 6 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:12 INFO DAGScheduler: Submitting 244 missing tasks from ResultStage 7 (MapPartitionsRDD[39] at flatMap at PSPairedUnpairedSplitterSpark.java:57) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) 23/05/23 13:20:12 INFO TaskSchedulerImpl: Adding task set 7.0 with 244 tasks 23/05/23 13:20:12 INFO TaskSetManager: Starting task 0.0 in stage 7.0 (TID 976, localhost, executor driver, partition 0, PROCESS_LOCAL, 7662 bytes) ... 23/05/23 13:20:12 INFO TaskSetManager: Starting task 127.0 in stage 7.0 (TID 1103, localhost, executor driver, partition 127, PROCESS_LOCAL, 7662 bytes) 23/05/23 13:20:12 INFO Executor: Running task 5.0 in stage 7.0 (TID 981) ... 23/05/23 13:20:12 INFO Executor: Running task 11.0 in stage 7.0 (TID 987) 23/05/23 13:20:12 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks ... 23/05/23 13:20:12 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms 23/05/23 13:20:12 INFO Executor: Running task 129.0 in stage 7.0 (TID 1105) 23/05/23 13:20:12 INFO TaskSetManager: Finished task 126.0 in stage 7.0 (TID 1102) in 72 ms on localhost (executor driver) (2/244) 23/05/23 13:20:12 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms ... 
23/05/23 13:20:12 INFO TaskSetManager: Finished task 243.0 in stage 7.0 (TID 1219) in 14 ms on localhost (executor driver) (244/244) 23/05/23 13:20:12 INFO TaskSchedulerImpl: Removed TaskSet 7.0, whose tasks have all completed, from pool 23/05/23 13:20:12 INFO DAGScheduler: ResultStage 7 (count at PathSeqPipelineSpark.java:245) finished in 0.175 s 23/05/23 13:20:12 INFO DAGScheduler: Job 1 finished: count at PathSeqPipelineSpark.java:245, took 0.184459 s 23/05/23 13:20:12 INFO SparkContext: Starting job: foreach at BwaMemIndexCache.java:84 23/05/23 13:20:12 INFO DAGScheduler: Got job 2 (foreach at BwaMemIndexCache.java:84) with 128 output partitions 23/05/23 13:20:12 INFO DAGScheduler: Final stage: ResultStage 8 (foreach at BwaMemIndexCache.java:84) 23/05/23 13:20:12 INFO DAGScheduler: Parents of final stage: List() 23/05/23 13:20:12 INFO DAGScheduler: Missing parents: List() 23/05/23 13:20:12 INFO DAGScheduler: Submitting ResultStage 8 (ParallelCollectionRDD[40] at parallelize at BwaMemIndexCache.java:84), which has no missing parents 23/05/23 13:20:12 INFO MemoryStore: Block broadcast_7 stored as values in memory (estimated size 2.4 KB, free 399.8 GB) 23/05/23 13:20:12 INFO MemoryStore: Block broadcast_7_piece0 stored as bytes in memory (estimated size 1555.0 B, free 399.8 GB) 23/05/23 13:20:12 INFO BlockManagerInfo: Added broadcast_7_piece0 in memory on d01.capitalbiotech.local:41352 (size: 1555.0 B, free: 399.8 GB) 23/05/23 13:20:12 INFO SparkContext: Created broadcast 7 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:12 INFO DAGScheduler: Submitting 128 missing tasks from ResultStage 8 (ParallelCollectionRDD[40] at parallelize at BwaMemIndexCache.java:84) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) 23/05/23 13:20:12 INFO TaskSchedulerImpl: Adding task set 8.0 with 128 tasks 23/05/23 13:20:12 INFO TaskSetManager: Starting task 0.0 in stage 8.0 (TID 1220, localhost, executor driver, partition 0, 
PROCESS_LOCAL, 7723 bytes) ... 23/05/23 13:20:12 INFO TaskSetManager: Starting task 127.0 in stage 8.0 (TID 1347, localhost, executor driver, partition 127, PROCESS_LOCAL, 7724 bytes) 23/05/23 13:20:12 INFO Executor: Running task 0.0 in stage 8.0 (TID 1220) ... 23/05/23 13:20:12 INFO Executor: Finished task 95.0 in stage 8.0 (TID 1315). 624 bytes result sent to driver 23/05/23 13:20:12 INFO TaskSetManager: Finished task 95.0 in stage 8.0 (TID 1315) in 109 ms on localhost (executor driver) (1/128) ... 23/05/23 13:20:12 INFO TaskSetManager: Finished task 4.0 in stage 8.0 (TID 1224) in 369 ms on localhost (executor driver) (128/128) 23/05/23 13:20:12 INFO TaskSchedulerImpl: Removed TaskSet 8.0, whose tasks have all completed, from pool 23/05/23 13:20:12 INFO DAGScheduler: ResultStage 8 (foreach at BwaMemIndexCache.java:84) finished in 0.401 s 23/05/23 13:20:12 INFO DAGScheduler: Job 2 finished: foreach at BwaMemIndexCache.java:84, took 0.404961 s 23/05/23 13:20:12 INFO SparkContext: Starting job: foreach at ContainsKmerReadFilterSpark.java:46 23/05/23 13:20:12 INFO DAGScheduler: Got job 3 (foreach at ContainsKmerReadFilterSpark.java:46) with 128 output partitions 23/05/23 13:20:12 INFO DAGScheduler: Final stage: ResultStage 9 (foreach at ContainsKmerReadFilterSpark.java:46) 23/05/23 13:20:12 INFO DAGScheduler: Parents of final stage: List() 23/05/23 13:20:12 INFO DAGScheduler: Missing parents: List() 23/05/23 13:20:12 INFO DAGScheduler: Submitting ResultStage 9 (ParallelCollectionRDD[41] at parallelize at ContainsKmerReadFilterSpark.java:46), which has no missing parents 23/05/23 13:20:12 INFO MemoryStore: Block broadcast_8 stored as values in memory (estimated size 2.5 KB, free 399.8 GB) 23/05/23 13:20:12 INFO MemoryStore: Block broadcast_8_piece0 stored as bytes in memory (estimated size 1606.0 B, free 399.8 GB) 23/05/23 13:20:12 INFO BlockManagerInfo: Added broadcast_8_piece0 in memory on d01.capitalbiotech.local:41352 (size: 1606.0 B, free: 399.8 GB) 23/05/23 
13:20:12 INFO SparkContext: Created broadcast 8 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:12 INFO DAGScheduler: Submitting 128 missing tasks from ResultStage 9 (ParallelCollectionRDD[41] at parallelize at ContainsKmerReadFilterSpark.java:46) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) 23/05/23 13:20:12 INFO TaskSchedulerImpl: Adding task set 9.0 with 128 tasks 23/05/23 13:20:12 INFO TaskSetManager: Starting task 0.0 in stage 9.0 (TID 1348, localhost, executor driver, partition 0, PROCESS_LOCAL, 7723 bytes) ... 23/05/23 13:20:12 INFO TaskSetManager: Starting task 127.0 in stage 9.0 (TID 1475, localhost, executor driver, partition 127, PROCESS_LOCAL, 7724 bytes) 23/05/23 13:20:12 INFO Executor: Running task 0.0 in stage 9.0 (TID 1348) ... 23/05/23 13:20:12 INFO Executor: Running task 108.0 in stage 9.0 (TID 1456) 23/05/23 13:20:12 INFO Executor: Finished task 24.0 in stage 9.0 (TID 1372). 624 bytes result sent to driver 23/05/23 13:20:12 INFO TaskSetManager: Finished task 24.0 in stage 9.0 (TID 1372) in 262 ms on localhost (executor driver) (1/128) ... 23/05/23 13:20:13 INFO Executor: Finished task 79.0 in stage 9.0 (TID 1427). 
667 bytes result sent to driver 23/05/23 13:20:13 INFO TaskSetManager: Finished task 79.0 in stage 9.0 (TID 1427) in 173 ms on localhost (executor driver) (128/128) 23/05/23 13:20:13 INFO TaskSchedulerImpl: Removed TaskSet 9.0, whose tasks have all completed, from pool 23/05/23 13:20:13 INFO DAGScheduler: ResultStage 9 (foreach at ContainsKmerReadFilterSpark.java:46) finished in 0.394 s 23/05/23 13:20:13 INFO DAGScheduler: Job 3 finished: foreach at ContainsKmerReadFilterSpark.java:46, took 0.396650 s 23/05/23 13:20:13 INFO MemoryStore: Block broadcast_9 stored as values in memory (estimated size 53.8 MB, free 399.8 GB) 23/05/23 13:20:13 INFO MemoryStore: Block broadcast_9_piece0 stored as bytes in memory (estimated size 1567.0 KB, free 399.8 GB) 23/05/23 13:20:13 INFO BlockManagerInfo: Added broadcast_9_piece0 in memory on d01.capitalbiotech.local:41352 (size: 1567.0 KB, free: 399.8 GB) 23/05/23 13:20:13 INFO SparkContext: Created broadcast 9 from broadcast at PathSeqPipelineSpark.java:261 23/05/23 13:20:13 INFO MemoryStore: Block broadcast_10 stored as values in memory (estimated size 15.3 MB, free 399.8 GB) 23/05/23 13:20:13 INFO MemoryStore: Block broadcast_10_piece0 stored as bytes in memory (estimated size 1285.2 KB, free 399.8 GB) 23/05/23 13:20:13 INFO BlockManagerInfo: Added broadcast_10_piece0 in memory on d01.capitalbiotech.local:41352 (size: 1285.2 KB, free: 399.8 GB) 23/05/23 13:20:13 INFO SparkContext: Created broadcast 10 from broadcast at PSScorer.java:49 23/05/23 13:20:13 INFO SparkContext: Starting job: collectAsMap at PSScorer.java:71 23/05/23 13:20:13 INFO DAGScheduler: Registering RDD 43 (repartition at PathSeqPipelineSpark.java:197) as input to shuffle 3 23/05/23 13:20:13 INFO DAGScheduler: Registering RDD 48 (repartition at PathSeqPipelineSpark.java:256) as input to shuffle 4 23/05/23 13:20:13 INFO DAGScheduler: Registering RDD 60 (mapPartitionsToPair at PSScorer.java:68) as input to shuffle 5 23/05/23 13:20:13 INFO DAGScheduler: Got job 4 
(collectAsMap at PSScorer.java:71) with 2 output partitions 23/05/23 13:20:13 INFO DAGScheduler: Final stage: ResultStage 16 (collectAsMap at PSScorer.java:71) 23/05/23 13:20:13 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 15) 23/05/23 13:20:13 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 15) 23/05/23 13:20:13 INFO DAGScheduler: Submitting ShuffleMapStage 13 (MapPartitionsRDD[43] at repartition at PathSeqPipelineSpark.java:197), which has no missing parents 23/05/23 13:20:13 INFO MemoryStore: Block broadcast_11 stored as values in memory (estimated size 8.3 KB, free 399.8 GB) 23/05/23 13:20:13 INFO MemoryStore: Block broadcast_11_piece0 stored as bytes in memory (estimated size 4.4 KB, free 399.8 GB) 23/05/23 13:20:13 INFO BlockManagerInfo: Added broadcast_11_piece0 in memory on d01.capitalbiotech.local:41352 (size: 4.4 KB, free: 399.8 GB) 23/05/23 13:20:13 INFO SparkContext: Created broadcast 11 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:13 INFO DAGScheduler: Submitting 244 missing tasks from ShuffleMapStage 13 (MapPartitionsRDD[43] at repartition at PathSeqPipelineSpark.java:197) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) 23/05/23 13:20:13 INFO TaskSchedulerImpl: Adding task set 13.0 with 244 tasks 23/05/23 13:20:13 INFO DAGScheduler: Submitting ShuffleMapStage 14 (MapPartitionsRDD[48] at repartition at PathSeqPipelineSpark.java:256), which has no missing parents 23/05/23 13:20:13 INFO TaskSetManager: Starting task 0.0 in stage 13.0 (TID 1476, localhost, executor driver, partition 0, PROCESS_LOCAL, 7651 bytes) ... 23/05/23 13:20:13 INFO TaskSetManager: Starting task 127.0 in stage 13.0 (TID 1603, localhost, executor driver, partition 127, PROCESS_LOCAL, 7651 bytes) 23/05/23 13:20:13 INFO Executor: Running task 1.0 in stage 13.0 (TID 1477) ... 
23/05/23 13:20:13 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 5 ms 23/05/23 13:20:13 INFO BlockManagerInfo: Added broadcast_12_piece0 in memory on d01.capitalbiotech.local:41352 (size: 3.5 KB, free: 399.8 GB) 23/05/23 13:20:13 INFO SparkContext: Created broadcast 12 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:13 INFO DAGScheduler: Submitting 244 missing tasks from ShuffleMapStage 14 (MapPartitionsRDD[48] at repartition at PathSeqPipelineSpark.java:256) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)) 23/05/23 13:20:13 INFO TaskSchedulerImpl: Adding task set 14.0 with 244 tasks 23/05/23 13:20:14 INFO Executor: Finished task 59.0 in stage 13.0 (TID 1535). 1010 bytes result sent to driver 23/05/23 13:20:14 INFO TaskSetManager: Starting task 128.0 in stage 13.0 (TID 1604, localhost, executor driver, partition 128, PROCESS_LOCAL, 7651 bytes) 23/05/23 13:20:14 INFO Executor: Running task 128.0 in stage 13.0 (TID 1604) 23/05/23 13:20:14 INFO TaskSetManager: Finished task 59.0 in stage 13.0 (TID 1535) in 308 ms on localhost (executor driver) (1/244) 23/05/23 13:20:14 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks 23/05/23 13:20:14 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 23/05/23 13:20:14 INFO Executor: Finished task 41.0 in stage 13.0 (TID 1517). 1010 bytes result sent to driver ... 
23/05/23 13:20:15 INFO TaskSetManager: Starting task 123.0 in stage 14.0 (TID 1843, localhost, executor driver, partition 123, PROCESS_LOCAL, 7651 bytes) 23/05/23 13:20:15 INFO TaskSetManager: Finished task 118.0 in stage 14.0 (TID 1838) in 30 ms on localhost (executor driver) (4/244) 23/05/23 13:20:15 INFO Executor: Running task 123.0 in stage 14.0 (TID 1843) 23/05/23 13:20:15 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks 23/05/23 13:20:15 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms ... 23/05/23 13:20:16 INFO Executor: Finished task 236.0 in stage 13.0 (TID 1712). 967 bytes result sent to driver 23/05/23 13:20:16 INFO TaskSetManager: Finished task 236.0 in stage 13.0 (TID 1712) in 2140 ms on localhost (executor driver) (244/244) 23/05/23 13:20:16 INFO TaskSchedulerImpl: Removed TaskSet 13.0, whose tasks have all completed, from pool 23/05/23 13:20:16 INFO DAGScheduler: ShuffleMapStage 13 (repartition at PathSeqPipelineSpark.java:197) finished in 3.179 s 23/05/23 13:20:16 INFO DAGScheduler: looking for newly runnable stages 23/05/23 13:20:16 INFO DAGScheduler: running: Set(ShuffleMapStage 14) 23/05/23 13:20:16 INFO DAGScheduler: waiting: Set(ShuffleMapStage 15, ResultStage 16) 23/05/23 13:20:16 INFO DAGScheduler: failed: Set() 23/05/23 13:20:16 INFO Executor: Finished task 243.0 in stage 14.0 (TID 1963). 1010 bytes result sent to driver 23/05/23 13:20:16 INFO TaskSetManager: Finished task 243.0 in stage 14.0 (TID 1963) in 49 ms on localhost (executor driver) (124/244) 23/05/23 13:20:16 INFO Executor: Finished task 242.0 in stage 14.0 (TID 1962). 1010 bytes result sent to driver ... 23/05/23 13:20:18 INFO Executor: Finished task 120.0 in stage 14.0 (TID 1840). 
967 bytes result sent to driver 23/05/23 13:20:18 INFO TaskSetManager: Finished task 120.0 in stage 14.0 (TID 1840) in 2438 ms on localhost (executor driver) (244/244) 23/05/23 13:20:18 INFO TaskSchedulerImpl: Removed TaskSet 14.0, whose tasks have all completed, from pool 23/05/23 13:20:18 INFO DAGScheduler: ShuffleMapStage 14 (repartition at PathSeqPipelineSpark.java:256) finished in 4.303 s 23/05/23 13:20:18 INFO DAGScheduler: looking for newly runnable stages 23/05/23 13:20:18 INFO DAGScheduler: running: Set() 23/05/23 13:20:18 INFO DAGScheduler: waiting: Set(ShuffleMapStage 15, ResultStage 16) 23/05/23 13:20:18 INFO DAGScheduler: failed: Set() 23/05/23 13:20:18 INFO DAGScheduler: Submitting ShuffleMapStage 15 (MapPartitionsRDD[60] at mapPartitionsToPair at PSScorer.java:68), which has no missing parents 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_13 stored as values in memory (estimated size 12.4 KB, free 399.8 GB) 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_13_piece0 stored as bytes in memory (estimated size 6.4 KB, free 399.8 GB) 23/05/23 13:20:18 INFO BlockManagerInfo: Added broadcast_13_piece0 in memory on d01.capitalbiotech.local:41352 (size: 6.4 KB, free: 399.8 GB) 23/05/23 13:20:18 INFO SparkContext: Created broadcast 13 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:18 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 15 (MapPartitionsRDD[60] at mapPartitionsToPair at PSScorer.java:68) (first 15 tasks are for partitions Vector(0, 1)) 23/05/23 13:20:18 INFO TaskSchedulerImpl: Adding task set 15.0 with 2 tasks 23/05/23 13:20:18 INFO TaskSetManager: Starting task 0.0 in stage 15.0 (TID 1964, localhost, executor driver, partition 0, PROCESS_LOCAL, 8036 bytes) 23/05/23 13:20:18 INFO TaskSetManager: Starting task 1.0 in stage 15.0 (TID 1965, localhost, executor driver, partition 1, PROCESS_LOCAL, 8036 bytes) 23/05/23 13:20:18 INFO Executor: Running task 0.0 in stage 15.0 (TID 1964) 23/05/23 13:20:18 INFO Executor: 
Running task 1.0 in stage 15.0 (TID 1965) 23/05/23 13:20:18 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks 23/05/23 13:20:18 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks 23/05/23 13:20:18 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 23/05/23 13:20:18 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 1 ms 23/05/23 13:20:18 INFO MemoryStore: Block rdd_53_0 stored as values in memory (estimated size 0.0 B, free 399.8 GB) 23/05/23 13:20:18 INFO MemoryStore: Block rdd_52_0 stored as values in memory (estimated size 0.0 B, free 399.8 GB) 23/05/23 13:20:18 INFO BlockManagerInfo: Added rdd_52_0 in memory on d01.capitalbiotech.local:41352 (size: 0.0 B, free: 399.8 GB) 23/05/23 13:20:18 INFO BlockManagerInfo: Added rdd_53_0 in memory on d01.capitalbiotech.local:41352 (size: 0.0 B, free: 399.8 GB) 23/05/23 13:20:18 INFO Executor: Finished task 0.0 in stage 15.0 (TID 1964). 1226 bytes result sent to driver 23/05/23 13:20:18 INFO TaskSetManager: Finished task 0.0 in stage 15.0 (TID 1964) in 147 ms on localhost (executor driver) (1/2) 23/05/23 13:20:18 INFO Executor: Finished task 1.0 in stage 15.0 (TID 1965). 
1183 bytes result sent to driver 23/05/23 13:20:18 INFO TaskSetManager: Finished task 1.0 in stage 15.0 (TID 1965) in 146 ms on localhost (executor driver) (2/2) 23/05/23 13:20:18 INFO TaskSchedulerImpl: Removed TaskSet 15.0, whose tasks have all completed, from pool 23/05/23 13:20:18 INFO DAGScheduler: ShuffleMapStage 15 (mapPartitionsToPair at PSScorer.java:68) finished in 0.185 s 23/05/23 13:20:18 INFO DAGScheduler: looking for newly runnable stages 23/05/23 13:20:18 INFO DAGScheduler: running: Set() 23/05/23 13:20:18 INFO DAGScheduler: waiting: Set(ResultStage 16) 23/05/23 13:20:18 INFO DAGScheduler: failed: Set() 23/05/23 13:20:18 INFO DAGScheduler: Submitting ResultStage 16 (ShuffledRDD[61] at reduceByKey at PSScorer.java:71), which has no missing parents 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_14 stored as values in memory (estimated size 4.7 KB, free 399.8 GB) 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_14_piece0 stored as bytes in memory (estimated size 2.6 KB, free 399.8 GB) 23/05/23 13:20:18 INFO BlockManagerInfo: Added broadcast_14_piece0 in memory on d01.capitalbiotech.local:41352 (size: 2.6 KB, free: 399.8 GB) 23/05/23 13:20:18 INFO SparkContext: Created broadcast 14 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:18 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 16 (ShuffledRDD[61] at reduceByKey at PSScorer.java:71) (first 15 tasks are for partitions Vector(0, 1)) 23/05/23 13:20:18 INFO TaskSchedulerImpl: Adding task set 16.0 with 2 tasks 23/05/23 13:20:18 INFO TaskSetManager: Starting task 0.0 in stage 16.0 (TID 1966, localhost, executor driver, partition 0, PROCESS_LOCAL, 7662 bytes) 23/05/23 13:20:18 INFO TaskSetManager: Starting task 1.0 in stage 16.0 (TID 1967, localhost, executor driver, partition 1, PROCESS_LOCAL, 7662 bytes) 23/05/23 13:20:18 INFO Executor: Running task 0.0 in stage 16.0 (TID 1966) 23/05/23 13:20:18 INFO Executor: Running task 1.0 in stage 16.0 (TID 1967) 23/05/23 13:20:18 INFO 
ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks 23/05/23 13:20:18 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks 23/05/23 13:20:18 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 23/05/23 13:20:18 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 23/05/23 13:20:18 INFO Executor: Finished task 1.0 in stage 16.0 (TID 1967). 1098 bytes result sent to driver 23/05/23 13:20:18 INFO Executor: Finished task 0.0 in stage 16.0 (TID 1966). 1098 bytes result sent to driver 23/05/23 13:20:18 INFO TaskSetManager: Finished task 1.0 in stage 16.0 (TID 1967) in 23 ms on localhost (executor driver) (1/2) 23/05/23 13:20:18 INFO TaskSetManager: Finished task 0.0 in stage 16.0 (TID 1966) in 24 ms on localhost (executor driver) (2/2) 23/05/23 13:20:18 INFO TaskSchedulerImpl: Removed TaskSet 16.0, whose tasks have all completed, from pool 23/05/23 13:20:18 INFO DAGScheduler: ResultStage 16 (collectAsMap at PSScorer.java:71) finished in 0.034 s 23/05/23 13:20:18 INFO DAGScheduler: Job 4 finished: collectAsMap at PSScorer.java:71, took 4.580518 s 23/05/23 13:20:18 INFO SparkContext: Starting job: collect at PSBwaUtils.java:59 23/05/23 13:20:18 INFO DAGScheduler: Registering RDD 63 (distinct at PSBwaUtils.java:59) as input to shuffle 6 23/05/23 13:20:18 INFO DAGScheduler: Got job 5 (collect at PSBwaUtils.java:59) with 2 output partitions 23/05/23 13:20:18 INFO DAGScheduler: Final stage: ResultStage 23 (collect at PSBwaUtils.java:59) 23/05/23 13:20:18 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 22) 23/05/23 13:20:18 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 22) 23/05/23 13:20:18 INFO DAGScheduler: Submitting ShuffleMapStage 22 (MapPartitionsRDD[63] at distinct at PSBwaUtils.java:59), which has no missing parents 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_15 stored as values in memory (estimated size 
11.8 KB, free 399.8 GB) 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_15_piece0 stored as bytes in memory (estimated size 6.2 KB, free 399.8 GB) 23/05/23 13:20:18 INFO BlockManagerInfo: Added broadcast_15_piece0 in memory on d01.capitalbiotech.local:41352 (size: 6.2 KB, free: 399.8 GB) 23/05/23 13:20:18 INFO SparkContext: Created broadcast 15 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:18 INFO DAGScheduler: Submitting 2 missing tasks from ShuffleMapStage 22 (MapPartitionsRDD[63] at distinct at PSBwaUtils.java:59) (first 15 tasks are for partitions Vector(0, 1)) 23/05/23 13:20:18 INFO TaskSchedulerImpl: Adding task set 22.0 with 2 tasks 23/05/23 13:20:18 INFO TaskSetManager: Starting task 0.0 in stage 22.0 (TID 1968, localhost, executor driver, partition 0, PROCESS_LOCAL, 8036 bytes) 23/05/23 13:20:18 INFO TaskSetManager: Starting task 1.0 in stage 22.0 (TID 1969, localhost, executor driver, partition 1, PROCESS_LOCAL, 8036 bytes) 23/05/23 13:20:18 INFO Executor: Running task 1.0 in stage 22.0 (TID 1969) 23/05/23 13:20:18 INFO Executor: Running task 0.0 in stage 22.0 (TID 1968) 23/05/23 13:20:18 INFO BlockManager: Found block rdd_53_0 locally 23/05/23 13:20:18 INFO BlockManager: Found block rdd_52_0 locally 23/05/23 13:20:18 INFO Executor: Finished task 1.0 in stage 22.0 (TID 1969). 925 bytes result sent to driver 23/05/23 13:20:18 INFO TaskSetManager: Finished task 1.0 in stage 22.0 (TID 1969) in 31 ms on localhost (executor driver) (1/2) 23/05/23 13:20:18 INFO Executor: Finished task 0.0 in stage 22.0 (TID 1968). 
925 bytes result sent to driver 23/05/23 13:20:18 INFO TaskSetManager: Finished task 0.0 in stage 22.0 (TID 1968) in 38 ms on localhost (executor driver) (2/2) 23/05/23 13:20:18 INFO TaskSchedulerImpl: Removed TaskSet 22.0, whose tasks have all completed, from pool 23/05/23 13:20:18 INFO DAGScheduler: ShuffleMapStage 22 (distinct at PSBwaUtils.java:59) finished in 0.053 s 23/05/23 13:20:18 INFO DAGScheduler: looking for newly runnable stages 23/05/23 13:20:18 INFO DAGScheduler: running: Set() 23/05/23 13:20:18 INFO DAGScheduler: waiting: Set(ResultStage 23) 23/05/23 13:20:18 INFO DAGScheduler: failed: Set() 23/05/23 13:20:18 INFO DAGScheduler: Submitting ResultStage 23 (MapPartitionsRDD[65] at distinct at PSBwaUtils.java:59), which has no missing parents 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_16 stored as values in memory (estimated size 4.3 KB, free 399.8 GB) 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_16_piece0 stored as bytes in memory (estimated size 2.5 KB, free 399.8 GB) 23/05/23 13:20:18 INFO BlockManagerInfo: Added broadcast_16_piece0 in memory on d01.capitalbiotech.local:41352 (size: 2.5 KB, free: 399.8 GB) 23/05/23 13:20:18 INFO SparkContext: Created broadcast 16 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:18 INFO DAGScheduler: Submitting 2 missing tasks from ResultStage 23 (MapPartitionsRDD[65] at distinct at PSBwaUtils.java:59) (first 15 tasks are for partitions Vector(0, 1)) 23/05/23 13:20:18 INFO TaskSchedulerImpl: Adding task set 23.0 with 2 tasks 23/05/23 13:20:18 INFO TaskSetManager: Starting task 0.0 in stage 23.0 (TID 1970, localhost, executor driver, partition 0, PROCESS_LOCAL, 7662 bytes) 23/05/23 13:20:18 INFO TaskSetManager: Starting task 1.0 in stage 23.0 (TID 1971, localhost, executor driver, partition 1, PROCESS_LOCAL, 7662 bytes) 23/05/23 13:20:18 INFO Executor: Running task 0.0 in stage 23.0 (TID 1970) 23/05/23 13:20:18 INFO Executor: Running task 1.0 in stage 23.0 (TID 1971) 23/05/23 13:20:18 INFO 
ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks 23/05/23 13:20:18 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks 23/05/23 13:20:18 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 23/05/23 13:20:18 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 23/05/23 13:20:18 INFO Executor: Finished task 1.0 in stage 23.0 (TID 1971). 1098 bytes result sent to driver 23/05/23 13:20:18 INFO Executor: Finished task 0.0 in stage 23.0 (TID 1970). 1098 bytes result sent to driver 23/05/23 13:20:18 INFO TaskSetManager: Finished task 1.0 in stage 23.0 (TID 1971) in 14 ms on localhost (executor driver) (1/2) 23/05/23 13:20:18 INFO TaskSetManager: Finished task 0.0 in stage 23.0 (TID 1970) in 15 ms on localhost (executor driver) (2/2) 23/05/23 13:20:18 INFO TaskSchedulerImpl: Removed TaskSet 23.0, whose tasks have all completed, from pool 23/05/23 13:20:18 INFO DAGScheduler: ResultStage 23 (collect at PSBwaUtils.java:59) finished in 0.026 s 23/05/23 13:20:18 INFO DAGScheduler: Job 5 finished: collect at PSBwaUtils.java:59, took 0.091434 s 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_17 stored as values in memory (estimated size 7.3 KB, free 399.8 GB) 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_17_piece0 stored as bytes in memory (estimated size 679.0 B, free 399.8 GB) 23/05/23 13:20:18 INFO BlockManagerInfo: Added broadcast_17_piece0 in memory on d01.capitalbiotech.local:41352 (size: 679.0 B, free: 399.8 GB) 23/05/23 13:20:18 INFO SparkContext: Created broadcast 17 from broadcast at ReadsSparkSink.java:146 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_18 stored as values in memory (estimated size 7.3 KB, free 399.8 GB) 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_18_piece0 stored as bytes in memory (estimated size 679.0 B, free 399.8 GB) 23/05/23 13:20:18 INFO BlockManagerInfo: Added broadcast_18_piece0 in memory 
on d01.capitalbiotech.local:41352 (size: 679.0 B, free: 399.8 GB) 23/05/23 13:20:18 INFO SparkContext: Created broadcast 18 from broadcast at BamSink.java:76 23/05/23 13:20:18 INFO FileOutputCommitter: File Output Committer Algorithm version is 2 23/05/23 13:20:18 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false 23/05/23 13:20:18 INFO SparkContext: Starting job: runJob at SparkHadoopWriter.scala:78 23/05/23 13:20:18 INFO DAGScheduler: Registering RDD 68 (mapToPair at SparkUtils.java:161) as input to shuffle 7 23/05/23 13:20:18 INFO DAGScheduler: Got job 6 (runJob at SparkHadoopWriter.scala:78) with 1 output partitions 23/05/23 13:20:18 INFO DAGScheduler: Final stage: ResultStage 30 (runJob at SparkHadoopWriter.scala:78) 23/05/23 13:20:18 INFO DAGScheduler: Parents of final stage: List(ShuffleMapStage 29) 23/05/23 13:20:18 INFO DAGScheduler: Missing parents: List(ShuffleMapStage 29) 23/05/23 13:20:18 INFO DAGScheduler: Submitting ShuffleMapStage 29 (MapPartitionsRDD[68] at mapToPair at SparkUtils.java:161), which has no missing parents 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_19 stored as values in memory (estimated size 14.8 KB, free 399.8 GB) 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_19_piece0 stored as bytes in memory (estimated size 7.9 KB, free 399.8 GB) 23/05/23 13:20:18 INFO BlockManagerInfo: Added broadcast_19_piece0 in memory on d01.capitalbiotech.local:41352 (size: 7.9 KB, free: 399.8 GB) 23/05/23 13:20:18 INFO SparkContext: Created broadcast 19 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:18 INFO DAGScheduler: Submitting 1 missing tasks from ShuffleMapStage 29 (MapPartitionsRDD[68] at mapToPair at SparkUtils.java:161) (first 15 tasks are for partitions Vector(0)) 23/05/23 13:20:18 INFO TaskSchedulerImpl: Adding task set 29.0 with 1 tasks 23/05/23 13:20:18 INFO TaskSetManager: Starting task 0.0 in stage 29.0 (TID 1972, localhost, 
executor driver, partition 0, ANY, 8159 bytes) 23/05/23 13:20:18 INFO Executor: Running task 0.0 in stage 29.0 (TID 1972) 23/05/23 13:20:18 INFO BlockManager: Found block rdd_52_0 locally 23/05/23 13:20:18 INFO BlockManager: Found block rdd_53_0 locally 23/05/23 13:20:18 INFO Executor: Finished task 0.0 in stage 29.0 (TID 1972). 752 bytes result sent to driver 23/05/23 13:20:18 INFO TaskSetManager: Finished task 0.0 in stage 29.0 (TID 1972) in 43 ms on localhost (executor driver) (1/1) 23/05/23 13:20:18 INFO TaskSchedulerImpl: Removed TaskSet 29.0, whose tasks have all completed, from pool 23/05/23 13:20:18 INFO DAGScheduler: ShuffleMapStage 29 (mapToPair at SparkUtils.java:161) finished in 0.065 s 23/05/23 13:20:18 INFO DAGScheduler: looking for newly runnable stages 23/05/23 13:20:18 INFO DAGScheduler: running: Set() 23/05/23 13:20:18 INFO DAGScheduler: waiting: Set(ResultStage 30) 23/05/23 13:20:18 INFO DAGScheduler: failed: Set() 23/05/23 13:20:18 INFO DAGScheduler: Submitting ResultStage 30 (MapPartitionsRDD[73] at mapToPair at BamSink.java:91), which has no missing parents 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_20 stored as values in memory (estimated size 91.7 KB, free 399.8 GB) 23/05/23 13:20:18 INFO MemoryStore: Block broadcast_20_piece0 stored as bytes in memory (estimated size 42.1 KB, free 399.8 GB) 23/05/23 13:20:18 INFO BlockManagerInfo: Added broadcast_20_piece0 in memory on d01.capitalbiotech.local:41352 (size: 42.1 KB, free: 399.8 GB) 23/05/23 13:20:18 INFO SparkContext: Created broadcast 20 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:18 INFO DAGScheduler: Submitting 1 missing tasks from ResultStage 30 (MapPartitionsRDD[73] at mapToPair at BamSink.java:91) (first 15 tasks are for partitions Vector(0)) 23/05/23 13:20:18 INFO TaskSchedulerImpl: Adding task set 30.0 with 1 tasks 23/05/23 13:20:18 INFO TaskSetManager: Starting task 0.0 in stage 30.0 (TID 1973, localhost, executor driver, partition 0, PROCESS_LOCAL, 7662 bytes) 
23/05/23 13:20:18 INFO Executor: Running task 0.0 in stage 30.0 (TID 1973) 23/05/23 13:20:18 INFO ShuffleBlockFetcherIterator: Getting 0 non-empty blocks including 0 local blocks and 0 remote blocks 23/05/23 13:20:18 INFO ShuffleBlockFetcherIterator: Started 0 remote fetches in 0 ms 23/05/23 13:20:18 INFO FileOutputCommitter: File Output Committer Algorithm version is 2 23/05/23 13:20:18 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false 23/05/23 13:20:18 INFO FileOutputCommitter: File Output Committer Algorithm version is 2 23/05/23 13:20:18 INFO FileOutputCommitter: FileOutputCommitter skip cleanup _temporary folders under output directory:false, ignore cleanup failures: false 23/05/23 13:20:18 INFO FileOutputCommitter: Saved output of task 'attempt_20230523132018_0073_r_000000_0' to file:pathseq/CRC_16.pathseq.complete.bam.parts 23/05/23 13:20:18 INFO SparkHadoopMapRedUtil: attempt_20230523132018_0073_r_000000_0: Committed 23/05/23 13:20:18 INFO Executor: Finished task 0.0 in stage 30.0 (TID 1973). 1149 bytes result sent to driver 23/05/23 13:20:18 INFO TaskSetManager: Finished task 0.0 in stage 30.0 (TID 1973) in 184 ms on localhost (executor driver) (1/1) 23/05/23 13:20:18 INFO TaskSchedulerImpl: Removed TaskSet 30.0, whose tasks have all completed, from pool 23/05/23 13:20:18 INFO DAGScheduler: ResultStage 30 (runJob at SparkHadoopWriter.scala:78) finished in 0.215 s 23/05/23 13:20:18 INFO DAGScheduler: Job 6 finished: runJob at SparkHadoopWriter.scala:78, took 0.298046 s 23/05/23 13:20:19 INFO SparkHadoopWriter: Job job_20230523132018_0073 committed. 
23/05/23 13:20:19 INFO HadoopFileSystemWrapper: Concatenating 2 parts to pathseq/CRC_16.pathseq.complete.bam 23/05/23 13:20:19 INFO HadoopFileSystemWrapper: Concatenating to pathseq/CRC_16.pathseq.complete.bam done 23/05/23 13:20:19 INFO IndexFileMerger: Merging .sbi files in temp directory pathseq/CRC_16.pathseq.complete.bam.parts/ to pathseq/CRC_16.pathseq.complete.bam.sbi 23/05/23 13:20:19 INFO IndexFileMerger: Done merging .sbi files 23/05/23 13:20:19 INFO IndexFileMerger: Merging .bai files in temp directory pathseq/CRC_16.pathseq.complete.bam.parts/ to pathseq/CRC_16.pathseq.complete.bam.bai 23/05/23 13:20:19 INFO IndexFileMerger: Done merging .bai files 23/05/23 13:20:19 INFO SparkContext: Starting job: foreach at BwaMemIndexCache.java:84 23/05/23 13:20:19 INFO DAGScheduler: Got job 7 (foreach at BwaMemIndexCache.java:84) with 128 output partitions 23/05/23 13:20:19 INFO DAGScheduler: Final stage: ResultStage 31 (foreach at BwaMemIndexCache.java:84) 23/05/23 13:20:19 INFO DAGScheduler: Parents of final stage: List() 23/05/23 13:20:19 INFO DAGScheduler: Missing parents: List() 23/05/23 13:20:19 INFO DAGScheduler: Submitting ResultStage 31 (ParallelCollectionRDD[74] at parallelize at BwaMemIndexCache.java:84), which has no missing parents 23/05/23 13:20:19 INFO MemoryStore: Block broadcast_21 stored as values in memory (estimated size 2.4 KB, free 399.8 GB) 23/05/23 13:20:19 INFO MemoryStore: Block broadcast_21_piece0 stored as bytes in memory (estimated size 1555.0 B, free 399.8 GB) 23/05/23 13:20:19 INFO BlockManagerInfo: Added broadcast_21_piece0 in memory on d01.capitalbiotech.local:41352 (size: 1555.0 B, free: 399.8 GB) 23/05/23 13:20:19 INFO SparkContext: Created broadcast 21 from broadcast at DAGScheduler.scala:1163 23/05/23 13:20:19 INFO DAGScheduler: Submitting 128 missing tasks from ResultStage 31 (ParallelCollectionRDD[74] at parallelize at BwaMemIndexCache.java:84) (first 15 tasks are for partitions Vector(0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 
13, 14)) 23/05/23 13:20:19 INFO TaskSchedulerImpl: Adding task set 31.0 with 128 tasks 23/05/23 13:20:19 INFO TaskSetManager: Starting task 0.0 in stage 31.0 (TID 1974, localhost, executor driver, partition 0, PROCESS_LOCAL, 7723 bytes) ... 23/05/23 13:20:19 INFO TaskSetManager: Starting task 127.0 in stage 31.0 (TID 2101, localhost, executor driver, partition 127, PROCESS_LOCAL, 7724 bytes) 23/05/23 13:20:19 INFO Executor: Running task 0.0 in stage 31.0 (TID 1974) ... 23/05/23 13:20:19 INFO Executor: Running task 109.0 in stage 31.0 (TID 2083) 23/05/23 13:20:19 INFO Executor: Finished task 66.0 in stage 31.0 (TID 2040). 667 bytes result sent to driver 23/05/23 13:20:19 INFO Executor: Finished task 2.0 in stage 31.0 (TID 1976). 667 bytes result sent to driver 23/05/23 13:20:19 INFO TaskSetManager: Finished task 66.0 in stage 31.0 (TID 2040) in 160 ms on localhost (executor driver) (1/128) 23/05/23 13:20:19 INFO TaskSetManager: Finished task 2.0 in stage 31.0 (TID 1976) in 330 ms on localhost (executor driver) (2/128) 23/05/23 13:20:19 INFO Executor: Finished task 3.0 in stage 31.0 (TID 1977). 667 bytes result sent to driver ... 23/05/23 13:20:19 INFO TaskSetManager: Finished task 97.0 in stage 31.0 (TID 2071) in 123 ms on localhost (executor driver) (127/128) 23/05/23 13:20:19 INFO TaskSetManager: Finished task 112.0 in stage 31.0 (TID 2086) in 88 ms on localhost (executor driver) (128/128) 23/05/23 13:20:19 INFO TaskSchedulerImpl: Removed TaskSet 31.0, whose tasks have all completed, from pool 23/05/23 13:20:19 INFO DAGScheduler: ResultStage 31 (foreach at BwaMemIndexCache.java:84) finished in 0.389 s 23/05/23 13:20:19 INFO DAGScheduler: Job 7 finished: foreach at BwaMemIndexCache.java:84, took 0.392269 s 23/05/23 13:20:19 INFO SparkUI: Stopped Spark web UI at http://d01.capitalbiotech.local:4040 23/05/23 13:20:19 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped! 
23/05/23 13:20:26 INFO MemoryStore: MemoryStore cleared 23/05/23 13:20:26 INFO BlockManager: BlockManager stopped 23/05/23 13:20:26 INFO BlockManagerMaster: BlockManagerMaster stopped 23/05/23 13:20:26 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped! 23/05/23 13:20:26 INFO SparkContext: Successfully stopped SparkContext 13:20:26.099 INFO PathSeqPipelineSpark - Shutting down engine [May 23, 2023 1:20:26 PM CST] org.broadinstitute.hellbender.tools.spark.pathseq.PathSeqPipelineSpark done. Elapsed time: 1.04 minutes. Runtime.totalMemory()=156475326464 23/05/23 13:20:26 INFO ShutdownHookManager: Shutdown hook called 23/05/23 13:20:26 INFO ShutdownHookManager: Deleting directory pathseq/tmp/spark-2042a18b-a4af-4a86-a236-c4914f0407a1
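
One hint in the log above is that the cached blocks `rdd_52_0` and `rdd_53_0` are stored with an estimated size of 0.0 B, which would be consistent with no reads surviving to the alignment and scoring stages. That is a reading of the log, not something confirmed by the GATK team. A minimal Python sketch for spotting such lines in a saved log follows; the function name `empty_rdd_blocks` and the regular expression are illustrative and assume the Spark `MemoryStore` message format shown in this log.

```python
import re

# Matches Spark MemoryStore lines such as:
#   "Block rdd_53_0 stored as values in memory (estimated size 0.0 B, free 399.8 GB)"
# The pattern is an assumption based on the log format pasted in this issue.
RDD_BLOCK = re.compile(
    r"Block (rdd_\d+_\d+) stored as values in memory "
    r"\(estimated size ([\d.]+) (B|KB|MB|GB)"
)


def empty_rdd_blocks(log_text):
    """Return the names of cached RDD blocks whose reported size is zero."""
    return [
        name
        for name, size, _unit in RDD_BLOCK.findall(log_text)
        if float(size) == 0.0
    ]


# Two lines copied from the log above: one cached RDD block, one broadcast block.
sample = (
    "23/05/23 13:20:18 INFO MemoryStore: Block rdd_53_0 stored as values in memory "
    "(estimated size 0.0 B, free 399.8 GB) "
    "23/05/23 13:20:18 INFO MemoryStore: Block broadcast_14 stored as values in memory "
    "(estimated size 4.7 KB, free 399.8 GB)"
)
print(empty_rdd_blocks(sample))
```

Broadcast blocks are ignored on purpose, since they hold task metadata rather than reads.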

hanruiw commented 1 year ago

Dear Fang,

Thank you for the question. Our project didn't use a customized GATK-PathSeq database, so I'm sorry to say that I can't help with this one. For technical questions about GATK-PathSeq, I would suggest contacting the GATK-PathSeq team. Good luck with your analysis!

Best regards, Hanrui

februaryfang commented 1 year ago

Dear Hanrui,

PathSeqPipelineSpark bundles several steps into one module, so I ran the steps one at a time, hoping to find out why there were no results. After running PathSeqFilterSpark, I obtained a metrics file with the following results.

METRICS CLASS org.broadinstitute.hellbender.tools.spark.pathseq.loggers.PSFilterMetrics

| Metric | Value |
| --- | --- |
| PRIMARY_READS | 2196465 |
| READS_AFTER_PREALIGNED_HOST_FILTER | 2196465 |
| READS_AFTER_QUALITY_AND_COMPLEXITY_FILTER | 0 |
| READS_AFTER_HOST_FILTER | 0 |
| READS_AFTER_DEDUPLICATION | 0 |
| FINAL_PAIRED_READS | 0 |
| FINAL_UNPAIRED_READS | 0 |
| FINAL_TOTAL_READS | 0 |
| LOW_QUALITY_OR_LOW_COMPLEXITY_READS_FILTERED | 2196465 |
| HOST_READS_FILTERED | 0 |
| DUPLICATE_READS_FILTERED | 0 |

Why are all reads counted under LOW_QUALITY_OR_LOW_COMPLEXITY_READS_FILTERED? The results above come from the example code 'patient_samples_16s_pipeline.sh'. Are the thresholds in that code the ones actually used for the data in the paper?
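
To make the point of failure explicit, the metrics can be read programmatically. The sketch below is a minimal Python example, assuming the metrics file is the usual whitespace-separated layout with `#`-prefixed comment lines, one header row, and one value row; the helper names `parse_filter_metrics` and `first_empty_stage` are hypothetical and not part of GATK.

```python
# Filter stages in the order the column names suggest they are applied.
STAGES = [
    "PRIMARY_READS",
    "READS_AFTER_PREALIGNED_HOST_FILTER",
    "READS_AFTER_QUALITY_AND_COMPLEXITY_FILTER",
    "READS_AFTER_HOST_FILTER",
    "READS_AFTER_DEDUPLICATION",
]


def parse_filter_metrics(text):
    """Parse the header row and value row of a metrics file into a dict."""
    rows = [
        line.split()
        for line in text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    header, values = rows[0], rows[1]
    return dict(zip(header, map(int, values)))


def first_empty_stage(metrics):
    """Return the first stage whose read count is zero, or None."""
    for stage in STAGES:
        if metrics[stage] == 0:
            return stage
    return None


# Header and values as posted in this comment.
sample = """## METRICS CLASS org.broadinstitute.hellbender.tools.spark.pathseq.loggers.PSFilterMetrics
PRIMARY_READS READS_AFTER_PREALIGNED_HOST_FILTER READS_AFTER_QUALITY_AND_COMPLEXITY_FILTER READS_AFTER_HOST_FILTER READS_AFTER_DEDUPLICATION FINAL_PAIRED_READS FINAL_UNPAIRED_READS FINAL_TOTAL_READS LOW_QUALITY_OR_LOW_COMPLEXITY_READS_FILTERED HOST_READS_FILTERED DUPLICATE_READS_FILTERED
2196465 2196465 0 0 0 0 0 0 2196465 0 0
"""

metrics = parse_filter_metrics(sample)
print(first_empty_stage(metrics))
print(metrics["LOW_QUALITY_OR_LOW_COMPLEXITY_READS_FILTERED"] == metrics["PRIMARY_READS"])
```

For these numbers the first empty stage is the quality and complexity filter, and the count removed there equals the number of primary reads, so the host filter and deduplication steps never received any input.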

hanruiw commented 1 year ago

Dear Fang,

Have you tried the database provided by GATK-PathSeq? I'm not familiar with custom databases, and I haven't run PathSeqFilterSpark separately myself. As far as I understand, though, using the prebuilt database might help with debugging.

Best regards, Hanrui