Open GATKSupportTeam opened 2 years ago
I think the most likely explanation for this behavior is an unmapped source of reads. Has your input BAM been aligned to a reference? I apologize that this code isn't more robust to edge conditions.
I'll check with the user, thanks @tedsharpe!
Hi @tedsharpe !
I also commented about it on the helpdesk but should probably reply directly here.
The .bam file was aligned to a reference, the same reference I used to run the tool. I wondered whether the BAM still contained unmapped reads, so I ran
samtools view -b -F 4
on the file to retain only mapped reads and re-ran the GATK tool. However, this did not improve the situation.
Best, Domniki
error log:
22/03/11 06:13:57 INFO SparkUI: Stopped Spark web UI at http://10.222.0.104:4040
22/03/11 06:13:57 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
22/03/11 06:13:58 INFO MemoryStore: MemoryStore cleared
22/03/11 06:13:58 INFO BlockManager: BlockManager stopped
22/03/11 06:13:58 INFO BlockManagerMaster: BlockManagerMaster stopped
22/03/11 06:13:58 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
22/03/11 06:13:58 INFO SparkContext: Successfully stopped SparkContext
06:13:58.369 INFO FindBreakpointEvidenceSpark - Shutting down engine
[March 11, 2022 6:13:58 AM GMT] org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark done. Elapsed time: 3.28 minutes.
Runtime.totalMemory()=29312942080
java.lang.ArithmeticException: / by zero
    at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.removeUbiquitousKmers(FindBreakpointEvidenceSpark.java:640)
    at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.addAssemblyQNames(FindBreakpointEvidenceSpark.java:507)
    at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.gatherEvidenceAndWriteContigSamFile(FindBreakpointEvidenceSpark.
    at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.runTool(FindBreakpointEvidenceSpark.java:136)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:546)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
22/03/11 06:13:58 INFO ShutdownHookManager: Shutdown hook called
22/03/11 06:13:58 INFO ShutdownHookManager: Deleting directory /mnt/SCRATCH/domniman/tmp/spark-fe669b9e-a685-4831-b295-0e0cddb84d7b
The BAM file is partitioned into chunks (called partitions) by Spark. The section of code that is failing is attempting to calculate the median number of bases covered in each partition. It calculates the number of bases covered for each partition, sorts the list, and grabs the middle entry. The code is not very defensive, and sadly, this number is 0 for your BAM. This could be because there are no partitions (I don't know why this might be), or because more than half of the partitions cover no bases (if, e.g., there are no mapped reads in the partition).
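To make the failure mode concrete, here is a rough Python sketch of the calculation described above. It is not the GATK source (which is Java). The function names are made up for illustration, returning 0 for an empty partition list is an assumption, and the division by the median is only a guess at how a zero median turns into "/ by zero".

```python
from typing import List


def median_covered_bases(bases_per_partition: List[int]) -> int:
    """Sort the per-partition base counts and take the middle entry."""
    if not bases_per_partition:
        # Assumption: with no partitions at all, treat the median as 0.
        return 0
    ordered = sorted(bases_per_partition)
    return ordered[len(ordered) // 2]


def scale_by_median(value: int, bases_per_partition: List[int]) -> int:
    """Hypothetical, non-defensive use of the median as a divisor."""
    # Raises ZeroDivisionError when the median is 0, much like Java's
    # ArithmeticException for integer division by zero.
    return value // median_covered_bases(bases_per_partition)


def checked_median(bases_per_partition: List[int]) -> int:
    """Defensive variant: fail with an explanatory message instead."""
    median = median_covered_bases(bases_per_partition)
    if median == 0:
        raise ValueError(
            "median bases covered per partition is 0: there are no partitions, "
            "or more than half of them contain no mapped reads"
        )
    return median
```

With counts such as [0, 0, 0, 5, 9], more than half of the partitions are empty, so the middle entry is 0 and the unguarded division fails.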
Do you have a BAM that runs properly in the tool? You could try the Picard tool ValidateSamFile or some tool that gathers mapping statistics (maybe samtools?) and compare the statistics for the good BAM with those for the failing BAM. Maybe you'll spot some glaring distinction.
The failing BAM is aligned and coordinate sorted, right? And the majority of the reads are mapped?
Is it a tiny BAM? Maybe we're creating too many partitions and most of them are empty?
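One possible way to run that comparison is to save the output of `samtools idxstats` for each BAM (this needs an indexed BAM) and look at the contig passed with -L. The sketch below assumes the usual tab-separated idxstats columns: contig name, contig length, mapped count, unmapped count. The function names and the example numbers are made up for illustration; they are not taken from the user's data.

```python
from typing import Dict, Tuple

# Made-up example of `samtools idxstats` output, for illustration only.
EXAMPLE_IDXSTATS = "ssa01\t1000\t50\t2\nssa03\t2000\t0\t0\n*\t0\t0\t7\n"


def parse_idxstats(text: str) -> Dict[str, Tuple[int, int, int]]:
    """Map contig name -> (length, mapped, unmapped)."""
    stats: Dict[str, Tuple[int, int, int]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        name, length, mapped, unmapped = line.split("\t")
        stats[name] = (int(length), int(mapped), int(unmapped))
    return stats


def mapped_fraction(stats: Dict[str, Tuple[int, int, int]], contig: str) -> float:
    """Fraction of reads on a contig that are mapped (0.0 if it has no reads)."""
    _, mapped, unmapped = stats[contig]
    total = mapped + unmapped
    return mapped / total if total else 0.0
```

If the failing BAM shows few or no mapped reads on the requested contig while the good BAM shows plenty, that would fit the empty-partition explanation.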
The stack trace:
java.lang.ArithmeticException: / by zero
    at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.removeUbiquitousKmers(FindBreakpointEvidenceSpark.java:640)
    at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.addAssemblyQNames(FindBreakpointEvidenceSpark.java:507)
    at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.gatherEvidenceAndWriteContigSamFile(FindBreakpointEvidenceSpark.java:176)
    at org.broadinstitute.hellbender.tools.spark.sv.evidence.FindBreakpointEvidenceSpark.runTool(FindBreakpointEvidenceSpark.java:136)
    at org.broadinstitute.hellbender.engine.spark.GATKSparkTool.runPipeline(GATKSparkTool.java:546)
    at org.broadinstitute.hellbender.engine.spark.SparkCommandLineProgram.doWork(SparkCommandLineProgram.java:31)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.runTool(CommandLineProgram.java:140)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMainPostParseArgs(CommandLineProgram.java:192)
    at org.broadinstitute.hellbender.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:211)
    at org.broadinstitute.hellbender.Main.runCommandLineProgram(Main.java:160)
    at org.broadinstitute.hellbender.Main.mainEntry(Main.java:203)
    at org.broadinstitute.hellbender.Main.main(Main.java:289)
This request was created from a contribution made by Domniki Manousi on March 07, 2022 12:01 UTC.
Link: https://gatk.broadinstitute.org/hc/en-us/community/posts/4556136866843-FindBreakpointEvidenceSpark-sudden-shutdown
--
Hi, I am trying to run the tool FindBreakpointEvidenceSpark. I have successfully produced the required kmers and the tool seems to run for several minutes until it finally stops without producing output.
I have read in past issues that memory usage might be a problem and have tried to accommodate it using the -Xmx option.
a) GATK version used: gatk4: 4.2.0.0 through singularity (/cvmfs/singularity.galaxyproject.org/all/gatk4:4.2.0.0--0)
b) Exact command used:
singularity exec /cvmfs/singularity.galaxyproject.org/all/gatk4:4.2.0.0--0 gatk --java-options "-Xmx75g -DGATK_STACKTRACE_ON_USER_EXCEPTION=true" FindBreakpointEvidenceSpark \
-R /mnt/SCRATCH/domniman/references/ssa_selected/Simon_Final2021_Ssa_selected.fa -I /mnt/SCRATCH/domniman/2014G_NO_Males_1169_D03_RG.bam \
--aligner-index-image /mnt/SCRATCH/domniman/references/ssa_selected/Simon_Final2021_Ssa_selected.fa.img \
--kmers-to-ignore /mnt/users/domniman/ag_fish/kmers_to_ignore.txt -O /mnt/SCRATCH/domniman/assembly.sam \
--tmp-dir /mnt/SCRATCH/domniman/tmp -L ssa03
Entire error log:
Due to length of the complete log (37.671 lines) I attach it as a separate link: https://www.dropbox.com/s/n7q5dco4z5t3moz/gatk%20error%20log%20.txt?dl=0
Best,
Domniki
(created from Zendesk ticket #275546)
gz#275546