PapenfussLab / gridss

GRIDSS: the Genomic Rearrangement IDentification Software Suite

java.lang.OutOfMemoryError: GC overhead limit exceeded #196

Closed TingHsuanChen closed 5 years ago

TingHsuanChen commented 5 years ago

Hi,

I'm trying to analyze our RNAseq data with GRIDSS. I started with a trial run on a single BAM file against a small portion of the reference genome (chromosome 1). The settings are as follows:

NORMAL=../hisat2_alignment_ch1/01_k100Mm_ch1.bam
REFERENCE=../bwa/Vitis_genome_ch1.fasta
OUTPUT=Ctrl_a_ch1.sv.vcf
ASSEMBLY=${OUTPUT/.sv.vcf/.gridss.assembly.bam}
GRIDSS_JAR=/home/ting-hsuan/gridss.jar
java -ea -Xmx16g \
    -Dsamjdk.create_index=true \
    -Dsamjdk.use_async_io_read_samtools=true \
    -Dsamjdk.use_async_io_write_samtools=true \
    -Dsamjdk.use_async_io_write_tribble=true \
    -Dsamjdk.compression_level=1 \
    -Dgridss.gridss.output_to_temp_file=true \
    -Dgridss.defensiveGC=true \
    -cp $GRIDSS_JAR gridss.CallVariants \
    TMP_DIR=. \
    WORKING_DIR=. \
    CONFIGURATION_FILE=gridss.properties \
    REFERENCE_SEQUENCE="$REFERENCE" \
    INPUT="$NORMAL" \
    OUTPUT="$OUTPUT" \
    ASSEMBLY="$ASSEMBLY" \
    WORKER_THREADS=1

It ran smoothly at the beginning, then failed with this error message:

INFO 2019-02-15 16:45:14 AssemblyEvidenceSource Starting assembly on chunk 0 (chr1:1-chr1:23037639)
INFO 2019-02-15 17:25:24 AssemblyEvidenceSource Completed assembly on chunk 0 (chr1:1-chr1:23037639) in 2410s (40.17 min)
INFO 2019-02-15 17:25:24 AssemblyEvidenceSource Breakend assembly complete.
ERROR 2019-02-15 17:25:24 AssemblyEvidenceSource Fatal error during assembly
java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at au.edu.wehi.idsv.AssemblyEvidenceSource.runTasks(AssemblyEvidenceSource.java:159)
    at au.edu.wehi.idsv.AssemblyEvidenceSource.assembleBreakends(AssemblyEvidenceSource.java:97)
    at gridss.CallVariants.doWork(CallVariants.java:127)
    at gridss.cmdline.MultipleSamFileCommandLineProgram.doWork(MultipleSamFileCommandLineProgram.java:195)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:277)
    at gridss.CallVariants.main(CallVariants.java:109)
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at au.edu.wehi.idsv.debruijn.positional.MemoizedTraverse.memoize_fastutil_sortedmap(MemoizedTraverse.java:298)
    at au.edu.wehi.idsv.debruijn.positional.MemoizedTraverse.memoize(MemoizedTraverse.java:254)
    at au.edu.wehi.idsv.debruijn.positional.MemoizedContigCaller.visit(MemoizedContigCaller.java:219)
    at au.edu.wehi.idsv.debruijn.positional.MemoizedContigCaller.advanceFrontier(MemoizedContigCaller.java:189)
    at au.edu.wehi.idsv.debruijn.positional.MemoizedContigCaller.frontierPath(MemoizedContigCaller.java:373)
    at au.edu.wehi.idsv.debruijn.positional.NonReferenceContigAssembler.removeMisassembledPartialContig(NonReferenceContigAssembler.java:290)
    at au.edu.wehi.idsv.debruijn.positional.NonReferenceContigAssembler.ensureCalledContig(NonReferenceContigAssembler.java:223)
    at au.edu.wehi.idsv.debruijn.positional.NonReferenceContigAssembler.hasNext(NonReferenceContigAssembler.java:162)
    at au.edu.wehi.idsv.debruijn.positional.PositionalAssembler.flushIfRequired(PositionalAssembler.java:70)
    at au.edu.wehi.idsv.debruijn.positional.PositionalAssembler.ensureAssembler(PositionalAssembler.java:114)
    at au.edu.wehi.idsv.debruijn.positional.PositionalAssembler.ensureAssembler(PositionalAssembler.java:86)
    at au.edu.wehi.idsv.debruijn.positional.PositionalAssembler.hasNext(PositionalAssembler.java:56)
    at au.edu.wehi.idsv.AssemblyEvidenceSource.assembleChunk(AssemblyEvidenceSource.java:232)
    at au.edu.wehi.idsv.AssemblyEvidenceSource.assembleChunk(AssemblyEvidenceSource.java:192)
    at au.edu.wehi.idsv.AssemblyEvidenceSource.lambda$assembleBreakends$1(AssemblyEvidenceSource.java:93)
    at au.edu.wehi.idsv.AssemblyEvidenceSource$$Lambda$23/1627857534.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
[Fri Feb 15 17:25:24 NZDT 2019] gridss.CallVariants done. Elapsed time: 40.19 minutes.
Runtime.totalMemory()=15271460864
Exception in thread "main" java.lang.RuntimeException: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at au.edu.wehi.idsv.AssemblyEvidenceSource.runTasks(AssemblyEvidenceSource.java:169)
    at au.edu.wehi.idsv.AssemblyEvidenceSource.assembleBreakends(AssemblyEvidenceSource.java:97)
    at gridss.CallVariants.doWork(CallVariants.java:127)
    at gridss.cmdline.MultipleSamFileCommandLineProgram.doWork(MultipleSamFileCommandLineProgram.java:195)
    at picard.cmdline.CommandLineProgram.instanceMain(CommandLineProgram.java:277)
    at gridss.CallVariants.main(CallVariants.java:109)
Caused by: java.util.concurrent.ExecutionException: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at au.edu.wehi.idsv.AssemblyEvidenceSource.runTasks(AssemblyEvidenceSource.java:159)
    ... 5 more
Caused by: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at au.edu.wehi.idsv.debruijn.positional.MemoizedTraverse.memoize_fastutil_sortedmap(MemoizedTraverse.java:298)
    at au.edu.wehi.idsv.debruijn.positional.MemoizedTraverse.memoize(MemoizedTraverse.java:254)
    at au.edu.wehi.idsv.debruijn.positional.MemoizedContigCaller.visit(MemoizedContigCaller.java:219)
    at au.edu.wehi.idsv.debruijn.positional.MemoizedContigCaller.advanceFrontier(MemoizedContigCaller.java:189)
    at au.edu.wehi.idsv.debruijn.positional.MemoizedContigCaller.frontierPath(MemoizedContigCaller.java:373)
    at au.edu.wehi.idsv.debruijn.positional.NonReferenceContigAssembler.removeMisassembledPartialContig(NonReferenceContigAssembler.java:290)
    at au.edu.wehi.idsv.debruijn.positional.NonReferenceContigAssembler.ensureCalledContig(NonReferenceContigAssembler.java:223)
    at au.edu.wehi.idsv.debruijn.positional.NonReferenceContigAssembler.hasNext(NonReferenceContigAssembler.java:162)
    at au.edu.wehi.idsv.debruijn.positional.PositionalAssembler.flushIfRequired(PositionalAssembler.java:70)
    at au.edu.wehi.idsv.debruijn.positional.PositionalAssembler.ensureAssembler(PositionalAssembler.java:114)
    at au.edu.wehi.idsv.debruijn.positional.PositionalAssembler.ensureAssembler(PositionalAssembler.java:86)
    at au.edu.wehi.idsv.debruijn.positional.PositionalAssembler.hasNext(PositionalAssembler.java:56)
    at au.edu.wehi.idsv.AssemblyEvidenceSource.assembleChunk(AssemblyEvidenceSource.java:232)
    at au.edu.wehi.idsv.AssemblyEvidenceSource.assembleChunk(AssemblyEvidenceSource.java:192)
    at au.edu.wehi.idsv.AssemblyEvidenceSource.lambda$assembleBreakends$1(AssemblyEvidenceSource.java:93)
    at au.edu.wehi.idsv.AssemblyEvidenceSource$$Lambda$23/1627857534.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

Should I add -XX:-UseGCOverheadLimit to the java command?

Kind regards, Ting-Hsuan

d-cameron commented 5 years ago

What coverage do you have? High-coverage samples require more memory during assembly. You can give GRIDSS more memory by raising -Xmx16g to something higher (I typically use -Xmx31g). Note that heap sizes between 32GB and ~48GB should not be used, as they actually result in less usable memory: Java compressed oops are disabled once the heap reaches 32GB, so every object reference takes more space.
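
The heap-size advice above can be sketched as a small helper. This is only an illustration of the rule of thumb in this comment, not part of GRIDSS: the function names `choose_xmx_gb` and `xmx_flag` are hypothetical, and the 32 and 48 cutoffs are taken directly from the "32GB to ~48GB" range mentioned here (the upper bound is approximate).

```python
def choose_xmx_gb(available_gb: int) -> int:
    """Pick a JVM heap size in GB, avoiding the 32 to ~48 GB band.

    Assumption: the band limits (32 and 48) come from the maintainer's
    rule of thumb in this thread; the upper limit is only approximate.
    """
    if available_gb < 1:
        raise ValueError("need at least 1 GB of memory")
    if 32 <= available_gb < 48:
        # Inside the band, a 31 GB heap keeps compressed oops enabled
        # and is expected to hold more objects than a 32-48 GB heap.
        return 31
    return available_gb


def xmx_flag(available_gb: int) -> str:
    """Format the chosen heap size as a java -Xmx argument."""
    return f"-Xmx{choose_xmx_gb(available_gb)}g"


if __name__ == "__main__":
    for gb in (16, 31, 40, 64):
        print(gb, "GB available ->", xmx_flag(gb))
```

For a machine with 40GB free, this would suggest -Xmx31g rather than -Xmx40g.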

d-cameron commented 5 years ago

> I'm trying to analyze our RNAseq data with GRIDSS.

Note that at this time I do not recommend using GRIDSS on RNA-Seq data, as only a single assembly is called from each breakend branch.

E.g.:
Transcript A: exons 1, 2, 3
Transcript B: exons 1, 2, 4

When assembly is performed at the exon 1 boundary, the assembly contig will contain exon 2 then exon 3, or exon 2 then exon 4, but not both. Only the most highly supported assembly graph branch will be called. It is even more problematic if exons 3 and 4 start with the same base: when assembling from exon 2, only one of the two branches will be taken, because the first kmer is shared and so they are treated as the same branch.
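
A toy illustration of the two effects described above, assuming a plain (non-positional) de Bruijn graph and a greedy highest-weight walk. This is not GRIDSS's actual assembler; the exon sequences, the kmer size `K`, the read support counts, and all function names are made up for the example.

```python
from collections import Counter

K = 4
EXON2 = "ACCTGA"
EXON3 = "GTTCA"  # starts with G
EXON4 = "GCCAT"  # also starts with G

TRANSCRIPT_A = EXON2 + EXON3  # exon 2 then exon 3
TRANSCRIPT_B = EXON2 + EXON4  # exon 2 then exon 4


def kmers(seq, k):
    """All overlapping kmers of seq, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def first_junction_kmer(transcript, exon2_len, k):
    """The first kmer that extends one base past the end of exon 2."""
    start = exon2_len - (k - 1)
    return transcript[start:start + k]


def build_weights(transcripts_with_support, k):
    """Sum hypothetical read support over every kmer of every transcript."""
    weights = Counter()
    for seq, support in transcripts_with_support:
        for km in kmers(seq, k):
            weights[km] += support
    return weights


def greedy_contig(start, weights):
    """Walk the graph, always following the best-supported next kmer."""
    contig, current, seen = start, start, {start}
    while True:
        candidates = [current[1:] + b for b in "ACGT"
                      if weights[current[1:] + b] > 0
                      and current[1:] + b not in seen]
        if not candidates:
            return contig
        current = max(candidates, key=lambda km: weights[km])
        seen.add(current)
        contig += current[-1]


if __name__ == "__main__":
    # Both transcripts produce the same kmer when stepping out of exon 2.
    print(first_junction_kmer(TRANSCRIPT_A, len(EXON2), K),
          first_junction_kmer(TRANSCRIPT_B, len(EXON2), K))
    # Transcript A has more support, so only its branch is assembled.
    weights = build_weights([(TRANSCRIPT_A, 5), (TRANSCRIPT_B, 3)], K)
    print(greedy_contig(EXON2[:K], weights))
```

In this sketch the walk from exon 2 returns only transcript A's sequence; the exon 2 to exon 4 junction has support of 3 but never appears in the contig.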

This will be fixed in a future revision of the GRIDSS assembler but for the moment I cannot in good conscience recommend it for RNA-Seq junction calling as some well-supported junctions will be missing assembly support.