Kennedy-Lab-UW / Duplex-Seq-Pipeline

A standalone end-to-end data analysis pipeline for Duplex Sequencing

Some Java steps sometimes run out of memory #110

Open bkohrn opened 1 year ago

bkohrn commented 1 year ago

Some steps sometimes run out of memory, especially on larger data sets:

fgbio ClipBam -i ../A139_D/A139_D.L_ELET20221112_2.seq20230424.sscs.clipped.bam -o ../A139_D/A139_D.L_ELET20221112_2.seq20230424.sscs.overlapClip.temp.bam -r /research/labs/kennedy/bioinf/ref/hg38/hg38.fa -c Hard --clip-overlapping-reads true -m ../A139_D/Stats/data/A139_D.L_ELET20221112_2.seq20230424.sscs.overlapClip.metrics.txt
[2023/05/02 09:58:54 | FgBioMain | Info] Executing ClipBam from fgbio version 1.3.0 as kohrnb@gattaca on JRE 11.0.8-internal+0-adhoc..src with snappy, IntelInflater, and IntelDeflater
[2023/05/02 09:58:54 | Bams | Info] Sorting into queryname order.
[2023/05/02 09:58:58 | Bams | Info] processed     1,000,000 Queryname sorted.  Elapsed time: 00:00:04s.  Time for last 1,000,000:    4s.  Last read position: chr1:26,773,450
[2023/05/02 09:59:01 | Bams | Info] processed     2,000,000 Queryname sorted.  Elapsed time: 00:00:06s.  Time for last 1,000,000:    2s.  Last read position: chr1:119,916,401
[2023/05/02 09:59:04 | Bams | Info] processed     3,000,000 Queryname sorted.  Elapsed time: 00:00:09s.  Time for last 1,000,000:    2s.  Last read position: chr1:119,941,686
[2023/05/02 09:59:07 | Bams | Info] processed     4,000,000 Queryname sorted.  Elapsed time: 00:00:12s.  Time for last 1,000,000:    2s.  Last read position: chr1:144,664,259
[2023/05/02 09:59:10 | Bams | Info] processed     5,000,000 Queryname sorted.  Elapsed time: 00:00:15s.  Time for last 1,000,000:    2s.  Last read position: chr12:49,030,336
[2023/05/02 09:59:13 | Bams | Info] processed     6,000,000 Queryname sorted.  Elapsed time: 00:00:18s.  Time for last 1,000,000:    2s.  Last read position: chr12:49,038,880
[2023/05/02 09:59:16 | Bams | Info] processed     7,000,000 Queryname sorted.  Elapsed time: 00:00:21s.  Time for last 1,000,000:    2s.  Last read position: chr12:49,046,693
[2023/05/02 09:59:18 | Bams | Info] processed     8,000,000 Queryname sorted.  Elapsed time: 00:00:24s.  Time for last 1,000,000:    2s.  Last read position: chr12:49,054,766
[2023/05/02 09:59:21 | Bams | Info] processed     9,000,000 Queryname sorted.  Elapsed time: 00:00:27s.  Time for last 1,000,000:    2s.  Last read position: chr13:48,452,532
[2023/05/02 09:59:24 | Bams | Info] processed    10,000,000 Queryname sorted.  Elapsed time: 00:00:29s.  Time for last 1,000,000:    2s.  Last read position: chr16:3,736,684
[2023/05/02 09:59:27 | Bams | Info] processed    11,000,000 Queryname sorted.  Elapsed time: 00:00:32s.  Time for last 1,000,000:    2s.  Last read position: chr16:3,782,631
[2023/05/02 09:59:30 | Bams | Info] processed    12,000,000 Queryname sorted.  Elapsed time: 00:00:35s.  Time for last 1,000,000:    2s.  Last read position: chr19:18,606,899
[2023/05/02 09:59:33 | Bams | Info] processed    13,000,000 Queryname sorted.  Elapsed time: 00:00:38s.  Time for last 1,000,000:    3s.  Last read position: chr21:10,464,845
[2023/05/02 09:59:41 | FgBioMain | Info] ClipBam failed. Elapsed time: 0.80 minutes.
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Collections.unmodifiableList(Collections.java:1289)
        at htsjdk.samtools.Cigar.getCigarElements(Cigar.java:54)
        at htsjdk.samtools.SAMUtils.getAlignmentBlocks(SAMUtils.java:720)
        at htsjdk.samtools.SAMRecord.getAlignmentBlocks(SAMRecord.java:1788)
        at htsjdk.samtools.SAMRecord.validateCigar(SAMRecord.java:1806)
        at htsjdk.samtools.BAMRecord.getCigar(BAMRecord.java:284)
        at htsjdk.samtools.SAMRecord.isValid(SAMRecord.java:2102)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.advance(BAMFileReader.java:848)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:834)
        at htsjdk.samtools.BAMFileReader$BAMFileIterator.next(BAMFileReader.java:802)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:574)
        at htsjdk.samtools.SamReader$AssertingIterator.next(SamReader.java:553)
        at com.fulcrumgenomics.commons.CommonsDef$JavaIteratorAdapter.next(CommonsDef.scala:251)
        at scala.collection.Iterator$$anon$9.next(Iterator.scala:575)
        at com.fulcrumgenomics.commons.collection.BetterBufferedIterator.maybeNext(BetterBufferedIterator.scala:45)
        at com.fulcrumgenomics.commons.collection.BetterBufferedIterator.$anonfun$next$2(BetterBufferedIterator.scala:54)
        at com.fulcrumgenomics.commons.collection.BetterBufferedIterator$$Lambda$405/0x00000001005bc040.apply$mcV$sp(Unknown Source)
        at com.fulcrumgenomics.commons.CommonsDef.yieldAndThen(CommonsDef.scala:74)
        at com.fulcrumgenomics.commons.CommonsDef.yieldAndThen$(CommonsDef.scala:72)
        at com.fulcrumgenomics.commons.CommonsDef$.yieldAndThen(CommonsDef.scala:422)
        at com.fulcrumgenomics.commons.collection.BetterBufferedIterator.next(BetterBufferedIterator.scala:54)
        at com.fulcrumgenomics.commons.collection.SelfClosingIterator.next(SelfClosingIterator.scala:46)
        at scala.collection.IterableOnceOps.foreach(IterableOnce.scala:553)
        at scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:551)
        at com.fulcrumgenomics.commons.collection.BetterBufferedIterator.foreach(BetterBufferedIterator.scala:41)
        at com.fulcrumgenomics.bam.Bams$.queryGroupedIterator(Bams.scala:215)
        at com.fulcrumgenomics.bam.Bams$.templateIterator(Bams.scala:253)
        at com.fulcrumgenomics.bam.Bams$.templateIterator(Bams.scala:236)
        at com.fulcrumgenomics.bam.ClipBam.execute(ClipBam.scala:102)
        at com.fulcrumgenomics.cmdline.FgBioMain.makeItSo(FgBioMain.scala:110)
        at com.fulcrumgenomics.cmdline.FgBioMain.makeItSoAndExit(FgBioMain.scala:86)
        at com.fulcrumgenomics.cmdline.FgBioMain$.main(FgBioMain.scala:50)
[Tue May  2 09:59:41 2023]
Error in rule overlapClip:

A possible solution would be to make the memory for these steps tunable, including scaling it with the retry number.
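
A rough sketch of what retry-based memory scaling could look like, not the pipeline's actual code. The helper names (`mem_mb_for_attempt`, `jvm_heap_flag`, `clip_bam_command`) and the defaults (4000 MB base, 64000 MB cap, 90% heap headroom) are made up for illustration. It also assumes the `fgbio` launcher on the path forwards a `-Xmx...` argument to the JVM, which should be checked against the installed wrapper.

```python
# Hypothetical helpers: names and default values are illustrative only.

def mem_mb_for_attempt(attempt, base_mb=4000, max_mb=64000):
    """Scale the memory request linearly with the retry number, capped at max_mb."""
    if attempt < 1:
        raise ValueError("attempt numbering starts at 1")
    return min(base_mb * attempt, max_mb)


def jvm_heap_flag(mem_mb, headroom_pct=90):
    """Build a JVM heap flag, leaving some of the request for non-heap memory."""
    return f"-Xmx{mem_mb * headroom_pct // 100}m"


def clip_bam_command(in_bam, out_bam, ref, metrics, attempt, base_mb=4000):
    """Assemble the ClipBam call from the log above with a heap size tied to the attempt.

    Assumption: the fgbio launcher passes -Xmx through to java.
    """
    heap = jvm_heap_flag(mem_mb_for_attempt(attempt, base_mb=base_mb))
    return [
        "fgbio", heap, "ClipBam",
        "-i", in_bam,
        "-o", out_bam,
        "-r", ref,
        "-c", "Hard",
        "--clip-overlapping-reads", "true",
        "-m", metrics,
    ]


# In a Snakemake rule, the same function could feed the resources section,
# since resource callables may take the current attempt number:
#
#   rule overlapClip:
#       resources:
#           mem_mb=lambda wildcards, attempt: mem_mb_for_attempt(attempt)

if __name__ == "__main__":
    for attempt in (1, 2, 3):
        print(" ".join(clip_bam_command("in.bam", "out.bam", "ref.fa",
                                        "metrics.txt", attempt)))
```

With something like this, the first attempt would run with the base allocation and each retry would ask for proportionally more, so only the samples that actually hit `OutOfMemoryError` pay for the larger heap. The base value could then be exposed as a config option.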