bcgsc / RNA-Bloom

:hibiscus: reference-free transcriptome assembly for short and long reads

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space #61

Closed nityendra21 closed 8 months ago

nityendra21 commented 8 months ago

Hello, I am getting an out-of-memory error when trying to assemble my reads. I am using 80 threads and 900 GB of RAM for the assembly. There are a total of 4,389,293 reads in my FASTQ file.

Version info: RNA-Bloom v2.0.1 openjdk 20-internal 2023-03-21

The command I used is: rnabloom -long assembly/cleaned.reads.fastq -t 80 -mem 900 -o assembly/bloom

Log:

> Stage 3: Assemble long reads for "rnabloom"
Overlapping sequences...
Parsed 2,964,907,952 overlap records in 1d 0h 19m 53s
total reads:    4,324,473
 - unique:  1,868,715   (43.2 %)
   - multi-seg: 1,477,350
Unique reads extracted in 10m 30s
Overlapping sequences...
Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.Arrays.copyOfRange(Arrays.java:3822)
        at java.base/java.lang.StringLatin1.newString(StringLatin1.java:763)
        at java.base/java.lang.String.substring(String.java:2725)
        at java.base/java.lang.String.subSequence(String.java:2758)
        at java.base/java.util.regex.Matcher.getSubSequence(Matcher.java:1789)
        at java.base/java.util.regex.Matcher.group(Matcher.java:661)
        at rnabloom.util.PafUtils.hasGoodAlignment(PafUtils.java:83)
        at rnabloom.olc.Layout.hasGoodAlignment(Layout.java:149)
        at rnabloom.olc.Layout.populateGraphFromOverlaps(Layout.java:3028)
        at rnabloom.olc.Layout.extractSimplePaths(Layout.java:3350)
        at rnabloom.olc.OverlapLayoutConsensus.layoutSimple(OverlapLayoutConsensus.java:824)
        at rnabloom.olc.OverlapLayoutConsensus.overlapWithMinimapAndLayoutSimple(OverlapLayoutConsensus.java:545)
        at rnabloom.olc.OverlapLayoutConsensus.uniqueOLC(OverlapLayoutConsensus.java:1180)
        at rnabloom.RNABloom.assembleUnclusteredLongReads(RNABloom.java:3314)
        at rnabloom.RNABloom.main(RNABloom.java:7430)

dvirdi01 commented 8 months ago

Hi, I was having the same issue while assembling my long-read files today.

It gives me the following errors for some files:

> Stage 3: Assemble long reads for "rnabloom"
Overlapping sequences...
Parsed 10,196,100 overlap records in 16m 1s
total reads:    1,947,378
 - unique:      356,315 (18.3 %)
   - multi-seg: 48,497
Unique reads extracted in 29.242s
ERROR: Error extracting unique reads!
ERROR: Error assembling long reads!

And this for other files:

Overlapping sequences...
ERROR: Index 3 out of bounds for length 3
java.lang.ArrayIndexOutOfBoundsException: Index 3 out of bounds for length 3
        at rnabloom.io.PafRecord.update(PafRecord.java:34)
        at rnabloom.io.ExtendedPafRecord.update(ExtendedPafRecord.java:32)
        at rnabloom.io.PafReader.next(PafReader.java:63)
        at rnabloom.olc.Layout.extractUniqueFromOverlaps(Layout.java:1660)
        at rnabloom.olc.OverlapLayoutConsensus.overlapWithMinimapAndExtractUnique(OverlapLayoutConsensus.java:156)
        at rnabloom.olc.OverlapLayoutConsensus.uniqueOLC(OverlapLayoutConsensus.java:1159)
        at rnabloom.RNABloom.assembleUnclusteredLongReads(RNABloom.java:3314)
        at rnabloom.RNABloom.main(RNABloom.java:7430)
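
One possible cause of an index-out-of-bounds error while parsing PAF records, not confirmed anywhere in this thread, is a truncated or malformed line in the overlap file. PAF records have 12 mandatory tab-separated columns, so a quick check is to list any line with fewer. The sketch below assumes an uncompressed PAF file; the function name `paf_short_lines` and the file path in the usage note are hypothetical.

```shell
# Hypothetical helper (not part of RNA-Bloom): report every line of a PAF
# file that has fewer than the 12 mandatory tab-separated columns.
paf_short_lines() {
    awk -F'\t' 'NF < 12 { printf "line %d: %d fields\n", NR, NF }' "$1"
}
```

Usage would look like `paf_short_lines path/to/overlaps.paf`; for a gzip-compressed file, the same `awk` filter can read from `gzip -dc`. No output means every line has at least 12 columns.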

kmnip commented 8 months ago

Hi @nityendra21 ,

Regarding your command:

rnabloom -long assembly/cleaned.reads.fastq -t 80 -mem 900 -o assembly/bloom

The -mem option is for setting the total size in GB allocated for the Bloom filters. You do not need to set this option. RNA-Bloom can set it for you automatically based on ntCard's output.

Regarding java.lang.OutOfMemoryError: Java heap space, you can set Java's maximum heap size with -Xmx. For example, to set it to 100GB:

export JAVA_TOOL_OPTIONS="-Xmx100g"
rnabloom -long assembly/cleaned.reads.fastq -t 80 -o assembly/bloom

If your machine has 900 GB of RAM and you set -mem 900, the Bloom filters will occupy all of it, leaving no memory for computations that use the Java heap space. That is why you are getting the java.lang.OutOfMemoryError.
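
The budgeting described above can be sketched as simple arithmetic: whatever is given to the Bloom filters, plus anything held back for the operating system, is no longer available for the Java heap. This is only an illustration. The function `heap_budget_gb`, the `reserve_gb` parameter, and the example values 100 and 50 are assumptions, not RNA-Bloom options or recommendations.

```shell
# Hypothetical helper (not part of RNA-Bloom): GB left for the Java heap
# after subtracting the Bloom filter allocation and an optional OS reserve.
heap_budget_gb() {
    total_gb=$1
    bloom_gb=$2
    reserve_gb=${3:-0}
    left=$((total_gb - bloom_gb - reserve_gb))
    if [ "$left" -lt 0 ]; then
        left=0
    fi
    echo "$left"
}

# The original command: 900 GB of RAM with -mem 900 leaves nothing for the heap.
heap_budget_gb 900 900        # prints 0

# Illustrative values only: 100 GB for Bloom filters, 50 GB held in reserve.
heap_budget_gb 900 100 50     # prints 750
```

The result is an upper bound for the value passed to -Xmx, not a target. The 100 GB setting suggested above sits well below it.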

nityendra21 commented 8 months ago

Thank you for the quick reply @kmnip, I have run RNA-Bloom as suggested by you and will report back.

nityendra21 commented 8 months ago

That worked excellently. Thank you @kmnip