Open A-pcd opened 1 month ago
I think your machine did not have enough memory to overlap the reads with minimap2 in stage 3.
Also, I don't think PacBio Hifi reads are strand-specific. So, you shouldn't use the -strand
option, and you actually specified the option twice. And, I would not change the k-mer size either because the Bloom filter false-positive rates are very high in stages 1 and 2.
So, please revise your command for RNA-Bloom to:
rnabloom -lrpb -long Denovo.fa -t 5 -outdir work/DEC1S
Please remove the output files and try the assembly again.
It do not even pass stage 1 when the k-mers size is not specified, it actually do not work with any k-mers above 15
RNA-Bloom v2.0.1 args: [-lrpb, -long, Denovo.fa, -t, 5, -outdir, /work/ /DEC1]
name: rnabloom
outdir: /work/DEC1
WARNING: Output directory does not exist!
Created output directory at /work/ DEC1
Turning on option -ntcard
to count k-mers
K-mer counting with ntCard...
Running command: ntcard -t 5 -k 35 -c 65535 -p /work/ DEC1/rnabloom @/work/DEC1/rnabloom.ntcard.readslist.txt
...
Parsing histogram file /work/DEC1/rnabloom_k35.hist
...
Unique k-mers (k=35): 27,673,140,848
Unique k-mers (k=35,c>1): 1,061,710,819
K-mer counting completed in 6m 43s
Total: 79.923164
Stage 1: Construct graph from reads (k=35) Exception in thread "main" java.lang.OutOfMemoryError: Unable to allocate 20153865520 bytes at java.base/jdk.internal.misc.Unsafe.allocateMemory(Unsafe.java:632) at jdk.unsupported/sun.misc.Unsafe.allocateMemory(Unsafe.java:462) at rnabloom.bloom.buffer.UnsafeByteBuffer.
(UnsafeByteBuffer.java:44) at rnabloom.bloom.CountingBloomFilter. (CountingBloomFilter.java:52) at rnabloom.graph.BloomFilterDeBruijnGraph. (BloomFilterDeBruijnGraph.java:98) at rnabloom.RNABloom.initializeGraph(RNABloom.java:994) at rnabloom.RNABloom.main(RNABloom.java:7126)
Exception in thread "main" java.lang.OutOfMemoryError: Unable to allocate 20153865520 bytes
Your machine does not have enough memory to do the assembly.
The previous assembly ran further with a smaller k-mer size (i.e. 15) because the Bloom filters required less memory. However, k=15 was not the appropriate k-mer size to use. Please use the parameters I recommended on a machine with more RAM. For your dataset, it would probably need ~100GB RAM.
I have an RNA seq that I need to assemble from PacBio and when I send the script the program do not pass the stage 2. I work with PACbio long read Hifi sequence that end up being all around 12 millions reads in Java openjdk version "22.0.1-internal" 2024-04-16. Here is the script I’m sending :
rnabloom -lrpb -stranded -long Denovo.fa -t 5 -stranded -outdir work/DEC1S -k 15 RNA-Bloom v2.0.1 args: [-lrpb, -long, Denovo.fa, -stranded, -t, 5, -outdir, /work /DEC1S, -k, 15]
name: rnabloom outdir: /work/DEC1S
Turning on option
-ntcard
to count k-mers K-mer counting with ntCard... Parsing histogram file/work/ DEC1S/rnabloom_k15.hist
... Unique k-mers (k=15): 533,165,901 Unique k-mers (k=15,c>1): 529,810,636 K-mer counting completed in 0.156sBloom filters Memory (GB)
de Bruijn graph: 1.1782151 k-mer counting: 9.366405
Total: 10.54462
When I look at the 13 files that comes out:
LONGREADS.CORRECTED rnabloom.longreads.assembly1.nr.fa.gz.log rnabloom.longreads.corrected.polya.txt.gz STARTED rnabloom_k15.hist rnabloom.longreads.corrected.long.fa.gz rnabloom.longreads.corrected.repeats.fa.gz rnabloom_k15.hist.log rnabloom.longreads.corrected.long.lengths.txt rnabloom.longreads.corrected.short.fa.gz rnabloom.longreads.assembly1.nr.fa.gz rnabloom.longreads.corrected.long.seed.fa.gz rnabloom.ntcard.readslist.txt
The rnabloom.longreads.assembly1.nr.fa.gz is empty and the .log show me /
[M::mm_idx_gen::142.3652.34] collected minimizers [M::mm_idx_gen::183.8832.68] sorted minimizers [M::main::183.8832.68] loaded/built the index for 2647016 target sequence(s) [M::mm_mapopt_update::191.3482.61] mid_occ = 1793 [M::mm_idx_stat] kmer size: 19; skip: 5; is_hpc: 1; #seq: 2647016 [M::mm_idx_stat::194.689*2.59] distinct minimizers: 292680490 (35.10% are singletons); average occurrences: 6.355; average spacing: 4.301; total length: 8000271109
I still have more than 5 millions reads I for sure should be able to have some overlaps. Do you have any idea what could go wrong ?