bcgsc / RNA-Bloom

:hibiscus: reference-free transcriptome assembly for short and long reads

error in assembling ONT DRS long reads #72

Open nmrt-sahu opened 1 month ago

nmrt-sahu commented 1 month ago

I am working with ONT direct RNA-seq (DRS) long reads. While running rnabloom (RNA-Bloom v2.0.1), I got an error in the assembly step, and I cannot work out the cause. Could you help me out? I have pasted the command-line output below for your reference.

rnabloom -long out.fastq -stranded -t 50 --uracil -outdir assambled
RNA-Bloom v2.0.1
args: [-long, out.fastq, -stranded, -t, THREADS, --uracil, -outdir, assambled]

ERROR: For input string: "THREADS"
java.lang.NumberFormatException: For input string: "THREADS"
    at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:67)
    at java.base/java.lang.Integer.parseInt(Integer.java:588)
    at java.base/java.lang.Integer.parseInt(Integer.java:685)
    at rnabloom.RNABloom.main(RNABloom.java:6433)

(rnabloom) [sklab202@compute-0-2 buffalo]$ rnabloom -long out.fastq -stranded -t 50 --uracil -outdir assambled
RNA-Bloom v2.0.1
args: [-long, out.fastq, -stranded, -t, 50, --uracil, -outdir, assambled]

name: rnabloom
outdir: assambled
WARNING: Output directory does not exist! Created output directory at assambled

Turning on option -ntcard to count k-mers

K-mer counting with ntCard...
Running command: ntcard -t 50 -k 25 -c 65535 -p assambled/rnabloom @assambled/rnabloom.ntcard.readslist.txt...
Parsing histogram file assambled/rnabloom_k25.hist...
Unique k-mers (k=25): 2,415,168,253
Unique k-mers (k=25,c>1): 381,587,332
K-mer counting completed in 1m 11s

Bloom filters       Memory (GB)
de Bruijn graph:    5.337153
k-mer counting:     6.745998
Total:              12.083151

Stage 1: Construct graph from reads (k=25)
Parsing out.fastq...
Parsed 5,572,535 sequences in 25m 13s
DBG Bloom filter FPR: 0.968 %
Counting Bloom filter FPR: 1.02 %
Stage 1 completed in 25m 30s

Stage 2: Correct long reads for "rnabloom"
Parsing out.fastq...
Corrected Read Lengths Sampling Distribution (n=10000)
min   q1    med   q3    max
56    449   624   1019  6622
Parsed 5,572,613 sequences.
Kept: 5,571,837 (100.0 %)
Discarded: 776 (0.0139 %)
Artifacts: 4 (7.1779614E-5%)
Corrected reads in 7m 8s
Extracting seed sequences...
strobemers: n=3, k=11, wMin=12, wMax=61, depth=3
Bloom filter FPR: 3.77 %
before: 5,532,596
after: 1,650,543 (29.8 %)
too short: 0
Extraction completed in 38m 6s
Stage 2 completed in 45m 15s

Stage 3: Assemble long reads for "rnabloom"
Overlapping sequences...
Parsed 0 overlap records in 0.001s
total reads: 1,650,543
  - unique: 0 (0.0 %)
  - multi-seg: 0
Unique reads extracted in 11.945s
ERROR: Error extracting unique reads!
ERROR: Error assembling long reads!

kmnip commented 1 month ago

For the first error, the number of threads was not given as an integer: the literal string THREADS was passed to -t, so parsing it failed. I believe you have already sorted that out.
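
To illustrate the first failure, here is a minimal sketch of what happens when a non-numeric value reaches Integer.parseInt, which is the call shown in the stack trace above. The class ThreadArgDemo and the helper parseThreads are hypothetical names made up for this example; this is not RNA-Bloom's actual argument-handling code.

```java
public class ThreadArgDemo {
    // Hypothetical helper (not RNA-Bloom code): parse the value given to -t.
    static int parseThreads(String value) {
        try {
            // Integer.parseInt throws NumberFormatException for non-numeric input.
            return Integer.parseInt(value);
        } catch (NumberFormatException e) {
            // Re-throw with a message that names the offending option.
            throw new IllegalArgumentException(
                "-t expects an integer, got \"" + value + "\"", e);
        }
    }

    public static void main(String[] args) {
        // A numeric string parses normally.
        System.out.println(parseThreads("50"));

        // A placeholder such as THREADS is rejected.
        try {
            parseThreads("THREADS");
        } catch (IllegalArgumentException e) {
            System.out.println("ERROR: " + e.getMessage());
        }
    }
}
```

Passing a number, as in the second run (-t 50), avoids the exception entirely.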

For the second error, could you please check the log file whose name looks something like rnabloom.longreads.assembly.nr.fa.log? I think the failure has something to do with minimap2.