bcgsc / RNA-Bloom

:hibiscus: reference-free transcriptome assembly for short and long reads

ERROR: Incorrect FASTA header format #44

Closed: schorlton closed this issue 1 year ago

schorlton commented 2 years ago

Hi @kmnip,

Thanks for your support on my other issues. Here's another interesting one. I'm fairly sure the input FASTQ is valid and that, as with my earlier issues, the input simply has too few reads (or reads that are too short), which ends in an "Incorrect FASTA header format" error in Stage 3. Thanks for your help!

root@06a8b6dc9fba:/data/retry# rnabloom -outdir rnabloom_out -t 8 -long filtered.fastq -ntcard
RNA-Bloom v1.4.3
args: [-outdir, rnabloom_out, -t, 8, -long, filtered.fastq, -ntcard]

name:   rnabloom
outdir: rnabloom_out
WARNING: Output directory does not exist!
Created output directory at `rnabloom_out`

K-mer counting with ntCard...
Running command: `ntcard -t 8 -k 17 -c 65535 -p rnabloom_out/rnabloom @rnabloom_out/rnabloom.ntcard.readslist.txt`...
Parsing histogram file `rnabloom_out/rnabloom_k17.hist`...
Unique k-mers (k=17):     2,368
Unique k-mers (k=17,c>1): 192
K-mer counting completed in 3.973s

Bloom filters          Memory (GB)
====================================
de Bruijn graph:       5.232985E-6
k-mer counting:        3.3946708E-6
====================================
Total:                 8.627656E-6

> Stage 1: Construct graph from reads (k=17)
Parsing `filtered.fastq`...
Parsed 41 sequences in 0.013s
DBG Bloom filter FPR:                 1.56 %
Counting Bloom filter FPR:            0.81 %
> Stage 1 completed in 0.024s

> Stage 2: Correct long reads for "rnabloom"
Parsing `filtered.fastq`...
Corrected Read Lengths Sampling Distribution (n=26)
        min     q1      med     q3      max
        18      23      63      92      213
Parsed 41 sequences.
        Kept:      26   (63.4 %)
        Discarded: 15   (36.6 %)
Corrected reads in 0.292s
Extracting seed sequences...
Bloom filter FPR:       0.0119 %
before: 1       after: 1 (100.0 %)
Extraction completed in 0.104s
> Stage 2 completed in 0.397s

> Stage 3: Assemble long reads for "rnabloom"
ERROR: Incorrect FASTA header format
rnabloom.io.FileFormatException: Incorrect FASTA header format
        at rnabloom.io.FastaReader.nextWithComment(FastaReader.java:240)
        at rnabloom.RNABloom.splitFastaByLength(RNABloom.java:5269)
        at rnabloom.RNABloom.main(RNABloom.java:7083)
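
The trace shows the exception coming from a FASTA reader called inside `splitFastaByLength`, so the file at fault is presumably an intermediate FASTA written by an earlier stage rather than the input FASTQ. The log does not name that file, and the exact rule RNA-Bloom's reader enforces is not shown here. As a rough triage aid only, the sketch below flags two generic layout problems (sequence data before any header, and a bare `>` with no name). The function name and the checks are assumptions for illustration, not RNA-Bloom's own logic.

```python
from pathlib import Path


def find_bad_fasta_headers(path):
    """Return (line_number, reason) pairs for basic FASTA layout problems.

    Hypothetical helper: it applies generic FASTA conventions only and does
    not reproduce the checks made by RNA-Bloom's FastaReader.
    """
    problems = []
    seen_header = False
    lines = Path(path).read_text().splitlines()
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            seen_header = True
            if len(line) == 1:
                problems.append((number, "header has no sequence name"))
        elif not seen_header:
            # Report only the first offending line, then stop complaining.
            problems.append((number, "sequence data before any header"))
            seen_header = True
    return problems
```

Running it over each `.fa` file in the output directory and printing any non-empty result would narrow down which intermediate file, if any, is malformed under these generic rules. An empty result does not rule out a stricter header requirement inside the tool.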

kmnip commented 1 year ago

This bug is fixed. Please see my new release of RNA-Bloom v2.0.0: https://github.com/bcgsc/RNA-Bloom/releases/tag/v2.0.0