Closed sebschmi closed 2 years ago
Hello @sebschmi,
`k < l` is acceptable for rust-mdbg, which means it's probably not the issue here. Have you removed whitespace and non-ACGTN nucleotides like below?
`seqtk seq -AU reads.fq > reads.clean.fa`
Thanks!
Hi,
interestingly, the failure occurs only with `k < l`; for larger `k`, #21 occurs (but then again, #21 might occur before this, just hiding the problem).
I used `grep [^ACGTN] reads.fa | grep "^[^>]" > reads.illegal_grep` to check for illegal characters, but the output file stays empty, meaning that all lines either start with `>` or contain only `ACGTN`.
I will execute the exact commands that were proposed in #21 to further nail down the problem.
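For anyone who wants a more detailed report than the `grep` pipeline gives, here is a minimal Python sketch of the same check. It is not how rust-mdbg validates its input. The function name `find_illegal` and the sample reads are made up for illustration. It lists which characters outside `ACGTN` occur on each sequence line, including lowercase letters, spaces and stray carriage returns.

```python
import io
from collections import Counter

ALLOWED = set("ACGTN")

def find_illegal(handle, allowed=ALLOWED):
    """Yield (line_number, record_header, Counter of offending characters)."""
    header = None
    for line_number, line in enumerate(handle, start=1):
        if line.startswith(">"):
            # Header line: remember the record name, do not check its characters.
            header = line[1:].strip()
            continue
        # Strip only the trailing newline so spaces, tabs and '\r' are still reported.
        bad = Counter(ch for ch in line.rstrip("\n") if ch not in allowed)
        if bad:
            yield line_number, header, bad

# Hypothetical sample input: read2 has a lowercase base and a CRLF ending,
# read3 contains a space.
example = io.StringIO(">read1\nACGTN\n>read2\nACgT\r\n>read3\nAC GT\n")
for line_number, header, bad in find_illegal(example):
    print(line_number, header, dict(bad))
```

To run it on a real file, something like `open("reads.fa", newline="")` keeps carriage returns visible; Python's default text mode would translate `\r\n` to `\n` and hide them.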
~Apparently using single-line fasta fixes this problem. Thanks for the help!~
Well, apparently I celebrated too early.
When using a single-line fasta file (produced by `seqtk seq -AU reads.fq > reads.clean.fa`), it first works for k = 10, but when the script reaches k = 15, the same error as before appears.
Again, this happens both with and without homopolymer compression.
When assembling E. coli with the multik script, it runs mdbg with k = 10 and l = 12, resulting in mdbg panicking with "Non-ACGTN nucleotide encountered!".
The multik script then continues silently.
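One way to avoid a silent continuation like this is to check each step's exit status before moving on. The following is a minimal Python sketch of that idea, not the multik script itself. `run_step` is a hypothetical helper, and the failing command is a stand-in that only imitates the panic message, since the real command line is not shown here.

```python
import subprocess
import sys

def run_step(cmd):
    """Run one pipeline step and raise if it exits with a non-zero status."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(
            f"step {cmd!r} failed with exit code {result.returncode}:\n{result.stderr}"
        )
    return result.stdout

# Stand-in for a failing assembler call (assumption: any non-zero exit counts as failure).
failing = [sys.executable, "-c", "import sys; sys.exit('Non-ACGTN nucleotide encountered!')"]
try:
    run_step(failing)
except RuntimeError as err:
    # A real wrapper would abort the whole multi-k loop here instead of continuing.
    print("stopping:", err)
```

A panicking Rust binary normally exits with a non-zero status, so a check of this kind should catch it, assuming the panic is not intercepted somewhere in between.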
output
``` thread '