ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/

Contig file generated by spades.py contains 'N' characters #1169

Open paldougall opened 1 year ago

paldougall commented 1 year ago

Description of bug

The SPAdes documentation specifically states that contigs produced by SPAdes will NOT contain 'N' characters, but I have a contig file generated by SPAdes that does contain them, which leads me to believe there is an error somewhere in the code. Node 124 contains a string of N characters.

spades.log

params.txt

SPAdes version

SPAdes-3.15.4-Linux

Operating System

CentOS Linux 7 (Core)

Python Version

Python 3.6.8

Method of SPAdes installation

binaries

No errors reported in spades.log

asl commented 1 year ago

Sorry, but there is no "Node 124" in the files attached.

paldougall commented 1 year ago

contigs.txt assembly_graph.fastg.txt contigs.paths.txt

Here are contigs.fasta, contigs.paths, and assembly_graph.fastg (all attached as .txt files) with the NNN. Looking at the contigs path, it appears that the edge in the assembly graph does not have the N's, but somehow they appear in the final contig file.
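For anyone who wants to check their own output the same way, here is a minimal sketch of a scan for runs of N in a FASTA file. The function name `find_n_runs` is made up for illustration and is not part of SPAdes; it assumes a plain FASTA file with `>` header lines, and it reports 0-based positions.

```python
import re
from typing import Dict, Iterable, List, Tuple


def find_n_runs(fasta_lines: Iterable[str]) -> Dict[str, List[Tuple[int, int]]]:
    """Map each record name to the (start, length) of every run of N/n in it.

    Hypothetical helper, not a SPAdes API. Starts are 0-based offsets into
    the record's sequence with line breaks removed.
    """
    runs: Dict[str, List[Tuple[int, int]]] = {}
    name = None
    chunks: List[str] = []

    def flush() -> None:
        # Record the N runs of the sequence collected so far, if any.
        if name is None:
            return
        seq = "".join(chunks)
        found = [(m.start(), m.end() - m.start()) for m in re.finditer(r"[Nn]+", seq)]
        if found:
            runs[name] = found

    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            flush()          # finish the previous record
            name = line[1:]  # header text without the leading '>'
            chunks = []
        else:
            chunks.append(line)
    flush()                  # finish the last record
    return runs
```

Passing an open file handle for contigs.fasta as `fasta_lines` should list only the records that contain N's, which makes it easy to compare them against the corresponding edges in the assembly graph.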

asl commented 1 year ago

SPAdes itself does produce contigs without N's. However, you are also running the mismatch corrector, which might mark ambiguous bases in this way.

I would suggest using --isolate mode for your dataset and skipping the mismatch correction.
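As a rough sketch of what such a rerun could look like, the snippet below only builds the command line; it does not run anything. It assumes a single paired-end library, the read file names and output directory are placeholders, and `build_isolate_command` is a made-up helper name. As far as I know, the mismatch corrector is enabled by `--careful`, so that option is simply left out here.

```python
from typing import List


def build_isolate_command(reads_1: str, reads_2: str, out_dir: str) -> List[str]:
    """Build a spades.py argument list for --isolate mode (hypothetical helper).

    Assumes one paired-end library; --careful is deliberately omitted so the
    mismatch corrector is not requested.
    """
    return [
        "spades.py",
        "--isolate",
        "-1", reads_1,   # forward reads (placeholder path)
        "-2", reads_2,   # reverse reads (placeholder path)
        "-o", out_dir,   # output directory (placeholder path)
    ]


# Example with placeholder file names; pass the list to subprocess.run to execute it.
command = build_isolate_command("reads_1.fastq.gz", "reads_2.fastq.gz", "spades_isolate_out")
print(" ".join(command))
```

Adjust the read options to match whatever is listed in your params.txt.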