ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/

Contigs contain non-ascii characters #1101

Open christianbrinch opened 1 year ago

christianbrinch commented 1 year ago

Description of bug

After de novo assembly of paired-end Illumina reads, SPAdes outputs contigs/scaffolds that contain non-ASCII characters. The input FASTA/FASTQ files do not contain these characters.

Example

spades.log

spades.log.zip

params.txt

params.txt.zip

SPAdes version

SPAdes/3.15.5

Operating System

Ubuntu/CentOS

Python Version

python3.9

Method of SPAdes installation

manual

No errors reported in spades.log

asl commented 1 year ago

This is strange. Does the problem reproduce with restart? And w/o --trusted-contigs option?

christianbrinch commented 1 year ago

It persists with restart but goes away without the --trusted-contigs option.

asl commented 1 year ago

Ok. Would it be possible for you to share the data so we can reproduce and fix the issue?

christianbrinch commented 1 year ago

Unfortunately, I am not able to share the data, and I understand that this makes it difficult to resolve the issue. I can, however, explain what I have done. I have a large set (100+) of metagenomic samples, consisting of MiSeq, NextSeq, and NovaSeq reads, from which a MAG has been assembled in a large metagenomic co-assembly. The MAG is about 60% complete and I am trying to extract the full genome. My strategy is to align all my samples against the MAG, extract the reads that align, de novo assemble those reads with the MAG as trusted contigs, curate the resulting scaffolds, and use them as my new MAG. Then I repeat the cycle. After five such iterations, I have reached about 75% completeness, and it works quite well: the contigs grow by a few hundred bases per iteration, as expected. However, at the 6th iteration, SPAdes all of a sudden creates these contigs with non-ASCII characters in them.

Because the error goes away when I drop the --trusted-contigs option, it must be caused by the set of contigs I use for that. I curate my contigs using Geneious, so maybe it outputs something SPAdes doesn't like? I can't find any non-standard characters in those FASTA files though, but I will investigate the issue a bit further myself.
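
One way to look for such characters is a byte-level scan of the trusted-contigs FASTA, since text editors and viewers can hide stray bytes, a byte-order mark, or carriage returns. The following is a minimal sketch, not part of SPAdes or Geneious: the function name `scan_fasta` and the `ALLOWED` set (IUPAC nucleotide codes plus `-` and `*`) are assumptions about what counts as "standard" here, and passing this check does not guarantee that SPAdes will accept the file.

```python
import sys

# Assumed set of "standard" sequence bytes: IUPAC nucleotide codes in both
# cases, plus gap/stop symbols. Adjust to taste; this is not a SPAdes rule.
ALLOWED = set(b"ACGTUNRYKMSWBDHVacgtunrykmswbdhv-*")
BOM = b"\xef\xbb\xbf"  # UTF-8 byte-order mark some tools prepend


def scan_fasta(path):
    """Yield (line_no, column, byte_value) for each suspicious byte."""
    # Binary mode: no text decoding, so nothing is silently replaced.
    with open(path, "rb") as handle:
        for line_no, raw in enumerate(handle, start=1):
            # Strip only the newline, so a carriage return gets reported.
            line = raw.rstrip(b"\n")
            is_header = line.lstrip(BOM).startswith(b">")
            for col, byte in enumerate(line, start=1):
                if is_header:
                    # Headers: accept printable ASCII and tabs only.
                    bad = byte > 126 or (byte < 32 and byte != 9)
                else:
                    bad = byte not in ALLOWED
                if bad:
                    yield line_no, col, byte


if __name__ == "__main__":
    # Usage: python scan_fasta.py trusted_contigs.fasta [more.fasta ...]
    for path in sys.argv[1:]:
        for line_no, col, byte in scan_fasta(path):
            print(f"{path}:{line_no}:{col}: unexpected byte 0x{byte:02x}")
```

Running the same scan on both the curated trusted contigs and the SPAdes output would show whether the odd bytes are already present in the input or only appear after assembly.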