ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/

metaSPAdes, ancient genomes, and merged reads #127

Closed by ivelsko 6 years ago

ivelsko commented 6 years ago

Hello,

I would like to run metaSPAdes on an ancient metagenome data set for which the reads were simultaneously quality trimmed, filtered, and merged. We don't include the un-merged read pairs in our analyses (read mapping, taxonomic classification) because the longer reads are likely from modern contamination.

I ran a successful test using SPAdes with a small dataset, where I was able to use just a single unmerged pair1/pair2 read set together with the merged reads (below, CS23.p1.fastq and CS23.p2.fastq each had 1 read, and CS23.mg.fastq had ~1000). I would like to ask your advice on how best, if at all, to run metaSPAdes with this kind of dataset.

spades.py -1 CS23.p1.fastq -2 CS23.p2.fastq --merged CS23.mg.fastq -k 19,21,27,33,39,45,51,55 -o SPAdes/CS23-assembly --careful -t 6

Will the assembly be problematic if I use a single un-merged pair1/pair2 read set, particularly since we otherwise discard these as contamination? Alternatively, if I generate a pseudo pair1/pair2 read set from a merged read or from a repetitive sequence with low quality scores, would that be valid input? Is there perhaps a different approach you would suggest?

Thanks, Irina

asl commented 6 years ago

> Will the assembly be problematic if I use a single un-merged pair1/pair2 read set, particularly since we otherwise discard these as contamination?

Yes. The whole point of the special support for merged reads is to "unmerge" them on the fly where necessary, in order to reconstruct the original insert size distribution (which was heavily skewed by the merging process).
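To get a feel for the distribution in question: a merged read spans the whole sequenced fragment, so its length is the insert size of that pair. The sketch below tabulates read lengths in a merged FASTQ file. It is only an illustration, not what SPAdes does internally; the function name `merged_length_hist` and the demo reads are made up, and it assumes plain 4-line FASTQ records.

```shell
# Tabulate read lengths in a FASTQ file of merged reads.
# Assumes plain 4-line FASTQ records (sequence on line 2 of every 4).
merged_length_hist() {
    awk 'NR % 4 == 2 { n[length($0)]++ }
         END { for (len in n) print len, n[len] }' "$1" | sort -n
}

# Tiny made-up example: two 4 bp reads and one 6 bp read.
demo=$(mktemp)
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n@r3\nACGTAC\n+\nIIIIII\n' > "$demo"
merged_length_hist "$demo"   # prints "length count" pairs, one per line
rm -f "$demo"
```

On real data one would call it as `merged_length_hist CS23.mg.fastq`. Comparing that table with the insert sizes of the unmerged pairs should show how far apart the two populations sit.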

> Is there perhaps a different approach you would suggest?

Just provide either the original read sets or the whole merged read sets.
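As a rough sketch of what those two options could look like on the command line, the snippet below only builds and prints the commands (a dry run) rather than running the assembler. The `--meta` flag selects metaSPAdes mode in `spades.py`; the `.raw.R1`/`.raw.R2` file names and the output directory are hypothetical placeholders, and the `.p1`/`.p2` files are assumed here to hold all unmerged pairs, not a single read. The `-k` list and `--careful` from the original command are left out; check the SPAdes manual for which options metagenomic mode accepts.

```shell
#!/bin/sh
# Dry run: print candidate metaSPAdes commands instead of executing them.
sample="CS23"
outdir="SPAdes/${sample}-meta"   # hypothetical output directory

# Option A: the whole merged data set, i.e. ALL unmerged pairs plus the merged reads.
cmd_merged="spades.py --meta -1 ${sample}.p1.fastq -2 ${sample}.p2.fastq --merged ${sample}.mg.fastq -o ${outdir} -t 6"

# Option B: the original paired-end reads, before merging (hypothetical file names).
cmd_original="spades.py --meta -1 ${sample}.raw.R1.fastq -2 ${sample}.raw.R2.fastq -o ${outdir} -t 6"

echo "$cmd_merged"
echo "$cmd_original"
```

Printing first makes it easy to check the file names before committing to a long assembly run.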