metaSPAdes, ancient genomes, and merged reads

Hello,

I would like to run metaSPAdes on an ancient metagenome dataset in which the reads were quality-trimmed, filtered, and merged in a single step. We don't include the unmerged read pairs in our analyses (read mapping, taxonomic classification) because the longer reads are likely to come from modern contamination.

I ran a successful test using SPAdes with a small dataset, where I was able to use just a single unmerged pair1/pair2 read set together with the merged reads (in the command below, CS23.p1.fastq and CS23.p2.fastq each had 1 read, and CS23.mg.fastq had ~1000). I would like to ask your advice on how best, if at all, to run metaSPAdes with this kind of dataset.

spades.py -1 CS23.p1.fastq -2 CS23.p2.fastq --merged CS23.mg.fastq -k 19,21,27,33,39,45,51,55 -o SPAdes/CS23-assembly --careful -t 6

Will the assembly be problematic if I use a single unmerged pair1/pair2 read set, particularly since we otherwise discard these reads as contamination? Alternatively, if I generate a pseudo pair1/pair2 read set from a merged read, or from a repetitive sequence with low quality scores, would that be valid input? Is there perhaps a different approach you would suggest?
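
To make the pseudo-pair idea concrete, here is a minimal Python sketch of one way such a placeholder pair could be generated from the first merged read. It is only an illustration under stated assumptions: the helper names (first_fastq_record, write_pseudo_pair), the "_pseudo" read-name suffix, and the choice of "#" (Phred+33 quality 2) as the low quality score are all hypothetical and not part of SPAdes. Whether SPAdes or metaSPAdes treats such a pair as valid input is exactly the open question.

```python
from pathlib import Path

# Hypothetical helpers for illustration only; none of this is part of SPAdes.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def first_fastq_record(path):
    """Return (read name, sequence) of the first 4-line FASTQ record."""
    with open(path) as handle:
        header = handle.readline().strip()
        seq = handle.readline().strip()
    if not header.startswith("@") or not seq:
        raise ValueError(f"no FASTQ record found in {path}")
    return header[1:].split()[0], seq


def write_pseudo_pair(merged_fastq, out_p1, out_p2, low_qual_char="#"):
    """Write a one-read placeholder pair derived from the first merged read.

    Read 1 is the merged sequence and read 2 is its reverse complement.
    Every base gets the same low quality character ("#" is Q2 in Phred+33),
    which is an assumption made for this sketch.
    """
    name, seq = first_fastq_record(merged_fastq)
    qual = low_qual_char * len(seq)
    Path(out_p1).write_text(f"@{name}_pseudo/1\n{seq}\n+\n{qual}\n")
    Path(out_p2).write_text(
        f"@{name}_pseudo/2\n{reverse_complement(seq)}\n+\n{qual}\n"
    )
    return seq
```

For example, write_pseudo_pair("CS23.mg.fastq", "CS23.p1.fastq", "CS23.p2.fastq") would produce the two single-read files that are passed to -1 and -2 in the command above.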

Thanks, Irina

ablab / spades

metaSPAdes, ancient genomes, and merged reads #127