Closed: ivelsko closed this issue 6 years ago
Will the assembly be problematic if I use a single un-merged pair1/pair2 read set, particularly since we otherwise discard these as contamination?
Yes. The whole point of the special support for merged reads is to "unmerge" them on the fly where necessary in order to reconstruct the original insert size distribution (which the merging process skews considerably).
Is there perhaps a different approach you would suggest?
Just provide either the original reads or the whole merged read set.
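For illustration, a minimal sketch of the two suggested inputs, assuming metaSPAdes (--meta) and placeholder file names for everything except CS23.mg.fastq:

# Option A (assumed file names): assemble from the original, unmerged read pairs
spades.py --meta -1 CS23_R1.fastq -2 CS23_R2.fastq -o SPAdes/CS23-assembly -t 6

# Option B (assumed file names): provide the complete output of the merging step,
# i.e. all remaining unmerged pairs together with the merged reads
spades.py --meta -1 CS23.unmerged_R1.fastq -2 CS23.unmerged_R2.fastq --merged CS23.mg.fastq -o SPAdes/CS23-assembly -t 6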
Hello,
I would like to run metaSPAdes on an ancient metagenome dataset for which the reads were simultaneously quality-trimmed, filtered, and merged. We don't include the un-merged read pairs in our analyses (read mapping, taxonomic classification) because reads from these longer, un-merged inserts are likely modern contamination.
I ran a successful test using SPAdes on a small dataset, where I was able to use just a single un-merged pair1/pair2 read set together with the merged reads (in the command below, CS23.p1.fastq and CS23.p2.fastq each contained one read, and CS23.mg.fastq contained ~1000), so I would like to ask your advice on how best, if at all, to run metaSPAdes with this kind of dataset.
spades.py -1 CS23.p1.fastq -2 CS23.p2.fastq --merged CS23.mg.fastq -k 19,21,27,33,39,45,51,55 -o SPAdes/CS23-assembly --careful -t 6
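I assume the metaSPAdes equivalent of this test would be something like the following sketch (the --meta flag added, --careful dropped since careful mode is not supported in metagenomic mode, and the output directory name is just a placeholder):

spades.py --meta -1 CS23.p1.fastq -2 CS23.p2.fastq --merged CS23.mg.fastq -k 19,21,27,33,39,45,51,55 -o SPAdes/CS23-meta-assembly -t 6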
Will the assembly be problematic if I use a single un-merged pair1/pair2 read set, particularly since we otherwise discard these as contamination? If I alternatively generate a pseudo pair1/pair2 read set from a merged read, or from a repetitive sequence with low quality scores, would that be valid input? Is there perhaps a different approach you would suggest?
Thanks, Irina