Closed jvollme closed 7 years ago
Hello
This problem was fixed in SPAdes 3.10.1, please update.
Note, however, that you should usually not go beyond half the read length for the k-mer size. The use of k=99 for 101 bp reads is therefore strongly discouraged: you'll get very suboptimal assemblies (especially for low-abundance genomes).
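To make the concern about large k concrete, here is a minimal sketch of the standard relationship between base coverage and k-mer coverage, C_k = C * (L - k + 1) / L. It is not taken from SPAdes itself; the function names and the 10x base coverage are illustrative assumptions, and sequencing errors are ignored.

```python
def kmers_per_read(read_len, k):
    """Number of k-mers a single read of length read_len contributes."""
    return max(read_len - k + 1, 0)


def kmer_coverage(base_coverage, read_len, k):
    """Expected k-mer coverage for a given base coverage (error-free reads)."""
    return base_coverage * kmers_per_read(read_len, k) / read_len


if __name__ == "__main__":
    read_len = 101          # read length mentioned in this thread
    base_coverage = 10.0    # hypothetical low-abundance genome
    for k in (21, 51, 99):
        print(k, kmers_per_read(read_len, k),
              round(kmer_coverage(base_coverage, read_len, k), 2))
```

With 101 bp reads, each read contributes only 3 k-mers at k=99, so the k-mer coverage of a low-abundance genome drops to a small fraction of its base coverage.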
Thanks, I'll try out the new version.
> Note, however, that you should usually not go beyond half the read length for the k-mer size. The use of k=99 for 101 bp reads is therefore strongly discouraged: you'll get very suboptimal assemblies (especially for low-abundance genomes).
But I thought that this was precisely an issue specific to single-k-mer assemblers (such as Velvet), and one of the main reasons for iterative multi-k-mer assembly? Since I start at low k-mer sizes (21, 31, 41, 51), I should get good "unitigs" or "pre-scaffolds" even for low-coverage regions. These are then used as "input" for the next, higher k-mer iteration, which lets me bridge k-mer gaps in low-coverage regions during the low-k iterations while still using most of my read length to resolve repetitive regions. Is this not the case for SPAdes? With IDBA and MEGAHIT I always see a general improvement of the assemblies with increasing maximum k-mer size, INCLUDING for the low-abundance genomes.
Dear @jvollme, in an ideal world, iteratively increasing K (up to the read length) would indeed lead to a steady improvement in coverage without introducing additional misassemblies. In the real world it can lead to problems. You avoid some of them by taking very small steps. Unfortunately, that might not be enough, especially given the current implementation of the iterative procedure in SPAdes! In contrast to IDBA-UD and MEGAHIT, SPAdes does not perform any kind of repeat resolution, local reassembly, or contig extension at the intermediate steps. The contigs passed to the next step are exactly the edge sequences of the simplified assembly graph, which break at every repeat. This means that you will inevitably lose correct connections (e.g. entrances to and exits from repeats that are not directly supported by any long K-mer) as you increase the value of K in SPAdes.
Ah thanks, that's important to know!
Hello, I am running metaSPAdes on a large soil metagenome, which I partitioned into "high"-, "medium"- and "low"-coverage subsets prior to assembly using BBNorm (based on the median k-mer count of each read). I am assembling these fractions individually to reduce RAM requirements.
For the assembly I am using a k-mer range from 21 to 99, with extra-small k-mer steps towards the end, to ensure that I minimize k-mer gaps for my (low-abundance) target genomes (last k-mer steps: 91, 95, 97, 99 for ~101 bp HiSeq reads).
Assembly worked fine for the "low"- and "high"-abundance fractions. It also seemed to work fine for most of the k-mer steps of the "medium"-abundance fraction. However, during the second-to-last k-mer step (97), the "medium" assembly keeps aborting with the following error message:
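As a rough illustration of the k-mer list described above, the sketch below checks a list against the documented SPAdes constraints (odd values, below 128, ascending) and reports which values exceed half the read length. The helper name `check_k_list` is hypothetical, and only the k values actually named in this thread are used; the intermediate steps of the real run are not known and are left out.

```python
def check_k_list(k_values, read_len):
    """Return (problems, above_half) for a candidate list of k-mer sizes.

    problems:   violations of the basic constraints (odd, < 128, ascending,
                shorter than the reads).
    above_half: k values larger than half the read length.
    """
    problems = []
    if list(k_values) != sorted(set(k_values)):
        problems.append("k values must be strictly ascending")
    for k in k_values:
        if k % 2 == 0:
            problems.append(f"k={k} is even")
        if k >= 128:
            problems.append(f"k={k} is not below 128")
        if k >= read_len:
            problems.append(f"k={k} is not shorter than the reads")
    above_half = [k for k in k_values if k > read_len / 2]
    return problems, above_half


if __name__ == "__main__":
    # Only the values named in the thread; intermediate steps are omitted.
    k_values = [21, 31, 41, 51, 91, 95, 97, 99]
    problems, above_half = check_k_list(k_values, read_len=101)
    print("problems:", problems)
    print("above half read length:", above_half)
    # SPAdes takes the list as a comma-separated value of its -k option.
    print("-k", ",".join(str(k) for k in k_values))
```

A list like this passes the formal checks, which is why the run starts at all; the values above half the read length are the ones the reply in this thread advises against.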
I have attached the params.txt and spades.log below. Do you know what is causing this error? Will it be possible to continue this assembly without restricting myself to a lower maximum k-mer size? (I want all of the fractions to be assembled using the same parameters.)
params.txt
spades.log.txt