ablab / spades

SPAdes Genome Assembler
http://ablab.github.io/spades/
Other
731 stars 134 forks source link

metaspades aborting at stag "Gap Closer" with Assertion `overlap <= k_' failed #16

Closed jvollme closed 7 years ago

jvollme commented 7 years ago

Hello, I am running metaspades on a large soil metagenome, which I have partitioned into "high-", "low" and "medium"-coverage subsets prior to assembly using bbnorm (based on median kmer-counts for each read). I am assembling these fractions individually to reduce RAM-requirements.

For the assembly I am using a k-mer range from 21-99, with extra small k-mer steps towards the end, in order to ensure that i minimize k-mer gaps for my (low-abundant) target genomes (last k-mer steps=91,95,97,99 for ~101 bp HiSeq reads)

Assembly worked fine for the "low"- and "high"-abundant fractions. It also seemed to work fine for most of the k-mer steps of the "medium"-abundant fraction. However during the forelast k-mer step (97), the "medium"-assembly keeps aborting with the following error message:

[...]  
  1:24:10.248     4G / 18G   INFO   K-mer Index Building     (kmer_index_builder.hpp    : 437)   Building perfect hash indices
  1:26:45.297     4G / 18G   INFO    General                 (kmer_index_builder.hpp    : 276)   Merging final buckets.
  1:27:12.929     4G / 18G   INFO   K-mer Index Building     (kmer_index_builder.hpp    : 483)   Index built. Total 52155392 bytes occupied (2.61376 bits per kmer).
  1:27:15.200     8G / 18G   INFO    General                 (edge_index_builders.hpp   :  27)   Collecting k-mer coverage information from graph, this takes a while.
  1:27:56.406     8G / 18G   INFO    General                 (edge_index.hpp            :  91)   Index refilled
  1:27:56.411     8G / 18G   INFO    General                 (gap_closer.cpp            : 159)   Preparing shift maps
  1:28:00.765     8G / 18G   INFO    General                 (gap_closer.cpp            : 119)   Processing paired reads (takes a while)
  1:29:45.203     8G / 18G   INFO    General                 (gap_closer.cpp            : 138)   Used 17384814 paired reads
  1:29:45.204     8G / 18G   INFO    General                 (gap_closer.cpp            : 140)   Merging paired indices
  1:29:49.090     8G / 18G   INFO   GapCloser                (gap_closer.cpp            : 349)   Closing short gaps
=== Stack Trace ===
[0x40d753]
[0x44d309]
[0x45942e]
[0x459ba3]
[0x4617dd]
[0x436358]
[0x546e21]
[0x40a328]
[0x4028ea]
[0x79acf0]
[0x4086fd]
spades: /spades/src/projects/spades/gap_closer.cpp:294: bool debruijn_graph::GapCloser::HandleSimpleCase(debruijn_graph::GapCloser::EdgeId, debruijn_graph::GapCloser::EdgeId, int): Assertion `overlap <= k_' failed.

== Error ==  system call for: "['/work/home/jov14/tools/SPAdes-3.10.0/bin/spades', '/work/home/jov14/tmp/projects/pratscher_USCa_metagenome2/results/assembly/170124_metaspades_bbmapmidpartition_alldatasets_k21-101/assembly_metaspades3.10/K97/configs/config.info', '/work/home/jov14/tmp/projects/pratscher_USCa_metagenome2/results/assembly/170124_metaspades_bbmapmidpartition_alldatasets_k21-101/assembly_metaspades3.10/K97/configs/mda_mode.info', '/work/home/jov14/tmp/projects/pratscher_USCa_metagenome2/results/assembly/170124_metaspades_bbmapmidpartition_alldatasets_k21-101/assembly_metaspades3.10/K97/configs/meta_mode.info']" finished abnormally, err code: -6

I have attached the params.txt and spades.log below.

Do you know what is causing this error? Will it be possible to continue this assembly, without restricting myself to a lower maximum-kmer (I want all of the fractions to be assembled using the same parameters).

params.txt

spades.log.txt

asl commented 7 years ago

Hello

This problem was fixed in SPAdes 3.10.1, please update.

Note, however, that usually you should not go beyond half read length for k-mer size, therefore the use of k=99 for 101 bp reads is strongly discouraged - you'll receive very suboptimal assemblies (specially for the low-abundant ones).

jvollme commented 7 years ago

Thanks, I'll try out the new version.

Note, however, that usually you should not go beyond half read length for k-mer size, therefore the use of k=99 for 101 bp reads is strongly discouraged - you'll receive very suboptimal assemblies (specially for the low-abundant ones).

But I thought that this, exactly, was a specific issue of single k-mer assemblers (such as velvet), and one of the major actual reasons for iterative multi-kmer assemblies? Since I start at low k-mers (21, 31, 41, 51) I should get good "unitigs" or "pre-scaffolds" even for low coverage regions. These are then used as "input" for the next higher k-mer iteration, thereby allowing me to breach k-mer gaps in low coverage regions at low k-mer iterations, but at the same time allowing me to use the most of my sequence length for resolving repetitive regions. Is this not the case for spades? Because for IDBA and Megahit I always see a general improvement of the assemblies with increasing max-kmer, INCLUDING the low-abundant genomes.

snurk commented 7 years ago

Dear @jvollme, In the ideal world, iterative increase of K (up to the read length) indeed leads to the steady improve of coverage, without bringing additional misassemblies. In real world it can lead to problems. You escape some of them by making ultra-small steps. Unfortunately it might not be enough, especially if we talk about current implementation of iterative procedure in SPAdes! In contrast to IDBA-UD and MeGAHIT, SPAdes does not perform any kind of repeat resolution/local reassembly/contig extension on the intermediate steps. The contigs passed to the next step are exactly the sequences of edges of simplified assembly graph, which break on every repeat. Which means that you will be inevitably loosing correct connections (e.g. entrances/exits to/from repeats not directly supported by any long K-mer) as you increase the value of K in SPAdes.

jvollme commented 7 years ago

Ah thanks, that's important to know!