jorvis closed this issue 9 years ago
We prefer this behavior for various reasons. In this case Bowtie2 will usually stop with a SIGABRT triggered by a bad alloc (a failed memory allocation). As you already know, masking the repeats before searching those sequences is the way to go.
Can you recommend a minimum homopolymer repeat length that should be masked, given how the algorithm works?
On Tue, Feb 17, 2015 at 1:59 PM, val wrote:
> We prefer this behavior for various reasons. In this case Bowtie2 will usually stop with a SIGABRT triggered by a bad alloc (a failed memory allocation). As you already know, masking the repeats before searching those sequences is the way to go.
My first thought would be something like 10 bp. With a seed length of 20 bp, this should keep bowtie from wandering around too much.
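For anyone who wants to try the masking suggested above, here is a minimal sketch in Python. It is not part of bowtie2; the function names `mask_homopolymers` and `mask_fasta_lines` are made up for illustration, and the default of 10 bp simply follows the first-guess threshold mentioned above. It joins each FASTA record before masking, so runs that span line breaks are also caught.

```python
import re


def mask_homopolymers(seq, min_run=10):
    """Replace every run of >= min_run identical A/C/G/T bases with Ns."""
    if min_run < 1:
        raise ValueError("min_run must be at least 1")
    # One base followed by at least (min_run - 1) copies of the same base.
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_run - 1), re.IGNORECASE)
    return pattern.sub(lambda m: "N" * len(m.group(0)), seq)


def mask_fasta_lines(lines, min_run=10, width=60):
    """Yield FASTA lines with homopolymer runs masked.

    `lines` is any iterable of strings (for example an open file).
    Sequence lines of a record are joined before masking, then
    re-wrapped to `width` columns.
    """
    header, chunks = None, []
    for raw in list(lines) + [">"]:  # sentinel header flushes the last record
        line = raw.strip()
        if line.startswith(">"):
            if header is not None:
                seq = mask_homopolymers("".join(chunks), min_run)
                yield header
                for i in range(0, len(seq), width):
                    yield seq[i:i + width]
            header, chunks = line, []
        elif line:
            chunks.append(line)
```

Usage would be along the lines of writing each yielded line plus a newline to a new FASTA file, then building the bowtie2 index from that file instead of the original.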
Can you provide exact parameters used for the runs that fail?
I will close this issue for now.
Val
I'm aligning (in batches) around 20 TB of read data against several thousand microbial genomes. Some of these batches fail with core dumps after a very long runtime (around 10x as long as the successful ones). I've tried looking into why only certain batches fail, and what I've found is that the genomes it fails on are those which contain long (likely incorrect) homopolymeric repeats. One example is:
Examples of the homopolymeric stretches:
WARNING: Sequence ID gi|257136525|ref|NZ_GG699286.1| contains a homopolymer run (T) of length 45972
WARNING: Sequence ID gi|257136529|ref|NZ_GG699290.1| contains a homopolymer run (A) of length 131072
WARNING: Sequence ID gi|257136550|ref|NZ_GG699311.1| contains a homopolymer run (T) of length 51385
WARNING: Sequence ID gi|257136550|ref|NZ_GG699311.1| contains a homopolymer run (A) of length 262144
WARNING: Sequence ID gi|257136567|ref|NZ_GG699328.1| contains a homopolymer run (A) of length 61064
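A scan like the one that produced these warnings can be sketched in a few lines of Python. This is not the tool I actually used; the function names `homopolymer_runs` and `format_warning` and the `min_run` threshold are illustrative assumptions, and the message format just mimics the output above.

```python
from itertools import groupby


def homopolymer_runs(seq, min_run=10):
    """Return (base, start, length) for each A/C/G/T run of >= min_run bases.

    `start` is the 0-based offset of the run within `seq`.
    Runs of N or other ambiguity codes are ignored.
    """
    runs = []
    pos = 0
    for base, group in groupby(seq.upper()):
        length = sum(1 for _ in group)
        if base in "ACGT" and length >= min_run:
            runs.append((base, pos, length))
        pos += length
    return runs


def format_warning(seq_id, base, length):
    """Build a warning line in the same shape as the output shown above."""
    return ("WARNING: Sequence ID %s contains a homopolymer run (%s) of length %d"
            % (seq_id, base, length))
```

Running `homopolymer_runs` on each reference sequence before indexing would flag the problem genomes up front, rather than after a long failed alignment.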
Obviously these are incorrect sequences, but many entries like this still appear in the public databases and cause bowtie2 to fail. When I replace the runs with Ns, bowtie2 runs to completion. Is this a known issue with bowtie2?
(I'm using bowtie2-2.2.4)