barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0

Reads aligning at ends of reference sequences don't pass guards #27

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
If a read aligns partially to the end of a fragment, it won't pass the guard 
that requires 90% of its length to be mapped for it to be counted. These reads 
should get to count the part that extends past the end of the fragment as being 
"mapped" for purposes of this test!

This problem can be seen in many tests that use a linear reference sequence, 
but it is especially bad when mapping to contigs from a de novo assembly.
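
A minimal sketch of the proposed check, not breseq's actual implementation. It assumes a forward-strand alignment without indels and 1-based inclusive coordinates; the function names and parameters are hypothetical, and only the 90% threshold comes from the description above.

```python
def overhang_bases(read_length, query_start, query_end,
                   ref_start, ref_end, ref_length):
    """Count unaligned read bases that would fall off either end of the reference.

    Hypothetical helper: coordinates are 1-based and inclusive, and the
    alignment is assumed to be on the forward strand with no indels.
    """
    left_unaligned = query_start - 1            # read bases before the aligned part
    right_unaligned = read_length - query_end   # read bases after the aligned part
    room_left = ref_start - 1                   # reference bases left of the match
    room_right = ref_length - ref_end           # reference bases right of the match
    left_overhang = max(0, left_unaligned - room_left)
    right_overhang = max(0, right_unaligned - room_right)
    return left_overhang + right_overhang


def passes_length_guard(read_length, query_start, query_end,
                        ref_start, ref_end, ref_length,
                        min_fraction=0.9, count_overhang=True):
    """Return True if enough of the read counts as mapped.

    With count_overhang=True, bases extending past the end of the reference
    are treated as mapped, as proposed in this issue.
    """
    mapped = query_end - query_start + 1
    if count_overhang:
        mapped += overhang_bases(read_length, query_start, query_end,
                                 ref_start, ref_end, ref_length)
    return mapped / read_length >= min_fraction
```

For example, a 100-base read whose last 60 bases align to positions 1-60 of a 5000-base fragment fails the guard as it stands (60% mapped), but passes once the 40 bases hanging off the start of the fragment are counted.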

Original issue reported on code.google.com by jeffrey....@gmail.com on 13 Nov 2011 at 11:13

wrshoemaker commented 6 years ago

Hi @jeffreybarrick,

Is there any update regarding this issue? I've been using breseq for a project where all the references are fragmented de novo assembled genomes, and I think some odd results I'm seeing are due to this issue. A lot of called mutations look like the attached image, where the variant appears only on reads that also carry other variants. When I search for such a read in the reference sequence, I often find an exact match to either the end or the beginning of a different copy of the gene than the copy in which breseq called the mutation.

I can account for this and remove these sites in post-processing, but I wanted to see if there's a way to handle it during the alignment step. Unfortunately, we're working with a lot of taxa, and there are too many gaps to close them all through Sanger sequencing.
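
For concreteness, the kind of post-processing filter I have in mind looks roughly like the sketch below. It is only an illustration: the input layout (a list of dicts with a `position` and, for each read supporting the variant, a count of the other mismatches on that read) is a hypothetical structure and not a breseq output format.

```python
def only_on_multivariant_reads(other_mismatch_counts):
    """Return True if every read supporting a variant also carries other mismatches.

    other_mismatch_counts: one integer per supporting read, giving the number
    of additional mismatches on that read (hypothetical input).
    """
    counts = list(other_mismatch_counts)
    # A call with no supporting reads listed is not flagged.
    return len(counts) > 0 and all(c > 0 for c in counts)


def filter_calls(calls):
    """Keep only the calls that have at least one 'clean' supporting read.

    calls: list of dicts with the hypothetical keys 'position' and
    'other_mismatches'.
    """
    return [call for call in calls
            if not only_on_multivariant_reads(call["other_mismatches"])]


calls = [
    {"position": 1203, "other_mismatches": [2, 3, 1]},  # every read has extra variants
    {"position": 8841, "other_mismatches": [0, 0, 1]},  # some reads are otherwise clean
]
kept = filter_calls(calls)  # only the call at position 8841 is kept
```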

Best, Will

[Attached image: screen shot 2018-04-25 at 8 47 36 pm]