barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0

Unable to predict genomic insertion if it overlaps a deletion #319

Open vr1087 opened 1 year ago

vr1087 commented 1 year ago

Breseq can predict an insertion of a sequence if it's annotated as a repeat_region.

I have a sample where a gene was deleted and another sequence was subsequently inserted at the deletion site. I can get breseq to predict the insertion only if the reference already contains the deletion, not if it still has the deleted sequence. In the case where it fails, the evidence for the insertion is listed in the unassigned junction table.

Is breseq designed to handle this kind of situation? If not, is there a workaround?

jeffreybarrick commented 1 year ago

breseq can't make these predictions. Generally, it's only going to be able to handle predicting mutations that are one step removed from the reference genome (so one deletion or one transposon insertion).

I see what you mean: if the evidence is there (one missing coverage (MC) item plus two new junction (JC) items that support replacing that region), this could potentially be recognized and automated. There's a similar case with IS-mediated deletions, which create an MC and two JCs joining the ends of the deletion to the same IS element, but which are not resolved to mutations. These events likely result from one new IS element insertion at the first end, a second at the other end, and then recombination between the copies that deletes the intervening region and collapses them to one IS element. This is something we've talked about automating, so maybe we can consider this case too, if we ever get to that...
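To make the evidence pattern above concrete, here is a minimal Python sketch of how "one MC flanked by two JCs" might be recognized. It is not breseq code: the `MissingCoverage` and `Junction` classes, the `find_flanking_junctions` function, and the `tolerance` parameter are all hypothetical names invented for illustration, and real JC evidence carries more fields (strand, overlap, and so on) that are ignored here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class MissingCoverage:
    """Hypothetical stand-in for an MC evidence item (1-based, inclusive)."""
    seq_id: str
    start: int
    end: int


@dataclass(frozen=True)
class Junction:
    """Hypothetical stand-in for a JC evidence item: two (seq_id, position) sides."""
    side1: Tuple[str, int]
    side2: Tuple[str, int]


def find_flanking_junctions(
    mc: MissingCoverage, junctions: List[Junction], tolerance: int = 0
) -> Optional[Tuple[Junction, Junction]]:
    """Return (left, right) junctions if exactly one touches each edge of mc."""

    def touches(jc: Junction, pos: int) -> bool:
        # A junction touches an edge if either of its sides lands there.
        return any(
            seq == mc.seq_id and abs(p - pos) <= tolerance
            for seq, p in (jc.side1, jc.side2)
        )

    left = [j for j in junctions if touches(j, mc.start - 1)]
    right = [j for j in junctions if touches(j, mc.end + 1)]

    # A single junction spanning both edges is a plain deletion, not this pattern.
    if len(left) == 1 and len(right) == 1 and left[0] is not right[0]:
        return left[0], right[0]
    return None
```

Under these assumptions, a caller would go on to check that the far sides of both junctions point into the same inserted sequence or IS element before treating the region as replaced.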

For now, our solution is to write out the mutational steps in a Genome Diff file to accomplish both changes, which means we can count them correctly as two events, and generate the mutant genome.
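As a rough illustration of that workaround, the Python sketch below writes a two-entry Genome Diff file with one DEL and one INS. The field order (type, id, parent ids, seq_id, position, then size or new sequence), the tab-separated header line, and the choice to anchor the insertion on the base just before the deletion are my reading of the GenomeDiff format and should be checked against the breseq documentation. The function name and the example values (`chr1`, the coordinates, the inserted bases) are made up.

```python
def two_step_genome_diff(
    seq_id: str, del_start: int, del_size: int, inserted_seq: str
) -> str:
    """Build Genome Diff text describing a deletion followed by an insertion.

    Coordinates are 1-based reference positions. The insertion is placed
    after the base immediately upstream of the deleted region (an assumption
    about how the two events should be expressed).
    """
    if del_start < 2 or del_size < 1:
        # Require del_start >= 2 so the insertion anchor is a real base.
        raise ValueError("del_start must be >= 2 and del_size >= 1")
    if not inserted_seq:
        raise ValueError("inserted_seq must not be empty")

    lines = [
        "#=GENOME_DIFF\t1.0",
        # DEL: type, id, parent ids ('.' = none), seq_id, position, size
        "\t".join(["DEL", "1", ".", seq_id, str(del_start), str(del_size)]),
        # INS: type, id, parent ids, seq_id, position, new_seq
        "\t".join(["INS", "2", ".", seq_id, str(del_start - 1), inserted_seq]),
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Made-up example: delete 500 bp starting at 1001, then insert 4 bases.
    print(two_step_genome_diff("chr1", 1001, 500, "ACGT"), end="")
```

Keeping the deletion and insertion as separate entries is what lets them be counted as two events. The resulting file would then be applied to the reference with breseq's gdtools utility (its APPLY subcommand, as far as I know) to generate the mutant genome; see the gdtools help for the exact options.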