barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0
142 stars 21 forks source link

Output vcf format files? #80

Closed minisciencegirl closed 9 years ago

minisciencegirl commented 9 years ago

Is it possible to get .vcf format files for all the mutations? Or is there an easy way to convert either .gd or .html files to .vcf files? Thanks!

jeffreybarrick commented 9 years ago

Could you give some extra info on what you'd like to do with the VCF file afterward?

It's difficult to represent certain types of mutations in VCF (which is why we felt it necessary to invent the GD format), so breseq + gdtools currently only handle conversion of small indels and base substitutions into this format. VCF wasn't really designed for large deletions or transposons, but they could potentially be added. I'd check whether the downstream tool will use a 30 kb deletion or the like in VCF format or not know what to do with it.

minisciencegirl commented 9 years ago

Hi Jeff, thanks for the quick response!

What I find really useful with breseq is the functional annotation of each of the mutation and the ability to quickly compare a number of isolates using the compare command in gdtools. I was wondering if there is a way for me to feed in bam files that I generated with BWA-mem to breseq and just utilize the final step of annotation by breseq? I realize it's not what breseq is built for, but haven't encountered another tool like what breseq offers.

Looking carefully into the gdtools now, I realized that I can use subtract/intersect/union etc to do what I was doing in bedtools with vcf files.

Thanks again!

jeffreybarrick commented 9 years ago

breseq will take an aligned SAM (sorry not BAM) as input (--aligned-sam) option. It will call RA and MC type mutations (but not JC) from the read alignments in those files and generate normal output. Let me know if this doesn't work, as we don't routinely test this functionality.

Maybe a better option if you have VCF output from another variant caller it to convert that to GD format. There is a gdtools VCF2GD command which was hidden in the current releases. So, you could use that and then gdtools ANNOTATE. I have temporary breseq 0.26.1a versions posted here that has that command made visible. Let me know if I don't have one for your preferred architecture.

http://barricklab.org/release/breseq/

minisciencegirl commented 9 years ago

Thanks Jeff. I will give VCF2GD/ ANNOTATE a try.