barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0

merge info from different samples for downstream processing #238

Open wmoebius opened 4 years ago

wmoebius commented 4 years ago

This is a feature request or, alternatively, a request for advice.

The ability of breseq to compare different samples in an HTML file is super useful. I am wondering how to use that information for downstream analysis, e.g., merging the results from multiple samples into one VCF file. Would that be an option that could be added to gdtools?

jeffreybarrick commented 4 years ago

gdtools ANNOTATE/COMPARE can output a GD file that contains the frequencies of each mutation across multiple samples in accessory fields named: frequency_<sample>. This is used "under the hood" to generate the HTML comparison tables that have different columns for different samples.

Are you asking whether this GD file could then be converted into a VCF file that contains mutation information about many samples? Currently gdtools GD2VCF doesn't support this, but we could add that ability.
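Until a multi-sample VCF export exists, the `frequency_<sample>` fields described above can be pulled out of the merged GD file directly. The following is a minimal sketch rather than part of gdtools. It assumes the merged file is tab-delimited, that header and comment lines start with `#`, that accessory fields appear as `key=value` entries, and that the fourth and fifth columns of a mutation line hold the sequence ID and position. The function names `read_sample_frequencies` and `to_table` are hypothetical.

```python
import csv
import io

FREQ_PREFIX = "frequency_"


def read_sample_frequencies(handle):
    """Collect per-sample frequencies from the lines of a merged GD file.

    Returns (rows, samples): rows is a list of (fields, freqs) pairs, where
    fields is the full list of tab-separated columns and freqs maps sample
    name to the frequency string; samples lists sample names in the order
    they were first seen.
    """
    rows = []
    samples = []
    for line in handle:
        line = line.rstrip("\r\n")
        # Assumption: header and comment lines start with '#'.
        if not line or line.startswith("#"):
            continue
        fields = line.split("\t")
        freqs = {}
        for field in fields:
            key, sep, value = field.partition("=")
            if sep and key.startswith(FREQ_PREFIX):
                sample = key[len(FREQ_PREFIX):]
                freqs[sample] = value
                if sample not in samples:
                    samples.append(sample)
        if freqs:
            rows.append((fields, freqs))
    return rows, samples


def to_table(rows, samples, missing="NA"):
    """Render the collected rows as a tab-separated table, one column per sample."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["type", "id", "seq_id", "position"] + samples)
    for fields, freqs in rows:
        # Assumption: columns 4 and 5 are seq_id and position.
        seq_id = fields[3] if len(fields) > 3 else missing
        position = fields[4] if len(fields) > 4 else missing
        writer.writerow(
            [fields[0], fields[1], seq_id, position]
            + [freqs.get(sample, missing) for sample in samples]
        )
    return out.getvalue()
```

To use it, open the merged GD file, pass the file handle to `read_sample_frequencies`, and write the string returned by `to_table` to disk. The resulting table has one row per mutation and one column per sample, which can be loaded into R or pandas for downstream analysis.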

wmoebius commented 4 years ago

Yes, exactly. This would be useful for downstream analysis.