barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0

Compare across samples to improve predictions of low-frequency variants #221

Status: open. jeffreybarrick opened this issue 5 years ago.

jeffreybarrick commented 5 years ago

Use example data from:

Deatherage, D. E., Traverse, C. C., Wolf, L. N., Barrick, J. E. (2015) Detecting rare structural variation in evolving microbial populations from new sequence junctions using breseq. Front. Genet. 5: 468.

The dataset consists of 24 mixed-population samples, described here: https://github.com/barricklab/LTEE-Ecoli/tree/master/LTEE-mixed

For a smaller dataset to analyze, restrict the pileup to the fabR gene region (REL606:4140000-4142000):

samtools mpileup --max-depth 1000000 -r REL606:4140000-4142000 -o fabR.pileup.tsv -f 03_Output/Ara+5_500gen_REL772/data/reference.fasta 03_Output/*/data/reference.bam

The resulting pileup is available for download here: https://barricklab.org/release/breseq_development/issue221/fabR.pileup.tsv.gz

The file is in samtools pileup format: http://samtools.sourceforge.net/pileup.shtml
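Each sample's base column packs matches, mismatches, strand, indels, and read start/end markers into one string, so comparing samples first requires turning that string into allele counts. The Python sketch below assumes the default multi-sample mpileup layout produced by the command above (chromosome, position, reference base, then depth, bases, and qualities for each BAM in turn). The function names are hypothetical and are not part of breseq.

```python
import gzip
import re
from collections import Counter

# Matches the length field of an indel, e.g. the "2" in "+2AG".
INDEL_LEN = re.compile(r"\d+")


def count_bases(ref: str, depth: int, bases: str) -> Counter:
    """Tally calls in one sample's pileup base string, ignoring strand.

    Keys are single bases ("A", "C", "G", "T", "N"), "*" for a deleted
    base, and "+SEQ"/"-SEQ" for indels that follow this position
    (counted in addition to the base call they are attached to).
    """
    counts = Counter()
    if depth == 0:
        return counts  # no coverage: mpileup writes "*" as a placeholder
    ref = ref.upper()
    i = 0
    while i < len(bases):
        c = bases[i]
        if c == "^":
            i += 2  # read start; the next character is its mapping quality
        elif c in "+-":
            m = INDEL_LEN.match(bases, i + 1)
            n = int(m.group())
            counts[c + bases[m.end():m.end() + n].upper()] += 1
            i = m.end() + n
        else:
            if c in ".,":
                counts[ref] += 1  # match to reference on either strand
            elif c.upper() in "ACGTN":
                counts[c.upper()] += 1  # mismatch (lowercase = reverse strand)
            elif c in "*#":
                counts["*"] += 1  # placeholder for a deleted base
            # "$" (read end) and ">"/"<" (reference skip) are not base calls
            i += 1
    return counts


def parse_pileup_line(line: str):
    """Split one multi-sample pileup line into (chrom, pos, ref, [Counter, ...])."""
    fields = line.rstrip("\r\n").split("\t")
    chrom, pos, ref = fields[0], int(fields[1]), fields[2]
    samples = [count_bases(ref, int(fields[j]), fields[j + 1])
               for j in range(3, len(fields), 3)]
    return chrom, pos, ref, samples


def read_pileup(path: str):
    """Yield parsed lines from a plain or gzip-compressed pileup file."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for line in handle:
            yield parse_pileup_line(line)


def allele_frequency(counts: Counter, allele: str) -> float:
    """Fraction of base calls (indel annotations excluded) that are `allele`."""
    calls = sum(v for k, v in counts.items() if k[0] not in "+-")
    return counts[allele] / calls if calls else 0.0
```

For example, iterating over read_pileup("fabR.pileup.tsv.gz") and calling allele_frequency on each sample's counts gives one frequency per sample for every candidate allele at every position, which puts all 24 samples side by side.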

jeffreybarrick commented 4 years ago

@dgauraang The new pileup, which contains just two of the samples, is available for download here:

https://barricklab.org/release/breseq_development/issue221/two-sample-pileup.tsv.gz
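With only two samples, one possible way to compare them position by position (an assumption for illustration, not necessarily the approach breseq will adopt) is to build a 2x2 table of variant and reference read counts for each sample and apply Fisher's exact test. The sketch below implements the two-sided test with math.comb; the helper name frequencies_differ and its 0.01 cutoff are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x: int) -> float:
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Small relative tolerance so tables tied with the observed one are included.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    return min(1.0, p)


def frequencies_differ(alt1: int, ref1: int, alt2: int, ref2: int,
                       alpha: float = 0.01) -> bool:
    """Hypothetical helper: True if the variant frequency differs between samples."""
    return fisher_exact_two_sided(alt1, ref1, alt2, ref2) < alpha
```

An exact test is used here because low-frequency variants have small counts, where chi-squared approximations are unreliable. Exact big-integer binomials get slow at very deep coverage, and testing many positions calls for a multiple-testing correction; scipy.stats.fisher_exact is a well-tested alternative for the test itself.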