barricklab / breseq

breseq is a computational pipeline for finding mutations relative to a reference sequence in short-read DNA resequencing data. It is intended for haploid microbial genomes (<20 Mb). breseq is a command line tool implemented in C++ and R.
http://barricklab.org/breseq
GNU General Public License v2.0

Summarizing data quality/coverage across many runs #219

Open jeffreybarrick opened 5 years ago

jeffreybarrick commented 5 years ago

Motivation: It would be very useful to have a script that can take many runs and create a dashboard for evaluating and comparing their quality/coverage.

It might:

Implementation: Most likely as Python/R scripts that generate HTML output. For example, they can parse the summary.json files for statistics and use breseq BAM2COV to generate input files for graphing.
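As a rough illustration of the parsing half of this idea, here is a minimal Python sketch that collects values from every summary.json under a directory and renders them as an HTML table. It is only a sketch under stated assumptions: the function names (`get_path`, `collect_rows`, `render_table`) are hypothetical, and the dotted key paths are supplied by the caller, since the actual field names in summary.json are not listed in this thread.

```python
import html
import json
from pathlib import Path


def get_path(data, dotted_key, default="NA"):
    """Follow a dotted key path (e.g. "a.b.c") through nested dicts."""
    node = data
    for part in dotted_key.split("."):
        if not isinstance(node, dict) or part not in node:
            return default
        node = node[part]
    return node


def collect_rows(root, columns):
    """Return one row per summary.json found anywhere under root.

    columns maps a display name to a dotted key path. The key paths are
    caller-supplied assumptions; none are taken from real breseq output.
    """
    rows = []
    for summary in sorted(Path(root).rglob("summary.json")):
        with open(summary) as handle:
            data = json.load(handle)
        # Label each row with the folder that holds the file, relative to root.
        row = {"run": str(summary.parent.relative_to(root))}
        for name, key in columns.items():
            row[name] = get_path(data, key)
        rows.append(row)
    return rows


def render_table(rows, headers):
    """Render rows as a plain HTML table, escaping every cell."""
    head = "".join(f"<th>{html.escape(h)}</th>" for h in headers)
    body = []
    for row in rows:
        cells = "".join(
            f"<td>{html.escape(str(row.get(h, 'NA')))}</td>" for h in headers
        )
        body.append(f"<tr>{cells}</tr>")
    return f"<table>\n<tr>{head}</tr>\n" + "\n".join(body) + "\n</table>"
```

Missing keys fall back to "NA" rather than raising, so one incomplete run does not stop the whole dashboard from being built.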

jeffreybarrick commented 4 years ago

Here are some example summary files that can be used for testing: https://barricklab.org/release/tmp/ADP1-summary.tgz

jeffreybarrick commented 4 years ago

HTML table as output.

Could eventually color some cells green/yellow/red to flag suspect files/samples.
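One possible way to do the flagging, sketched in Python: map each numeric value to a color using two cutoffs and emit the cell with an inline background style. The helper names (`flag_color`, `colored_cell`) and the cutoff parameters are hypothetical, and the sketch assumes a metric where lower values are worse (coverage, for instance); suitable cutoffs would need to be chosen per column.

```python
import html


def flag_color(value, warn_below, fail_below):
    """Return a flag color for a metric where lower values are worse."""
    if value < fail_below:
        return "red"
    if value < warn_below:
        return "yellow"
    return "green"


def colored_cell(value, warn_below, fail_below):
    """Render one <td> whose background reflects the flag color."""
    color = flag_color(value, warn_below, fail_below)
    return f'<td style="background-color: {color}">{html.escape(str(value))}</td>'
```

For example, `colored_cell(12.5, warn_below=30, fail_below=15)` would produce a red cell, because 12.5 is below the hypothetical failure cutoff of 15.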

In general, the output should have most of the same columns as the READ and REFERENCE tables generated for a single breseq run, plus additional information. Example:

https://barricklab.org/twiki/pub/Lab/ToolsBacterialGenomeResequencing/REL8593A_output/summary.html

Columns to include in the REFERENCE TABLE:

Columns to include in the READ TABLE:

jeffreybarrick commented 3 years ago

@ginnymortensen Here is a newer set of breseq output that, unlike the one linked above, preserves all of the output folders. The output.json files are still the main place to pull information from.

https://barricklab.org/release/tmp/Ara-1-summary.tgz