Grice-Lab / AlignerBoost


AlignerBoost manual

AlignerBoost is a generalized software toolkit for boosting Next-Gen sequencing mapping precision using a Bayesian-based mapping quality framework.

AlignerBoost works with any NGS aligner that can produce standard SAM/BAM alignment output. Aligners for which AlignerBoost has been optimized for mapping precision and sensitivity currently include:

DNA aligners: Bowtie, Bowtie2, BWA-ALN/BWA-SW/BWA-MEM, NovoAlign, SeqAlTo
RNA aligners: Tophat, Tophat2, STAR

AlignerBoost works by tuning NGS aligners to report all potential alignments, then using a Bayesian-based framework to accurately estimate the mapping quality of ambiguously mapped reads.
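
As a rough illustration of the idea, the sketch below turns the scores of all candidate alignments of one read into posterior probabilities and a Phred-scaled mapping quality. It is a minimal sketch under stated assumptions, not AlignerBoost's actual model: the function names, the uniform prior over candidate loci, the use of per-alignment log10-likelihoods as input, and the cap of 60 are all chosen here for illustration only.

```python
import math

def posteriors(log10_likelihoods):
    """Posterior probability of each candidate alignment (assumes a uniform prior)."""
    if not log10_likelihoods:
        raise ValueError("need at least one candidate alignment")
    # Subtract the maximum before exponentiating to avoid numerical underflow.
    top = max(log10_likelihoods)
    weights = [10.0 ** (ll - top) for ll in log10_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]

def mapq(posterior, cap=60):
    """Phred-scaled quality: -10 * log10(probability that the alignment is wrong)."""
    error = 1.0 - posterior
    if error <= 0.0:
        return cap  # illustrative cap, not an AlignerBoost setting
    return min(cap, int(round(-10.0 * math.log10(error))))

def best_alignment(log10_likelihoods):
    """Return (index, mapQ) of the most probable candidate alignment."""
    post = posteriors(log10_likelihoods)
    best = max(range(len(post)), key=post.__getitem__)
    return best, mapq(post[best])
```

Under these assumptions, a read with two equally likely candidate loci gets a posterior of 0.5 for each and therefore a low mapQ, while a read whose best candidate is far more likely than the alternatives gets a high mapQ. This is why reporting all potential alignments matters: alternatives the aligner never reports cannot lower the estimate.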

AlignerBoost can dramatically increase mapping precision without a significant loss of sensitivity under various experimental strategies.

AlignerBoost is SNP-aware, and higher quality alignments can be achieved if provided with known SNPs.

Download and installation

You can download the latest executable release from GitHub at: https://github.com/Grice-Lab/AlignerBoost/releases. You can also download, or fork and pull, the source code from GitHub at: https://github.com/Grice-Lab/AlignerBoost. AlignerBoost is written in pure Java, so it runs without installation on Unix/Linux, Mac OS X, and Windows: simply type "java -jar AlignerBoost.jar" in the shell/terminal.

Dependencies

AlignerBoost does not depend directly on any 3rd party library. However, if you use AlignerBoost's best practice to generate executable shell scripts, your NGS aligner of choice must be available in the PATH for these scripts to run. You might also need other programs in the PATH for some of AlignerBoost's pre-processing functionality. See "examples/README.example" for best practice.
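
Before running generated scripts, it can help to confirm that the required programs are actually visible in the PATH. The following is a minimal sketch using Python's standard library; the helper name and the list of program names are assumptions for illustration, so substitute the executables that your own scripts call.

```python
import shutil

def missing_executables(names):
    """Return the executable names that cannot be found in PATH."""
    return [name for name in names if shutil.which(name) is None]

if __name__ == "__main__":
    # Hypothetical selection; list only the programs your scripts actually call.
    needed = ["java", "bowtie2"]
    missing = missing_executables(needed)
    if missing:
        print("Not found in PATH:", ", ".join(missing))
    else:
        print("All required programs are available.")
```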

Customized SAM format tags

AlignerBoost uses a set of customized tags in generated SAM/BAM files to store auxiliary alignment information calculated during its filter process. These tags are listed below. Note: X? tags are global, Y? tags relate to the seed region, and Z? tags relate to the entire alignment.

Tag Type Description
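
The prefix convention can be used to sort the optional fields of a SAM record by scope. The sketch below is illustrative only: the function names are made up here, and the tag names in the usage note (Xa, Yb, Zc) are hypothetical placeholders, not actual AlignerBoost tags. Note also that other tools may emit their own X?, Y?, or Z? tags, so grouping by prefix alone does not prove a tag came from AlignerBoost.

```python
# Scope implied by the first letter of a tag, following the note above.
SCOPES = {"X": "global", "Y": "seed region", "Z": "entire alignment"}

def aux_tags(sam_line):
    """Parse the optional TAG:TYPE:VALUE fields (column 12 onward) of one SAM record."""
    fields = sam_line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[11:]:
        # Split at most twice so that values containing ':' stay intact.
        tag, typ, value = field.split(":", 2)
        tags[tag] = (typ, value)
    return tags

def group_by_scope(tags):
    """Group X?/Y?/Z? tags by scope; tags with other prefixes are ignored."""
    grouped = {scope: {} for scope in SCOPES.values()}
    for tag, typed_value in tags.items():
        scope = SCOPES.get(tag[0])
        if scope is not None:
            grouped[scope][tag] = typed_value
    return grouped
```

For example, a record whose optional fields are NM:i:0, Xa:i:1, Yb:i:2, and Zc:f:0.5 would yield Xa under "global", Yb under "seed region", and Zc under "entire alignment", with NM left out.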

Best practice

To make full use of AlignerBoost to increase your mapping precision and sensitivity, we recommend using our Best Practice Pipeline. Just download our Best Practice Example README and Configuration file, edit the config file using your favorite text or spreadsheet editor, and start your analysis!

QC and pre-processing tools

These are recommended QC and pre-processing procedures that are intended to be called indirectly by the shell scripts generated by the "best practice" steps. Try running "java -jar AlignerBoost.jar" for details.

Core programs

Core programs are the fundamental tools used to pick the most probable (highest mapQ) alignments using AlignerBoost's Bayesian framework. Try running "java -jar AlignerBoost.jar run" for details.

Statistic summary programs

These are summary tools recommended during the "best practice" procedures. They generate, and subsequently update, a tab-delimited report file for the runs/libraries processed in a given study. Try running "java -jar AlignerBoost.jar stats" for details.

Utility program summaries

Utility tools for manipulating common genomic data files, such as SAM/BAM, BED, WIG, VCF/gVCF and more.

Try running "java -jar AlignerBoost.jar utils" for details.