AlignerBoost manual

AlignerBoost is a generalized software toolkit for boosting next-generation sequencing (NGS) mapping precision using a Bayesian-based mapping quality framework.

AlignerBoost works with any NGS aligner that can produce standard SAM/BAM alignment output. Aligners for which AlignerBoost has been optimized for mapping precision and sensitivity currently include:

DNA aligners: Bowtie, Bowtie2, BWA-ALN/BWA-SW/BWA-MEM, NovoAlign, SeqAlTo
RNA aligners: Tophat, Tophat2, STAR

AlignerBoost works by tuning NGS aligners to report all potential alignments, then utilizes a Bayesian-based framework to accurately estimate the mapping quality of ambiguously mapped reads.
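As a rough illustration of this idea (not AlignerBoost's actual implementation), the sketch below normalizes the likelihoods of all candidate loci for one read into posterior probabilities, then converts the best posterior into a Phred-scaled mapping quality. The function names, the uniform prior over loci, the cap of 60, and the example likelihoods are all assumptions made for this example.

```python
import math

def posteriors(likelihoods):
    """Normalize candidate-locus likelihoods into posterior probabilities.

    Assumes a uniform prior over candidate loci (an assumption of this sketch).
    """
    total = sum(likelihoods)
    if total <= 0:
        raise ValueError("at least one likelihood must be positive")
    return [lik / total for lik in likelihoods]

def phred_mapq(posterior, cap=60):
    """Phred-scale the probability that the alignment is wrong.

    The cap is an arbitrary choice for this sketch.
    """
    error = 1.0 - posterior
    if error <= 0:
        return cap
    return min(cap, int(round(-10 * math.log10(error))))

# Three hypothetical candidate loci reported for one ambiguously mapped read
liks = [9e-4, 9e-5, 1e-5]
post = posteriors(liks)
best = max(post)
print(round(best, 3), phred_mapq(best))
```

A read with a single dominant locus gets a posterior near 1 and a high mapping quality, while a read with several equally likely loci gets a low one, which is what allows ambiguous reads to be filtered.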

AlignerBoost can dramatically increase mapping precision without a significant loss of sensitivity under various experimental strategies.

AlignerBoost is SNP-aware, and higher quality alignments can be achieved if provided with known SNPs.

Download and installation

You can download the latest executable release from GitHub at: https://github.com/Grice-Lab/AlignerBoost/releases. You can also download, or fork and pull, the source code from GitHub at: https://github.com/Grice-Lab/AlignerBoost. AlignerBoost is written in pure Java, so you can run it without installation on Unix/Linux, Mac OS X, and Windows by simply typing "java -jar AlignerBoost.jar" in the shell/terminal.

Dependencies

AlignerBoost does not depend directly on any third-party library. However, if you use AlignerBoost's best practice to generate executable shell scripts, your NGS aligner of choice must be available in the PATH for these scripts to run. You might also need other programs in the PATH for some of AlignerBoost's pre-processing functionality. See "examples/README.example" for best practice.
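Before running generated scripts, it can help to confirm that the required programs are actually on the PATH. The following is a minimal sketch of such a check; the helper name and the executable names in the list are assumptions, so replace them with the tools your own pipeline uses.

```python
import shutil

def missing_tools(tools):
    """Return the names in `tools` that cannot be found on the PATH."""
    return [name for name in tools if shutil.which(name) is None]

# Executable names are assumptions; adjust to the aligner you actually use.
required = ["java", "bowtie2"]
missing = missing_tools(required)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools found.")
```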

Customized SAM format tags

AlignerBoost uses a set of customized tags in generated SAM/BAM files to store auxiliary alignment information calculated during its filter process. These tags are listed below. Note: X?: global tags, Y?: seed region related tags, Z?: entire alignment related tags

Tag  Type  Description
XA   i     alignment length, including M,=,X,I,D,S but not H,P,N
XL   i     insert length, including M,=,X,I,D but not S,H,P,N, determined by Cigar or 1DP
XF   i     actual insert from (start) relative to reference
XI   f     alignment identity as 1 - (YX + YG) / XL
XH   Z     alignment likelihood given this mapping locus and base quality, in string format to preserve double precision
XV   i     known SNVs (if any) used in calculating XH
XP   Z     alignment posterior probability in string format to preserve double precision
XT   Z     genetic type (GTYPE) string generated by 'utils classifySAM'
YL   i     seed length
YX   i     No. of seed mismatches
YG   i     No. of seed indels
ZX   i     No. of all mismatches
ZG   i     No. of all indels
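To show how these tags might be consumed downstream, here is a minimal sketch that parses the optional fields of a SAM record, recomputes XA and XL from the CIGAR string using the operation sets listed above, and evaluates XI from its stated formula. The record is made up for illustration, and the helper names are hypothetical; this does not reproduce AlignerBoost's own code.

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")

def cigar_length(cigar, ops):
    """Sum the lengths of CIGAR operations whose op code is in `ops`."""
    return sum(int(n) for n, op in CIGAR_RE.findall(cigar) if op in ops)

def parse_tags(sam_line):
    """Parse optional TAG:TYPE:VALUE fields (column 12 onward) of a SAM record."""
    tags = {}
    for field in sam_line.rstrip("\n").split("\t")[11:]:
        tag, typ, value = field.split(":", 2)
        if typ == "i":
            value = int(value)
        elif typ == "f":
            value = float(value)
        tags[tag] = value  # Z-typed values stay as strings
    return tags

# Made-up record for illustration only
record = "\t".join([
    "read1", "0", "chr1", "100", "30", "5S45M", "*", "0", "0",
    "A" * 50, "I" * 50,
    "XA:i:50", "XL:i:45", "YL:i:25", "YX:i:1", "YG:i:0",
    "ZX:i:2", "ZG:i:0", "XH:Z:1.25E-4", "XP:Z:0.998",
])
cigar = record.split("\t")[5]
tags = parse_tags(record)

xa = cigar_length(cigar, "M=XIDS")  # alignment length, as described for XA
xl = cigar_length(cigar, "M=XID")   # insert length, as described for XL
xi = 1 - (tags["YX"] + tags["YG"]) / tags["XL"]  # identity, as defined for XI
posterior = float(tags["XP"])       # string-typed tag converted to a float
print(xa, xl, round(xi, 4), posterior)
```

Because XH and XP are stored as strings, convert them with float() before doing any arithmetic or threshold comparison.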

Best practice

To fully utilize AlignerBoost to increase your mapping precision and sensitivity, we recommend using our Best Practice Pipeline. Just download our Best Practice Example README and Configuration file, edit the config file using your favorite text or spreadsheet editor, and start your analysis!

QC and pre-processing tools

These are recommended QC and pre-processing procedures that are intended to be called indirectly by the shell scripts generated by the "best practice" steps. Run java -jar AlignerBoost.jar for details.

Core programs

Core programs are the fundamental tools used to pick the most probable (highest mapQ) alignments using AlignerBoost's Bayesian framework. Run java -jar AlignerBoost.jar run for details.

Statistic summary programs

These are summary tools recommended during the "best practice" procedures; they generate and subsequently update a tab-delimited report file for the runs/libraries processed in a given study. Run java -jar AlignerBoost.jar stats for details.

Utility program summaries

Utility tools for manipulating common genomic data files, such as SAM/BAM, BED, WIG, VCF/gVCF and more.

sam2AbsCover convert a SAM/BAM file to customized tab-delimited coverage file with absolute location coordinates
sam2RelCover convert a SAM/BAM file to customized tab-delimited coverage file with relative position coordinates
sam2BinCover convert a SAM/BAM file to customized tab-delimited coverage file with binned (%) coordinates
sam2RegCount count reads from a SAM/BAM file in given regions from a BED file
sam2CoverSumm get simple read cover summary table from a SAM/BAM file
sam2Wig convert a SAM/BAM file to a UCSC Wiggle fixed format file
bed2Wig convert a BED6 file to a UCSC Wiggle fixed format file
bed2AbsCover convert a BED6 file to customized tab-delimited coverage file with absolute location coordinates
filterSamById filter a SAM/BAM file with a given ID list
classifySAM fast index-based classification of a SAM/BAM file given genomic annotations from GFF file(s)
classifyVCF fast index-based classification of a VCF/gVCF variation file given genomic annotations from GFF file(s)
classifyBED fast index-based classification of a BED file given genomic annotations from GFF file(s)
filterWigFix filter UCSC Wiggle fixed format file(s) with given regions in BED file
filterWigVar filter UCSC Wiggle variable format file(s) with given regions in BED file
wigFix2RelCover convert UCSC Wiggle fixed format file(s) to tab-delimited coverage file in given regions
wigVar2RelCover convert UCSC Wiggle variable format file(s) to tab-delimited coverage file in given regions

Run java -jar AlignerBoost.jar utils for details.
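If you want to call these command groups from a script rather than typing them by hand, something like the following sketch could work. It only assembles and runs the "java -jar AlignerBoost.jar ..." command lines shown in this README; the helper names and the assumption that AlignerBoost.jar sits in the current directory are ours, and the text printed by each command is not reproduced here.

```python
import shutil
import subprocess

JAR = "AlignerBoost.jar"  # assumed to be in the current directory

def build_command(*args, jar=JAR):
    """Build the argument list for `java -jar AlignerBoost.jar <args...>`."""
    return ["java", "-jar", jar, *args]

def show_help(group):
    """Run one command group (e.g. "run", "stats", "utils") and return its output."""
    if shutil.which("java") is None:
        raise RuntimeError("java was not found on the PATH")
    result = subprocess.run(build_command(group), capture_output=True, text=True)
    return result.stdout + result.stderr

print(build_command("utils"))
```

Calling show_help("utils") would then return whatever the tool prints for that group, provided Java and the jar file are available.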
