RagnarGrootKoerkamp / astar-pairwise-aligner

A pairwise sequence aligner written in Rust
Mozilla Public License 2.0

#+TITLE: A*PA & A*PA2: A* Pairwise Aligner

#+PROPERTY: header-args :eval no-export :exports results

A*PA is a global pairwise sequence aligner for edit distance using A*, co-authored by [[https://github.com/pesho-ivanov][@pesho-ivanov]] and [[https://github.com/RagnarGrootKoerkamp][@RagnarGrootKoerkamp]].

A*PA2 is an improvement of A*PA that uses a DP-based approach instead of plain A*. It achieves up to 20x speedup over other exact aligners and is competitive with approximate aligners.

An alignment of two sequences of length 500 with 30% error rate using A*PA:

[[file:imgs/readme/layers.gif]]

An alignment of two sequences of length 10'000 with 15% error rate using A*PA2:

[[file:imgs/readme/astarpa2.gif]]

* Rust API
To call A*PA or A*PA2 from another Rust crate, add the =astarpa= or =astarpa2= crate in this repo as a git dependency.

For A*PA2, use ~astarpa2_simple(a, b)~ or ~astarpa2_full(a, b)~ in the [[file:astarpa2/src/lib.rs][~astarpa2~ crate]], or customize parameters with e.g.

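#+begin_src rust
// Start from the full preset and override a single parameter.
let mut params = astarpa2::AstarPa2Params::full();
params.front.incremental_doubling = false;
let mut aligner = params.make_aligner(true);
// `align` returns the alignment cost and the CIGAR.
let (cost, cigar) = aligner.align(a, b);
#+end_src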

The ~astarpa~ crate is the [[file:astarpa/src/lib.rs][main entrypoint]] for A*PA. See the docs there. Use ~astarpa::astarpa(a, b)~ for alignment with default settings or ~astarpa::astarpa_gcsh(a,b,r,k,end_pruning)~ to use GCSH+DT with custom parameters.

More complex usage examples can be found in [[file:pa-bin/examples/][pa-bin/examples]].
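The costs returned by the functions above are edit distances, so results on small inputs can be cross-checked against a plain quadratic dynamic program. The sketch below is such a reference. Note that =edit_distance= is a hypothetical helper and not part of the =astarpa= or =astarpa2= crates, and that it assumes unit costs for substitutions, insertions, and deletions.

#+begin_src rust
/// Reference edit distance via the classic O(|a| * |b|) dynamic program.
/// Hypothetical helper for cross-checking only; not part of astarpa/astarpa2.
/// Assumes unit cost for each substitution, insertion, and deletion.
pub fn edit_distance(a: &[u8], b: &[u8]) -> usize {
    // prev[j] holds the distance between the current prefix of `a` and b[..j].
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    for (i, &ca) in a.iter().enumerate() {
        let mut cur = Vec::with_capacity(b.len() + 1);
        // Aligning a[..=i] against the empty prefix of `b` costs i + 1 deletions.
        cur.push(i + 1);
        for (j, &cb) in b.iter().enumerate() {
            let substitute = prev[j] + usize::from(ca != cb);
            let delete = prev[j + 1] + 1;
            let insert = cur[j] + 1;
            cur.push(substitute.min(delete).min(insert));
        }
        prev = cur;
    }
    prev[b.len()]
}
#+end_src

For example, =edit_distance(b"kitten", b"sitting")= is 3. Under the unit-cost assumption, this value should match the =cost= reported by the aligner for the same pair of sequences.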

** C API
The ~astarpa-c~ [[file:astarpa-c/astarpa.h][crate]] contains simple C bindings for the ~astarpa::{astarpa,astarpa_gcsh}~ and ~astarpa2::astarpa2_{simple,full}~ functions, along with an [[file:astarpa-c/example.c][example]] and a [[file:astarpa-c/makefile][makefile]]. More should not be needed for simple usage. To run the resulting binary, make sure to ~export LD_LIBRARY_PATH=/path/to/astarpa/target/release~.

** Command line application
=pa-bin= is a small command line application that takes as input consecutive pairs of sequences from a =.fasta=, =.seq=, or =.txt= file (or can generate random input) and outputs costs and alignments to a =.csv=.

This requires =cargo= and Rust =nightly=. To get both, first install [[https://rustup.rs/][rustup]]. Then enable ~nightly~: ~rustup install nightly; rustup default nightly~.

Install =pa-bin= to =~/.local/share/cargo/bin/pa-bin= using the following (cloning this repo is not needed):

#+begin_src shell
cargo install --git https://github.com/RagnarGrootKoerkamp/astar-pairwise-aligner pa-bin
#+end_src

To run from the repository instead, clone it and run ~cargo run --release --~ followed by the =pa-bin= flags.

#+begin_src shell :exports both :results verbatim
cargo run --release -- -h
#+end_src

#+RESULTS:
#+begin_example
Globally align pairs of sequences using A*PA

Usage: pa-bin [OPTIONS] <--input |--length >

Options:
  -i, --input       A .seq, .txt, or Fasta file with sequence pairs to align
  -o, --output      Write a .csv of {cost},{cigar} lines
      --aligner     The aligner to use [default: astarpa2-full] [possible values: astarpa, astarpa2-simple, astarpa2-full]
  -h, --help        Print help (see more with '--help')

Generated input:
  -n, --length      Target length of each generated sequence [default: 1000]
  -e, --error-rate  Error rate between sequences [default: 0.05]
#+end_example
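
According to the help text above, each line of the =.csv= written by =--output= has the form ={cost},{cigar}=. A minimal sketch for reading such a line back in Rust could look as follows. Here =parse_line= is a hypothetical helper, the cost is assumed to be a non-negative integer, and the CIGAR is kept as an opaque string because its exact syntax is not specified here.

#+begin_src rust
/// Split one `{cost},{cigar}` line into its two fields.
/// Hypothetical helper; assumes the cost is a non-negative integer and
/// treats everything after the first comma as the (opaque) CIGAR string.
pub fn parse_line(line: &str) -> Option<(u64, &str)> {
    // Drop a trailing newline, then split at the first comma only.
    let (cost, cigar) = line.trim_end().split_once(',')?;
    // Return None if the cost field is not a valid integer.
    Some((cost.trim().parse().ok()?, cigar))
}
#+end_src

Lines without a comma or with a non-numeric cost yield =None=, so a caller can skip or report them instead of panicking.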

Here are some sample videos. The first five correspond to figure 1 of the A*PA paper. Timings are not comparable due to differences in visualization strategies (cell vs layer updates).

|--------------------------------------------------------------------------+--------------------------------------------------------------------------------|
| Dijkstra [[file:imgs/readme/2_dijkstra.gif]]                             | Ukkonen's exponential search (Edlib) [[file:imgs/readme/1_ukkonen.gif]]        |
| Diagonal transition (WFA) [[file:imgs/readme/3_diagonal_transition.gif]] | DT + Divide & Conquer (BiWFA) [[file:imgs/readme/4_dt-divide-and-conquer.gif]] |
| A*PA (GCSH+DT) [[file:imgs/readme/5_astarpa.gif]]                        | A*PA2-full (8-bit words; block size 32) [[file:imgs/readme/6_astarpa2.gif]]    |

Code is spread out over multiple crates. From low to high:

#+begin_src shell :results file :file imgs/readme/depgraph.svg :exports results
cargo depgraph --dedup-transitive-deps \
    --include pa-generate,pa-bin,pa-vis,astarpa,pa-types,pa-affine-types,sdl2,pa-base-algos,pa-heuristic,pa-vis-types,astarpa-c,pa-bitpacking,astarpa2,astarpa-next \
    | dot -T svg
#+end_src

#+RESULTS:

[[file:imgs/readme/depgraph.svg]]