POASTA is a fast and optimal partial order aligner that supports gap-affine alignment penalties. Inspired by a recent algorithm for pairwise alignment, it can exploit exact matches between the query and the graph, greatly speeding up the alignment process.
TODO
TODO
POASTA is written in Rust, and to build and install it, you'll need a recent version of the Rust compiler. The minimum supported Rust version is 1.70.
Install Rust with rustup: https://rustup.rs/
If Rust is already installed, bring it up to date with rustup update.
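To confirm that the installed compiler meets the 1.70 minimum before building, a check along the following lines may help. This is a minimal sketch: the version_ge helper and the MIN_RUST variable are illustrative names, not part of POASTA, and the comparison assumes a sort that supports -V (version sort), as GNU coreutils does.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 >= version $2.
# Assumes `sort -V` (version sort) is available.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

MIN_RUST="1.70.0"

if command -v rustc >/dev/null 2>&1; then
    # `rustc --version` prints a line such as "rustc 1.75.0 (<hash> <date>)";
    # the second field is the version number.
    installed=$(rustc --version | awk '{print $2}')
    if version_ge "$installed" "$MIN_RUST"; then
        echo "rustc $installed is new enough"
    else
        echo "rustc $installed is older than $MIN_RUST; run: rustup update"
    fi
else
    echo "rustc not found; install it with rustup first"
fi
```

Version sort is used instead of plain string comparison so that, for example, 1.9.0 is correctly treated as older than 1.70.0.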
Clone the repository.
git clone https://github.com/broadinstitute/poasta
Move into the directory.
cd poasta
Build using cargo. We enable a flag to ensure the compiler uses all features of your machine's CPU. To maximize portability of the binary, however, remove the RUSTFLAGS="..." part.
RUSTFLAGS="-C target-cpu=native" cargo build --release
The built poasta executable will be available in the directory target/release/.
To create a multiple sequence alignment from scratch, simply give POASTA a FASTA file. The FASTA file can be compressed with gzip (the filename should have a .gz extension).
poasta align -o graph.poasta sequences.fasta
This will output the graph to a binary file called graph.poasta. POASTA can reuse this file to later align additional sequences to it.
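As an illustration of the gzip-compressed input mentioned above, the following sketch writes a tiny FASTA file, compresses it, and passes the .gz file to poasta align. The three sequences are made up for this example, and the alignment step is skipped when poasta is not on the PATH.

```shell
#!/bin/sh
# Write a small example FASTA file (sequences invented for illustration).
cat > sequences.fasta <<'EOF'
>seq1
ACGTACGTTAGC
>seq2
ACGTACCTTAGC
>seq3
ACGTACGTAGC
EOF

# Compress to sequences.fasta.gz; -c writes to stdout, so the original is kept.
gzip -c sequences.fasta > sequences.fasta.gz

# Align the compressed file only if poasta is installed.
if command -v poasta >/dev/null 2>&1; then
    poasta align -o graph.poasta sequences.fasta.gz
fi
```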
To align additional sequences to a previously created partial order graph, specify the existing graph using the -g option.
poasta align -g graph.poasta -o graph_updated.poasta new_sequences.fasta
This will import the graph stored in graph.poasta, align the additional sequences in new_sequences.fasta to this graph, and output the updated graph to graph_updated.poasta.
POASTA can import an existing multiple sequence alignment stored in columnar FASTA format (e.g., those created by other tools like mafft or spoa), create the equivalent partial order graph from the existing alignment, and then align new sequences to it. To achieve this, pass the FASTA MSA (with extension .fa, .fna, or .fasta) to the -g option. The file may also be compressed with gzip if it has a .gz suffix.
poasta align -g msa.fasta -o graph_updated.poasta new_sequences.fasta
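In a columnar FASTA MSA, every row has the same length once gap characters are counted. Before importing an MSA with -g, it can be useful to verify this. The sketch below is not part of POASTA; check_msa_columns is a hypothetical helper that prints each row's name and length and returns a non-zero status if the lengths differ.

```shell
#!/bin/sh
# Hypothetical helper: print "name<TAB>length" for each row of a FASTA MSA
# and return non-zero if the rows do not all have the same length.
# Sequence lines belonging to one record are summed, so wrapped FASTA works.
check_msa_columns() {
    awk '
        function report() {
            print name "\t" len
            if (!have_expected) { expected = len; have_expected = 1 }
            else if (len != expected) bad = 1
        }
        BEGIN { bad = 0 }
        /^>/ { if (in_record) report(); name = substr($0, 2); len = 0; in_record = 1; next }
        { len += length($0) }
        END { if (in_record) report(); exit bad }
    ' "$1"
}
```

A possible use is to gate the import on the check, for example: check_msa_columns msa.fasta && poasta align -g msa.fasta -o graph_updated.poasta new_sequences.fasta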
The default output format is POASTA's binary file format, which stores the graph data structure. This is the recommended output because you can always convert this binary file to other formats using poasta view.
If you don't need the binary file, however, you can specify the output format with the -O option.
Other supported formats:
-O dot
-O fasta
-O gfa
For example, to visualize the graph directly with GraphViz:
poasta align -Odot sequences.fasta | dot -Tpng -o graph.png
Note that we did not specify an output file for poasta align (we did not use the -o option). If no output filename is given, standard output will be used, so the output can be directly piped to dot to create the visualization.
Using poasta view to convert between output formats
By default, POASTA stores the computed graph/MSA in its own binary file format. To convert a previously computed MSA to other file formats, you can use poasta view.
The supported output formats are the same as described above, i.e.:
-O dot
-O fasta
-O gfa
Example:
# Convert to GFA
poasta view -Ogfa existing_msa.poasta > poa_graph.gfa
# Convert to FASTA MSA
poasta view -Ofasta existing_msa.poasta > poa_msa.fasta
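When several graphs need converting, the same command can be wrapped in a loop. The following is a sketch only: gfa_name is a hypothetical helper that derives the output filename, and the loop assumes the graphs are .poasta files in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: derive the output name, e.g. graph.poasta -> graph.gfa
gfa_name() {
    printf '%s.gfa\n' "${1%.poasta}"
}

if command -v poasta >/dev/null 2>&1; then
    # Convert every .poasta file in the current directory to GFA.
    for graph in ./*.poasta; do
        # If nothing matches, the glob stays literal; skip it.
        [ -e "$graph" ] || continue
        poasta view -Ogfa "$graph" > "$(gfa_name "$graph")"
    done
else
    echo "poasta not found on PATH" >&2
fi
```

Swapping -Ogfa for -Ofasta or -Odot (and adjusting the output extension) gives the other formats listed above.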