McTavishLab / extensiphy_fork

Pipeline to place taxa in a sequence alignment and phylogeny using NGS reads
Extensiphy is a pipeline that assembles homologous loci by aligning reads to a reference from a multiple sequence alignment, calls a consensus sequence, and adds it to the existing alignment. Homologous loci may be kept concatenated or split back into individual alignments prior to phylogenetic estimation.


Extensiphy takes an alignment and sets of sequencing reads from query taxa (a). Reads are aligned to a reference sequence and a consensus sequence is called (b). The new sequence is added to the alignment and the updated alignment is used to estimate a phylogeny (c).

Setup and Use

Extensiphy now takes inputs on the command line without requiring a config file. Extensiphy allows control over both how many Extensiphy runs happen in parallel and how many threads are allocated to each run. Make sure you don't ask your computer to work too hard by requesting more runs and threads than it can handle. Find out how many cores you have available and calculate (threads per run * extensiphy_runs) for the runs you wish to execute at the same time; this product should not exceed your core count. If you have 8 cores available, consider starting 2 runs with 3 threads available to each, then adjust to your optimum setting.
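The arithmetic above can be checked before launching anything. The following is a minimal sketch, not part of Extensiphy: `fits_in_cores` is a hypothetical helper, and `nproc` (GNU coreutils) is assumed to be available to report the core count.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Extensiphy): succeeds when
# runs * threads-per-run does not exceed the number of cores.
fits_in_cores() {
    local runs="$1" threads="$2" cores="$3"
    [ $(( runs * threads )) -le "$cores" ]
}

cores="$(nproc)"   # number of processing units available
runs=2             # parallel Extensiphy runs you plan to start
threads=3          # threads you plan to give each run

if fits_in_cores "$runs" "$threads" "$cores"; then
    echo "OK: $(( runs * threads )) threads requested, $cores cores available"
else
    echo "Too many: $(( runs * threads )) threads requested, only $cores cores available"
fi
```

Adjust `runs` and `threads` to the values you intend to pass to Extensiphy and rerun the check.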

First Run

Once you've cloned this repo and installed all dependencies to your PATH, begin here. Dependencies are outlined at the bottom of this readme.

If you only plan on using Extensiphy to add data to an existing alignment and tree, use the following command:

./ -a ./testdata/combo.fas -t ./testdata/combo.tre -d ./testdata

If you plan to generate a starting alignment and tree that you wish to add sequences to, test gon_phyling with this command:

./ -d ./gon_phy_testdata


Extensiphy requires that you limit the loci you include for updating to sequences with lengths of 1000 bp or above. This protects read mapping and base-calling accuracy. The length is checked automatically when individual locus alignments are used as input, but with a concatenated alignment you must make this assessment yourself.
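If you still have the individual locus alignments that went into a concatenated alignment, one way to make this assessment by hand is to list any sequences that fall under the threshold. This is a rough sketch rather than Extensiphy's own check: `short_seqs` is a hypothetical helper, it assumes one FASTA file per locus, and it counts ungapped length (alignment gap characters `-` are ignored).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Extensiphy): print the name and ungapped
# length of every sequence in a FASTA file that is shorter than min_len.
short_seqs() {
    local fasta="$1" min_len="${2:-1000}"
    awk -v min="$min_len" '
        # Emit the previous record if it was too short.
        function report() { if (name != "" && len < min) print name "\t" len }
        /^>/ { report(); name = substr($0, 2); len = 0; next }
        # Sequence line: drop gaps and whitespace, then add to the running length.
        { gsub(/[- \t\r]/, ""); len += length($0) }
        END { report() }
    ' "$fasta"
}

# Example usage (the path is a placeholder):
# short_seqs path/to/locus_alignment.fas 1000
```

An empty result means every sequence in that file meets the length requirement.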

Extensiphy Controls and Flags For Use:

Required flags

Optional flags

Output Files!

Gon_phyling Controls and Flags For Use


If all you have is raw reads and you need to create a starting tree:

Creating a starting tree! You need a tree and alignment with any number of taxa in order to update these with new taxa. Gon_phyling is a simple pipeline that assembles reads de novo, uses parsnp for locus selection, and finally infers a phylogeny.

  1. Move some fraction of your reads to a new directory for assembly and starting tree inference.
  2. run:
  3. Use the produced alignment file, tree file and the rest of the reads as the inputs for a full Extensiphy run by running: -a [PATH/TO/ALIGNMENT/FILE] -d [PATH/TO/READ/DIRECTORY] -t [PATH/TO/TREE/FILE] -1 [READ SUFFIX 1] -2 [READ SUFFIX 2].
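Step 1 above can be done by hand, or with something like the sketch below. It is only an illustration: `move_read_subset` is a hypothetical helper, and the default suffixes `_R1.fq` and `_R2.fq` are assumptions that should be replaced with your own read suffixes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Extensiphy): move the first n complete
# read pairs from src to dest and print how many pairs were moved.
# The default suffixes are assumptions; pass your own as arguments 4 and 5.
move_read_subset() {
    local src="$1" dest="$2" n="$3" s1="${4:-_R1.fq}" s2="${5:-_R2.fq}"
    local moved=0 r1 base
    mkdir -p "$dest"
    for r1 in "$src"/*"$s1"; do
        if [ "$moved" -ge "$n" ]; then break; fi
        [ -e "$r1" ] || continue            # glob matched nothing
        base="${r1%"$s1"}"                  # strip the first-read suffix
        [ -e "$base$s2" ] || continue       # skip reads without a mate
        mv "$r1" "$base$s2" "$dest"/
        moved=$(( moved + 1 ))
    done
    echo "$moved"
}

# Example usage (paths are placeholders):
# move_read_subset ./all_reads ./starting_tree_reads 10
```

Reads left in the source directory can then serve as the read directory for the full Extensiphy run in step 3.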


For a more in-depth walkthrough of how to install dependencies for use with Extensiphy and how to run Extensiphy using different data types and options, try the tutorial in the tutorial folder. You can copy code snippets into your terminal window.


Extensiphy is limited to Linux at the moment. Using Ubuntu will ensure the smoothest performance. If you want to use another distro, you'll have to install the equivalent packages with that distro's own tools. You have been warned.

Dependencies (Separate programs you'll need to install):

  1. Python 3
  2. bwa-mem2
  3. RAxML
  4. Seqtk
  5. Samtools
  6. Bcftools
  7. Fastx
  8. Dendropy
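Before a first run it can help to confirm that the programs listed above are actually reachable from your PATH. The sketch below is one way to do that; `missing_tools` is a hypothetical helper, and the executable names passed to it are assumptions about how each program is installed on your system (RAxML and the Fastx tools in particular are installed under several different executable names, so they are left for you to add).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Extensiphy): print each command name
# that cannot be found on PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Executable names are assumptions; adjust them to match your install.
missing="$(missing_tools python3 bwa-mem2 seqtk samtools bcftools)"
if [ -n "$missing" ]; then
    echo "Not found on PATH:"
    echo "$missing"
fi

# Dendropy is a Python package rather than an executable, so test the import.
python3 -c 'import dendropy' 2>/dev/null || echo "Python module not importable: dendropy"
```

No output means everything that was checked was found.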

Extensiphy also comes with a second pipeline for generating a phylogenetic tree from scratch: Gon_phyling. Its dependencies are not required for running Extensiphy itself, but Gon_phyling can be useful if you have a lot of data and aren't interested in hand-selecting the loci/genes you include in your alignment. Gon_phyling's dependencies are as follows:

  1. Parsnp
  2. Spades
  3. BBmap

Apt-get dependency install

Almost all programs for running Extensiphy are available with apt-get. bwa-mem2 is not covered by the commands below and must be installed separately, for example with conda. Run the commands found below to install the rest:

apt-get install raxml
apt-get install seqtk
apt-get install samtools
apt-get install bcftools
apt-get install fastx-toolkit
pip install dendropy

Installs with apt-get for Gon_phyling are not currently available. You will have to install these programs manually or with conda.

Conda dependency install

Use conda for the fastest dependency install.

Add appropriate channels to your conda install:

conda config --prepend channels conda-forge
conda config --prepend channels bioconda

Run this command to add the necessary dependencies to your conda environment:

conda create -n extensiphy samtools bwa-mem2 seqtk bcftools fastx-toolkit dendropy raxml

Activate your installation:

conda activate extensiphy

Conda install recipe on the way.