This repository contains Python code for automatically fitting admixture graphs (with qpGraph), using a heuristic algorithm to iteratively fit increasingly complex models, and R code for calculating Bayes factors (with admixturegraph) to compare the fitted models.
The heuristic search algorithm was first described in the paper The evolutionary history of dogs in the Americas. The code was subsequently refactored into a standalone tool, including Bayes factor calculations, for the paper Genomic analysis on pygmy hog reveals extensive interbreeding during wild boar expansion.
Given an outgroup with which to root the graph, a stepwise addition order algorithm is used to add leaf nodes to the graph. At each step, insertion of a new node is tested at all branches of the graph, except the outgroup branch. Where a node cannot be inserted without producing f4 outliers (i.e. |Z| ≥ 3), all possible admixture combinations are also attempted. If a node cannot be inserted via either approach, that sub-graph is discarded. If the node is successfully inserted, the remaining nodes are recursively inserted into that graph. All possible starting node orders are attempted to ensure maximal coverage of the graph space.
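The search described above can be illustrated with a small Python sketch. This is not the qpBrute implementation: it handles only admixture-free trees, and the `fits` callback is a hypothetical stand-in for running qpGraph and checking for f4 outliers. The ingroup is held as nested tuples, and the outgroup is assumed to sit on the other side of the root, so its branch is never a candidate for insertion.

```python
from itertools import permutations


def insertions(subtree, leaf):
    """Yield every tree made by attaching `leaf` to one branch of `subtree`.

    The first tree yielded attaches the leaf to the branch leading into
    `subtree`; the rest come from recursing into its two children.
    """
    yield (leaf, subtree)
    if isinstance(subtree, tuple):
        left, right = subtree
        for new_left in insertions(left, leaf):
            yield (new_left, right)
        for new_right in insertions(right, leaf):
            yield (left, new_right)


def canonical(tree):
    """Sort children recursively so equivalent topologies compare equal."""
    if isinstance(tree, tuple):
        return tuple(sorted((canonical(child) for child in tree), key=repr))
    return tree


def search(pops, fits):
    """Stepwise addition over all starting orders (trees only).

    `fits` is a hypothetical callback: it should return True when a
    candidate tree has no f4 outliers. Candidates that do not fit are
    discarded, along with everything that would be built on top of them.
    """
    accepted = set()

    def extend(ingroup, remaining):
        if not remaining:
            accepted.add(canonical(ingroup))
            return
        leaf, rest = remaining[0], remaining[1:]
        for candidate in insertions(ingroup, leaf):
            if fits(candidate):
                extend(candidate, rest)
            # Admixture placements would be attempted here; omitted in this sketch.

    for order in permutations(pops):
        extend(order[0], order[1:])
    return accepted


# With a callback that accepts everything, all 15 rooted topologies
# for four ingroup populations are recovered.
trees = search(("A", "B", "C", "X"), fits=lambda tree: True)
print(len(trees))  # 15
```

Deduplicating through `canonical` matters because different insertion orders frequently arrive at the same topology.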
The resulting list of fitted graphs is then passed to the MCMC algorithm implemented in the admixturegraph R package, to compute the marginal likelihood of the models and their Bayes Factors (BF).
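For readers unfamiliar with Bayes factors, the comparison reduces to a ratio of marginal likelihoods, or a difference on the log scale. The Python sketch below shows only that arithmetic; it does not call admixturegraph, and the model names and log marginal likelihoods are made-up values for illustration.

```python
def log_bayes_factors(log_marginals):
    """Return log BF for every ordered pair of models.

    `log_marginals` maps a model name to its log marginal likelihood.
    A positive value for (a, b) means the data favour model a over b.
    """
    return {
        (a, b): log_marginals[a] - log_marginals[b]
        for a in log_marginals
        for b in log_marginals
        if a != b
    }


# Hypothetical values, not output from any real run.
example = {"graph1": -1200.5, "graph2": -1203.5}
log_bf = log_bayes_factors(example)
print(log_bf[("graph1", "graph2")])  # 3.0
```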
If you reuse any of this code then please cite the papers:
Leathlobhair, M.N.*, Perri, A.R.*, Irving-Pease, E.K.*, Witt, K.E.*, Linderholm, A.*, [...], Murchison, E.P., Larson, G., Frantz, L.A.F., 2018. The evolutionary history of dogs in the Americas. Science 361, 81–85. https://doi.org/10.1126/science.aao4776
Liu, L., Bosse, M., Megens, H.-J., Frantz, L.A.F., Lee, Y.-L., Irving-Pease, E.K., Narayan, G., Groenen, M.A.M., Madsen, O., 2019. Genomic analysis on pygmy hog reveals extensive interbreeding during wild boar expansion. Nature Communications 10, 1992. https://doi.org/10.1038/s41467-019-10017-2
To use this software you will need to install various dependencies.
The easiest way to install qpBrute and all the dependencies is via the conda package manager.
To install miniconda3
for MacOSX:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-MacOSX-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh
or for Linux:
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh
If you already have conda installed, you may need to update it to the latest version before installing qpBrute:
conda update -n base -c defaults conda
Then install qpBrute and all its dependencies in one step:
conda env create --name qpbrute --file https://raw.githubusercontent.com/ekirving/qpbrute/master/environment.yaml
And lastly, activate the new environment:
conda activate qpbrute
Alternatively, you can install all the dependencies manually via pip and CRAN.
Python ≥ 3.6 and pip:
pip install git+https://github.com/ekirving/qpbrute.git
The full list of Python modules installed in the project environment can be found in the requirements.txt file.
R ≥ 3.4 with the following packages:
install.packages(c("admixturegraph", "coda", "data.table", "ggplot2", "gtools", "raster", "reshape2", "scales", "stringr", "viridis"))
devtools::install_github("sbfnk/fitR")  # requires the devtools package
Note: the build location of the AdmixTools binary files needs to be added to your path.
echo 'export PATH="/path/to/AdmixTools/bin:$PATH"' >> ~/.bash_profile
Then reload your bash profile:
source ~/.bash_profile
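One way to confirm that the path change took effect is to look the binary up from Python. This is a minimal sketch, assuming only that the AdmixTools executable of interest is named qpGraph; the helper name `find_binaries` is hypothetical.

```python
import shutil


def find_binaries(names=("qpGraph",)):
    """Map each executable name to its resolved path, or None if not on PATH."""
    return {name: shutil.which(name) for name in names}


missing = [name for name, path in find_binaries().items() if path is None]
if missing:
    print("Not found on PATH:", ", ".join(missing))
```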
Note: The size of the graph space grows super-exponentially with each additional population, so the maximum number of populations supported by qpBrute in a full search is 7. However, you can use the --no-admix and --qpgraph parameters to reduce the size of the search space and add many more populations in an iterative fashion.
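To give a rough feel for this growth, the sketch below counts rooted binary trees with labelled leaves, which number (2n-3)!! for n leaves. This covers admixture-free topologies only, so it is a lower bound on the graph space rather than the number of models qpBrute actually evaluates.

```python
def rooted_tree_count(n_leaves):
    """Number of rooted binary trees on n labelled leaves: (2n-3)!!."""
    count = 1
    for k in range(3, 2 * n_leaves - 2, 2):
        count *= k
    return count


for n in range(2, 9):
    print(n, rooted_tree_count(n))
# 7 leaves already give 10395 trees, and 8 leaves give 135135.
```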
The pipeline is broken into two steps:
qpBrute \
--par test/sim1.par \
--prefix sim1 \
--pops A B C X \
--out Out
Sometimes you already have a base model to which you just want to add extra populations. In that case, use --pops to specify only the new populations and --qpgraph to supply the existing model.
qpBrute \
--par test/sim1.par \
--prefix sim1 \
--pops Y Z \
--out Out \
--qpgraph path/to/model
You can also use the --no-admix flag to create a skeleton tree containing populations you know are not admixed, and use this model as input with the --qpgraph parameter. This allows you to create large models with many more populations than can be fully explored via a brute force approach.
qpBayes \
--geno test/sim1.geno \
--ind test/sim1.ind \
--snp test/sim1.snp \
--prefix sim1 \
--pops A B C X \
--out Out
Evan K. Irving-Pease, PalaeoBARN, University of Oxford
This project is licensed under the MIT License; see the LICENSE.md file for details.