Estimating the optimal number of migration edges from Treemix
This package uses results from the population software 'Treemix' by Pickrell and Pritchard (2012) DOI:10.1371/journal.pgen.1002967 to estimate the optimal number of migrations edges to add to the tree. Furthermore, it has also been updated to work with the output of OrientAGraph (see Molloy et al. 2021), a more advanced admixture graph representation software built on top of the 'Treemix' engine. In Treemix, it was customary to stop adding migration edges when 99.8\% of variation in the data was explained, but optM automates this process using an ad hoc statistic based on the second order rate of change in the log likelihood. OptM has added functionality for various threshold modeling to compare with the ad hoc statistic. The various methods are:
install.packages("SiZer")
install.packages("OptM")
library(OptM)
To run OptM, you will need a folder of output files produced by Treemix v1.13 or OrientAGraph. The function optM will automatically search the folder for the stem.llik, stem.modelcov.gz, and stem.cov.gz files; where "stem" is that provided to the -o parameter of treemix. It is recommended, but not required, to use stem in the format stem.i.M; where
In order for optM to function properly, you must run:
NOTE: There will be an error check to see if there is variation across iterations for each M. In other words, if the data are very robust, you may get the same likelihood across all runs, thus the standard deviation across runs is zero and the ad hoc statistic is undefined. In this case, try making larger variations in the dataset (subsetting the SNPs, varying -k in treemix, or other method of permutation/bootstrap).
for m in {1..10}
do
for i in {1..5}
do
treemix \
-i test.treemix.gz \
-o test.${i}.${m} \
-global \
-m ${m} \
-k 1000
done
done
First load the provided example data for a simulated dataset with 3 migration edges; and 10 iterations for M={1-10}
folder <- system.file("extdata", package = "OptM")
Next, run optM using the default "Evanno"-like method:
test.optM = optM(folder)
Finally, plot the results:
plot_optM(test.optM, method = "Evanno")
Alternatively, run using various linear modeling estimates rather than the ad hoc statistic:
folder <- system.file("extdata", package = "OptM")
test.linear = optM(folder, method = "linear")
plot_optM(test.linear, method = "linear")
OR using SiZer:
folder <- system.file("extdata", package = "OptM")
test.sizer = optM(folder, method = "SiZer")
plot_optM(test.sizer, method = "SiZer")
citEntry
to bibentry
.boots
and splines
in the DESCRIPTION since these packages were not explicitly referenced.if(class(input) != "SiZer") stop("Input object is not of class SiZer.\n")
to if(!"SiZer" %in% class(input)) stop("Input object is not of class SiZer.\n")
in case an object inherited multiple classes.tidy
on my Mac OSX using brew install tidy-html5
to the tidy v5.8.0 library. This fixed some HTML warnings that were specific to Macs.read.table
to read.delim
, and added the option strip.white = TRUE
to read.delim
. This should fix some errors when reading files from orientagraph, since for some reason OrientaGraph added an empty space to lines in the .llik files when m=0, but not to other lines. This generated an error: "Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 2 did not have 8 elements"plot_optM
, changed the plotting color to have an alpha (semi-transparent) fill and Y-axis labels for ΔmFitak, R. R. (2021) OptM: estimating the optimal number of migration edges on population trees using Treemix. Biology Methods and Protocols. 6(1):bpab017.
citation("OptM")
into your R consoleRobert Fitak
Department of Biology
University of Central Florida
USA
rfitak9@gmail.com