Closed Angel-Popa closed 6 years ago
The microsatellite datasets I've tried eems on typically consists of a few dozens of microsatellites. The African elephant dataset for example has 16 microsatellites. Your dataset has 57326 loci. Can you describe in more detail how these loci are derived? (In short I am wondering whether runeems_sats
is appropriate for your data and whether there might be some numerical under/overflow issue.)
Sorry for the mistake, I should have erased this issue as it is not a software issue. I made the mistake of using runeems_sats
instead of runeems_snps
. My dataset is generated using a genotype by sequencing method and although the dataset can be transformed into a microarray data with presence/absence format it does not reflects microsatellites (here is more information of the method for sequencing.
I did use runeems_snps
and it worked it. :) However, I have a question, I calculated a dissimilarity matrix, using the R
function bitwise.dist()
from the package poppr
but I am not very confident that this matrix is conditionally negative definite.
qfly-nH-nIndiv143-nSites3774.zip
Given that the results I have obtained are consistent with Discriminant Analysis of Principal Components. How would it affect the results if the matrix is not negative definite?
The runeems_snps
implementation checks whether the input matrix is a valid distance matrix and it would print an error message if that's not the case. Here is a simple function in R does makes exactly the same check:
input <- read.table("qfly-nH-nIndiv143-nSites3774.diffs")
# Function to check if a matrix D is a valid Euclidean distance matrix
# by checking whether the matrix is conditionally negative definite:
# D must be symmetric, nonnegative, with diagonal of 0s,
# with 1 positive eigenvalue and n-1 negative eigenvalues
is_distmat <- function(D) {
D <- as.matrix(D)
a <- min(D >= 0) # Check that D is nonnegative
b <- min(diag(D) >= 0 & diag(D) <= 0 ) # Check that D has a diagonal of 0s
c <- isSymmetric(unname(D)) # Check that D is symmetric
n <- nrow(D)
val <- eigen(D)$values # Compute the eigenvalues
tol <- 1e-12
negative <- sum(val < -tol) # How many eigenvalues are positive?
positive <- sum(val > +tol) # How many eigenvalues are positive?
(a & b & c) & (positive == 1) & (negative == n-1)
}
is_distmat(input)
#> TRUE
The documentation specifies that the function poppr::bitwise.dist
computes the Hamming distance. This is the number (or the average) of element-wise differences, with care taken as to how missing data is treated.
Here is a very simple example to illustrate how the Hamming distance and the Euclidean distance differ intuitively. [Euclidean distance is what the small program bed2diffs
, also available in this repo, computes.]
hamming.distance <- function(x, y) {
stopifnot(is.numeric(x),
is.numeric(y),
length(x) == length(y))
mean(x != y)
}
euclidean.distance <- function(x, y) {
stopifnot(is.numeric(x),
is.numeric(y),
length(x) == length(y))
mean((x - y)^2)
}
x <- c(0, 0, 0)
y <- c(0, 0, 1)
z <- c(0, 0, 2)
hamming.distance(x, y)
#> 0.3333
hamming.distance(x, z)
#> 0.3333
euclidean.distance(x, y)
#> 0.3333
euclidean.distance(x, z)
#> 1.333
This shows that Euclidean distance puts more emphasis on large differences in some coordinates than the Hamming distance.
Dear author,
I was trying to use runeems_sats and I have run into some issues. I am working with DArTseq data, using the silicoDArT which in essence is a dataset of microsatellites. In this specific case, I have 57326 loci for 158 samples. I exported all the files I need it to work with EEMS, and after loading them I get the following error (see below).
Thanks in advance,
[Habitat::initialize] Done.
[Graph::initialize] Generate population grid and sample assignment Loaded sample coordinates from ../data/qfly-nIndiv158-nSites57326.coord There are 20 observed demes (out of 170 demes) The population grid has 170 demes and 446 edges There are 158 samples assigned to 20 observed demes [Graph::initialize] Done.
[Diffs::initialize] Read genotype data from ../data/qfly-nIndiv158-nSites57326.diffs [Diffs::initialize] Done.
Initialize EEMS random state
[EEMS::initialize_state] EEMS starts with 30 qtiles and 25 mtiles [EEMS::initialize_state] Done.
Input parameters: datapath = ../data/qfly-nIndiv158-nSites57326 mcmcpath = ../data/qfly-nIndiv158-nSites57326-EEMS-nDemes200-chain1 prevpath = gridpath = distance = euclidean diploid = 0 nIndiv = 158 nSites = 57326 nDemes = 200 seed = 1518919729 numMCMCIter = 2000000 numBurnIter = 1000000 numThinIter = 9999 negBiSize = 10 negBiProb = 0.670000 qVoronoiPr = 0.250000 mrateShape = 0.000500 qrateShape = 0.002000 sigmaShape = 0.001000 qrateScale = 0.500000 mrateScale = 2.000000 sigmaScale = 1.000000 mSeedsProposalS2 = 0.010000 qSeedsProposalS2 = 0.100000 mEffctProposalS2 = 0.100000 qEffctProposalS2 = 0.001000 mrateMuProposalS2 = 0.010000
Initial log prior: -32923.86 Initial log llike: inf