Closed cjcarlson closed 3 years ago
Hi Colin. Thanks for your appreciation!
The double-centering operation in cmdscale
is done by a C code which can be called directly with the :::
operator (not usable in a package!). Here's a simple example with 5 random variables (after loading the code of centre
in R since it is internal to pcoa
):
R> n <- 5L
R> x <- rnorm(n)
R> d <- as.matrix(dist(x))
R> ddc1 <- .Call(stats:::C_DoubleCentre, d)
R> ddc2 <- centre(d, n)
R> ddc1 - ddc2
1 2 3 4 5
1 -1.110223e-16 -5.551115e-17 0.000000e+00 0.000000e+00 -2.775558e-17
2 -5.551115e-17 0.000000e+00 5.204170e-18 -7.806256e-18 2.775558e-17
3 0.000000e+00 -1.214306e-17 0.000000e+00 -1.387779e-17 -1.734723e-17
4 0.000000e+00 1.387779e-17 2.775558e-17 1.387779e-17 1.127570e-17
5 -2.775558e-17 4.163336e-17 1.214306e-17 1.301043e-18 2.775558e-17
I repeated this code with n = 500
and the matrix returned by centre
is more symmetric than the other one:
R> isSymmetric(ddc1, tol = 1e-15)
[1] FALSE
R> isSymmetric(ddc2, tol = 1e-15)
So I'm not sure the code wriiten by Pierre is an issue. I tried with the data posted by M. Riera on StackOverflow (713 x 713 matrix) and found no error, but cmdscale
returned zero colums, and pcoa
returned a single column with all coordinates zero.
With the data you posted here: it worked for me with both functions. Here is the output from the last lines of your script:
R> isSymmetric(c)
[1] TRUE
R> c[3,1] == c[1,3]
[1] FALSE
R> c[3,1] - c[1,3]
[1] 9.325873e-15
So I'm not sure where's the problem. From your files' encoding it seems you both are under Windows (I'm under Linux), which may be an explanation.
I am indeed on a Windows machine! But I'm pretty sure this is why we're getting complex eigenvalues. If you want to prove it to yourself (on a Windows machine), this is what convinced me (tacks onto the code I posted)
> D.eig <- eigen(c)
> is.complex(D.eig$vectors)
[1] TRUE
> is.complex(D.eig$values)
[1] TRUE
>
> c[lower.tri(c)] = t(c)[lower.tri(c)]
> D.eig <- eigen(c)
> is.complex(D.eig$vectors)
[1] FALSE
> is.complex(D.eig$values)
[1] FALSE
Maybe it's just that the tolerance on my machine is different? But it's doing the exact calculation far enough out that this happens
If I add this to the pcoa function:
centre <- function(D, n) {
One <- matrix(1, n, n)
mat <- diag(n) - One/n
mat.cen <- mat %*% D %*% mat
mat.cen[lower.tri(mat.cen)] <- t(mat.cen)[lower.tri(mat.cen)]
return(mat.cen)
}
I can run the function correctly! So I'm 98% sure this is it
I think this has something to do with LAPACK (used by eigen()
and maybe %*%
). Under Ubuntu, R depends on pre-installed LAPACK libraries, whereas under Windows these are delivered with R. This can explain some tiny differences.
Your suggestion is a very good idea: I'm pushing a new version here in a moment.
Thanks!
Hi there!
I love the package, and am still learning some of the functionality.
I think I've found a rounding error. It addresses the underlying issue in
The centre() function internal to ape::pcoa() returns non-symmetric matrices due to rounding errors. centre() re-centers the matrix but the sigfigs aren't quite conserved through the whole matrix somehow? As a result, you can still get complex eigenvalues from a matrix that, when it enters the function, is symmetric and non-negative (which shouldn't be possible in principle but is a result of the transformation that calls centre).
Attached is a zip with a reproducible R script and .RDS so you can check this out. I think fixing the rounding error is probably pretty easy? I'm very excited to dig deeper into this.
Thanks so much!
APE.zip