emmanuelparadis / ape

analysis of phylogenetics and evolution
http://ape-package.ird.fr/
GNU General Public License v2.0
52 stars 11 forks source link

possible bug in ace #63

Open RubyRedlich opened 2 years ago

RubyRedlich commented 2 years ago

Hello, I think I may have found a bug in ace. This bug comes up when calculating ancestral likelihoods with type = "discrete" and marginal = FALSE.

I believe the bug occurs in the following line of code from the second pass of the double pass algorithm:

tmp <- lik.anc[anc, ] / (lik.anc[des1, ] %*% P)

I noticed that the dot product in the denominator uses the probability of transitioning from the state at the descendant node to the state at the ancestral node. However, I think it should use the probability of transitioning from the state at the ancestral node to the state at the descendant node.

Please correct me if I am wrong, but my reasoning is that the dot product used above is the opposite direction of the dot product in the forward pass, and though the direction of traversal through the tree changes, I don't think the dot product should be flipped because the direction of evolution is the same. This would only impact the results when the rate model is asymmetrical (e.g. model = "ARD").

I modified the line of code by changing the direction of the dot product and taking the transpose, as follows: tmp <- t(lik.anc[anc, ] / (P %*% lik.anc[des1, ]))

I tested this change on a very small tree for which I calculated ancestral likelihoods (summing over all possible states at every other node) by brute force to obtain an exact answer. With the small modification I made, ace was able to reproduce the exact answer whereas before it was slightly off.

Additionally, writing out the math that the code is doing, I believe that flipping the direction of the dot product gives an equation that can be rearranged to give the exact answer for the ancestral likelihood from Yang et. al. (1995).

Thank you so much. Please let me know if you think I have made any mistakes or misinterpretations.

emmanuelparadis commented 1 year ago

Hello, First, I guess that what you mean by "dot product" is actually the "matrix product". The dot product (i.e., sum(x * y) in R) is commutative, so there would be no issue with swapping the terms. In general, the matrix product is not commutative but in the present case lik.anc[des1, ] is a vector with p elements (p is the number of states of the character) and P is a matrix with p rows and p columns. So the product calculated with %*% gives the same values either in a 1-by-p matrix or in a p-by-1 matrix (i.e., they are the transpose of each other). This comes from the equality:

(XY)' = Y'X'

As a side note, R treats a vector either as a 1-column matrix or as a 1-row matrix depending on the order of the terms in a matrix product:

R> (P <- matrix(c(.9, .1, .1, .9), 2))
     [,1] [,2]
[1,]  0.9  0.1
[2,]  0.1  0.9
R> x <- c(.1, .9)
R> x %*% P
     [,1] [,2]
[1,] 0.18 0.82
R> P %*% x
     [,1]
[1,] 0.18
[2,] 0.82

If you observed differences, they could be caused by the order of terms. Here's a small example based on the above equality:

R> p <- 9
R> X <- matrix(runif(p^2), p)
R> Y <- matrix(runif(p^2), p)
R> mean((t(Y) %*% t(X)) - t(X %*% Y))
[1] -1.918904e-17

Generally if p is odd there is this kind of deviation, and for p even > 60. But if you observe more substantial differences, please report them. Cheers, PS: thanks for your patience!

RubyRedlich commented 1 year ago

Hi, thank you! Yep, I did mean matrix product, not dot product, sorry! I think I was more wondering about the case when P is not symmetrical (which I think can happen when the rate model is "ARD").

lik.anc[des1, ] %*% P

If in the above code lik.anc[des1, ] is X and P is Y, then I think I had changed it from XY to (YX)' = X'Y' which would only be the same as XY when Y = Y' or when P is symmetric.

Thanks again!

emmanuelparadis commented 1 year ago

Hi, You're right: I didn't pay enough attention to the fact that you were referring to the ARD model. Looking at the code I'm not sure of the reason of the order of the terms in lik.anc[des1, ] %*% P since this implies that the likelihood values will be multiplied by the columns of P. Indeed, the columns of P do not sum to one unless it is symmetric (the rows always sum to one). So your suggestion makes sense. I wonder if the same correction should not be applied to the next line:

lik.anc[des1, ] <- (tmp %*% P) * lik.anc[des1, ]
RubyRedlich commented 1 year ago

Hi, thank you! I don't think the correction needs to be applied to the next line. Since in this line, you are updating the elements of the vector lik.anc[des1,] based on the value of lik.anc[anc,], I think it is correct to multiply by the columns of P. This way the value of lik.anc[des1, i] (the likelihood that des1 is in state i) is based off column i of P which represents the probability of transitioning from different possible states at the anc node to a fixed state i of des1.

In the previous line for which I suggested the correction, each element of tmp, tmp[i], corresponds to lik.anc[anc, i] since lik.anc[anc,] is in the numerator. However, it is using the same P matrix in which the states in the row correspond to the state of the anc and the states in the column correspond to the states of the des1. (P[i,j] is the probability of transitioning from state i at the anc to state j at the des1). This is why I think it makes sense to multiply by the rows. This would be like fixing the anc in state i and summing over the different possible states of des1.

Sorry for any confusion and please correct me if you have a different understanding! Thank you!

emmanuelparadis commented 1 year ago

You are correct (I tried to remove any doubt in my mind replacing tmp %*% P by P %*% t(tmp) but this does not give anything making sense).

I tried your fix (from your first post) with the data in ?ace and setting model="ARD". The output is very close to what the current ace() can return: the rates, their SEs, and the log-lik are identical; on the other hand, 4 values in $lik.anc are now NaN and the others are close to what is obtained with the current version in ape. To avoid confusion, I changed this line:

https://github.com/emmanuelparadis/ape/blob/9ce7240d2132d9efa92a9e88822b8ba8743bcba8/R/ace.R#L283 to this one:

tmp <- t(lik.anc[anc, ] / (P %*% lik.anc[des1, ]))

and did the same on line 290 with des2. The commands were:

data(bird.orders)
x <- factor(c(rep(0, 5), rep(1, 18)))
ans <- ace(x, bird.orders, type = "d", model = "ARD")
ans$lik.anc

Is that expected?

RubyRedlich commented 1 year ago

Hi, sorry for the late reply. I stepped through the example to see where the NaNs are coming from. One of the rates in Q is 0, so for some branches one of the elements of P is 0. When lik.anc[des1,] is a tip, one of it elements is 1 and the other is 0. This causes one of the elements of (P %*% lik.anc[des1, ])) to be 0, and division by 0 when computing tmp leads to NaN. I wasn't sure whether to interpret this as a bug in my suggestion or as a reflection that the transition rate in Q should not be 0. (e.g. Using anova to compare rate models suggests that ER is a better model for this data). I hope this helps! Best, Ruby