Open johnlees opened 2 months ago
I've made a start on this here: https://github.com/bacpop/trees_rs/commit/362e3dba6faf48001f91083b8d1cee698a51bc7d
Now Trees (the struct) initialise with default rate parameters https://github.com/bacpop/trees_rs/blob/362e3dba6faf48001f91083b8d1cee698a51bc7d/src/rate_matrix.rs#L6-L10 and these rate parameters are used to construct a matrix using the GTR model https://github.com/bacpop/trees_rs/blob/362e3dba6faf48001f91083b8d1cee698a51bc7d/src/rate_matrix.rs#L19-L42
Now the update.likelihood method uses the internal tree rate matrix so we can make changes to the rate paramters, update the matrix, and then update the likelihood.
A problem I might need your help with is the phyML test. If you run the test now it passes, even though the call to phyML specifies for it to use the JC69 model
https://github.com/bacpop/trees_rs/blob/362e3dba6faf48001f91083b8d1cee698a51bc7d/tests/phyml.rs#L53
If you swap the JC69
to GTR
then the test fails because the likelihoods are different. Any ideas why? As far as I can tell I have constructed the default rate matrix to be the same as the one that phyML uses.
So phyml with JC69 matches your code with GTR? Or they match when both JC69, and mismatch when both GTR?
Can't immediately spot anything obviously wrong in the rates, but will take a closer look soon
A rust thing: I wonder if we might want a RateMatrix
trait, which then JC69
and GTR
structs have impl of
So phyml with JC69 matches your code with GTR? Or they match when both JC69, and mismatch when both GTR?
Can't immediately spot anything obviously wrong in the rates, but will take a closer look soon
I think what I don't quite understand is why this should change the likelihood at all. GTR vs JC69 are ways of parameterising the matrix right? So if I choose parameters in each model that produce a matrix with the same values then the likelihood of the tree should be the same?
Before I was explicitly defining a rate matrix q
that equals
(-1, 1/3, 1/3, 1/3,
1/3, -1, 1/3, 1/3,
.... and so on
Then I calculate the tree likelihood as normal by exponentiating this rate matrix. This produce a likelihood that matched with phyML set to JC69 mode so I'd imagine phyML uses this as the default (I think this corresponds to the 1 parameter in JC69 = 4/3 ?).
Now I've added a function so that we no longer define q
separately and it produces a matrix in the Tree
struct that has the exact same value as q
(by having specific default values for the a, b, c, d, e, f etc parameters in the GTR model). So my code would produce the same likelihood.
However, when you set phyML to use GTR it produces a different likelihood to my code. So possibly it just has a different default set of GTR parameters that produce a different rate matrix and therefore different likelihood.
A rust thing: I wonder if we might want a
RateMatrix
trait, which thenJC69
andGTR
structs have impl of
Implemented this here: https://github.com/bacpop/trees_rs/blob/main/src/rate_matrix.rs
Shout out @mattjohnruss
Between topology optimisation, want to optimise the rate matrix parameters.
Ultimately we want:
Maybe a helpful reference: https://github.com/4ment/phylostan
Also see models here: http://www.iqtree.org/doc/Substitution-Models#dna-models and http://www.iqtree.org/doc/Substitution-Models#binary-and-morphological-models