bacpop / trees_rs

Tree inference in rust
GNU Affero General Public License v3.0
1 stars 0 forks source link

Optimising molecular model parameters #3

Open johnlees opened 2 months ago

johnlees commented 2 months ago

Between topology optimisation, want to optimise the rate matrix parameters.

Ultimately we want:

Maybe a helpful reference: https://github.com/4ment/phylostan

Also see models here: http://www.iqtree.org/doc/Substitution-Models#dna-models and http://www.iqtree.org/doc/Substitution-Models#binary-and-morphological-models

jhellewell14 commented 1 month ago

I've made a start on this here: https://github.com/bacpop/trees_rs/commit/362e3dba6faf48001f91083b8d1cee698a51bc7d

Now Trees (the struct) initialise with default rate parameters https://github.com/bacpop/trees_rs/blob/362e3dba6faf48001f91083b8d1cee698a51bc7d/src/rate_matrix.rs#L6-L10 and these rate parameters are used to construct a matrix using the GTR model https://github.com/bacpop/trees_rs/blob/362e3dba6faf48001f91083b8d1cee698a51bc7d/src/rate_matrix.rs#L19-L42

Now the update.likelihood method uses the internal tree rate matrix so we can make changes to the rate paramters, update the matrix, and then update the likelihood.

A problem I might need your help with is the phyML test. If you run the test now it passes, even though the call to phyML specifies for it to use the JC69 model https://github.com/bacpop/trees_rs/blob/362e3dba6faf48001f91083b8d1cee698a51bc7d/tests/phyml.rs#L53 If you swap the JC69 to GTR then the test fails because the likelihoods are different. Any ideas why? As far as I can tell I have constructed the default rate matrix to be the same as the one that phyML uses.

johnlees commented 1 month ago

So phyml with JC69 matches your code with GTR? Or they match when both JC69, and mismatch when both GTR?

Can't immediately spot anything obviously wrong in the rates, but will take a closer look soon

johnlees commented 1 month ago

A rust thing: I wonder if we might want a RateMatrix trait, which then JC69 and GTR structs have impl of

jhellewell14 commented 1 month ago

So phyml with JC69 matches your code with GTR? Or they match when both JC69, and mismatch when both GTR?

Can't immediately spot anything obviously wrong in the rates, but will take a closer look soon

I think what I don't quite understand is why this should change the likelihood at all. GTR vs JC69 are ways of parameterising the matrix right? So if I choose parameters in each model that produce a matrix with the same values then the likelihood of the tree should be the same?

Before I was explicitly defining a rate matrix q that equals

(-1, 1/3, 1/3, 1/3, 
1/3, -1, 1/3, 1/3,
.... and so on

Then I calculate the tree likelihood as normal by exponentiating this rate matrix. This produce a likelihood that matched with phyML set to JC69 mode so I'd imagine phyML uses this as the default (I think this corresponds to the 1 parameter in JC69 = 4/3 ?).

Now I've added a function so that we no longer define q separately and it produces a matrix in the Tree struct that has the exact same value as q (by having specific default values for the a, b, c, d, e, f etc parameters in the GTR model). So my code would produce the same likelihood.

However, when you set phyML to use GTR it produces a different likelihood to my code. So possibly it just has a different default set of GTR parameters that produce a different rate matrix and therefore different likelihood.

jhellewell14 commented 1 month ago

A rust thing: I wonder if we might want a RateMatrix trait, which then JC69 and GTR structs have impl of

Implemented this here: https://github.com/bacpop/trees_rs/blob/main/src/rate_matrix.rs

jhellewell14 commented 1 month ago

Shout out @mattjohnruss