beast-dev / beast-mcmc

Bayesian Evolutionary Analysis Sampling Trees
http://beast.community
GNU Lesser General Public License v2.1
192 stars 73 forks source link

asymmetric Q matrix normalization extension #1132

Closed jsigao closed 2 years ago

jsigao commented 2 years ago

This PR is modified from a previous PR following @msuchard's suggestion. It contains two additional commits to revert to the default behavior of ComplexSubstitutionModel that (1) uses the explicitly specified frequency vector to normalize the matrix (commit 7f5523b) and (2) require a frequencies or rootfrequencies object to be provided within a ComplexSubstitutionModel block in the XML (commit 0ef84bc). After these changes, only when a newly added argument computeStationary is specified to true (default false) and no frequency vector is provided within the ComplexSubstitutionModel block, the stationary distribution will be computed from the asymmetric Q matrix and be used to normalize the matrix (otherwise the old behavior is untouched and is the default).

For the sake of completeness, the PR message from the previous PR is included below.

This pull request contains three commits that allow a more flexible construction and normalization of a ComplexSubstitutionModel-class Q matrix. The ComplexSubstitutionModel implements an asymmetric Q matrix that is time irreversible (Edwards et al., 2011). This time-irreversible Q matrix can still have a stationary distribution (which is necessarily nonuniform) as long as the matrix is irreducible, but, unlike a GTR model, it seems more sensible and conventional to construct the matrix (at least optionally) without specifying the stationary frequency vector as a constant or stochastic variable.

The first commit (77b201b) modified the normalization behavior of a ComplexSubstitutionModel-class Q matrix so that a stationary distribution is computed from the matrix first and then used to normalize the matrix (instead of normalizing using a constant or stochastic frequency vector specified in the XML, which is necessarily not the stationary distribution as the matrix updates in the MCMC). The core code in this commit for computing the stationary distribution is modified (slightly) from the same function of the MarkovModulatedFrequencyModel class.

The second commit (4c91cf1) fixes a minor issue so that the 'normalized' argument can be passed correctly to the back end. The third commit (b5d7cc8) makes the frequencies (and rootfrequencies) argument optional in the XML.

Please let me know if any edits are needed to these commits. Thank you in advance for reviewing this PR.

jsigao commented 2 years ago

Dear @msuchard -- thank you very much for reviewing and merging this PR! I'll provide the test-units as advised for this and the other PRs as soon as I can.

I am very excited for and look forward to the opportunity to work more closely with you in the near future!