Closed dschrempf closed 6 years ago
@dschrempf : Is this feature now fully implemented? (just to be sure, so that I can close the issue).
This list contains many features, and some have not been implemented yet. I should have created separate issues. I am closing this now and will reopen issues that have not been addressed yet.
The files and line numbers refer to branch pomo_latest, commit 23aa0965 (Amend gitignore. Change output, max branch length.).
Enhancements and TODOs for PoMo.
Testing. I changed quite a bit of core code in PoMo which has to be thoroughly tested. We will do this when we run simulations for the application note that we are planning.
Output. The output needs to be improved.
Decomposition of the rate matrix. The function performing the eigendecomposition requests the rate matrix but also recomputes the rate matrix. This leaves room for speed improvements.
EigenDecomposition::eigensystem_sym() expects a matrix[][] object with two indices. However, it is not used, because ModelPoMo::computeRateMatrix() is called anyway from within eigensystem_sym().
ModelFinder. Implement model finder. I must admit I did not look into this much until now.
Mixture models and gamma rate heterogeneity. At the moment, a mixture model is used to handle Gamma rate heterogeneity. Hence, mixture models and gamma rate heterogeneity cannot currently be combined, a restriction that could be removed.
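To illustrate the redundancy described in the rate matrix decomposition item above, here is a minimal sketch. It is not IQ-TREE code: ToyModel, decompose(), decomposeRecomputing() and decomposeFromArgument() are hypothetical names, and a simple trace stands in for the actual eigendecomposition. It only shows that the rate matrix is built twice when the callee ignores its argument.

```cpp
#include <cstddef>
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Hypothetical stand-in for a model that can build its own rate matrix.
struct ToyModel {
    std::size_t n_states = 4;
    int n_builds = 0;  // how often the rate matrix has been rebuilt

    Matrix computeRateMatrix() {
        ++n_builds;
        Matrix q(n_states, std::vector<double>(n_states, 1.0));
        for (std::size_t i = 0; i < n_states; ++i)
            q[i][i] = -static_cast<double>(n_states - 1);  // rows sum to zero
        return q;
    }
};

// Stand-in for the expensive numerical step (here: just the trace).
double decompose(const Matrix& q) {
    double trace = 0.0;
    for (std::size_t i = 0; i < q.size(); ++i) trace += q[i][i];
    return trace;
}

// Pattern described in the issue: the matrix argument is ignored and rebuilt.
double decomposeRecomputing(ToyModel& model, const Matrix& /*ignored*/) {
    return decompose(model.computeRateMatrix());
}

// Possible alternative: use the matrix handed in by the caller.
double decomposeFromArgument(const Matrix& q) {
    return decompose(q);
}
```

Both variants return the same value, but the first one triggers an extra call to computeRateMatrix() every time. Whether the argument should be used or dropped from the signature is a design decision for the real code.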
Things to keep in mind.
Maximum genetic distance. The distance measure of PoMo includes not only mutations (which can be compared to substitutions) but also frequency shifts. Hence, the branch lengths of PoMo are expected to be longer by a factor of N*N. This clashed with constants defined, e.g., in tools.h. I had to manually recompute MAX_GENETIC_DIST wherever it is used, because branch lengths were limited during maximization of the likelihood. This should be working now, but it is important to keep in mind when changing code in these areas. A model-dependent maximum distance may make sense.
Further TODOs (minor priority).
Do not tamper with Params. The problem is the following: when running IQ-TREE with PoMo, certain parameters need to be known already when reading in the alignment file (counts file at the moment). These are the virtual population size N, which affects the data structure, and the sampling method (sampled or weighted). At the moment, those flags are stored in ModelPoMo upon model creation and also in the Params class. It may be advantageous to refrain from using the Params class to store model parameters.
Verbosity. The output with increased verbosity is too long.
No polymorphic data. At the moment, running PoMo on data without polymorphisms is not possible, or at least leads to undefined behavior. Either the level of polymorphism should be fixed, or a clear error message should be emitted.
R and Phi matrices. IQ-TREE and PoMo are now aware of the symmetric mutation rate matrix R and the skew-symmetric mutation matrix Phi (refer to our most recent publication, Schrempf, Hobolth 2017). These matrices are unnecessarily dragged along during the maximization of the likelihood but really only need to be computed at the end when printing output.
Theta and sampling method sampled. The calculation of theta (Watterson's theta or level of polymorphism) is necessarily faulty, especially when N is low. I am not sure how to tackle this because the problem is of a statistical nature.
Support alignments with differences only. It may be good to support alignments that only contain sites with differences but omit sites that are the same in all species. I am not sure how many people will have data like that.
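For the "No polymorphic data" item above, one option is to check the counts right after reading them and fail early with a readable message. The following is only a sketch under assumptions: the layout data[site][population] holding four allele counts, the names SiteCounts, isPolymorphic(), countPolymorphic() and requirePolymorphism(), and the wording of the message are all hypothetical and not taken from IQ-TREE.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Allele counts of one population at one site (assumed order: A, C, G, T).
using SiteCounts = std::vector<int>;

// True if at least two alleles have a nonzero count.
bool isPolymorphic(const SiteCounts& counts) {
    int present = 0;
    for (int c : counts)
        if (c > 0) ++present;
    return present >= 2;
}

// Number of polymorphic (site, population) entries; data[site][population].
std::size_t countPolymorphic(const std::vector<std::vector<SiteCounts>>& data) {
    std::size_t n = 0;
    for (const auto& site : data)
        for (const auto& population : site)
            if (isPolymorphic(population)) ++n;
    return n;
}

// Throw a descriptive error instead of continuing into undefined behavior.
void requirePolymorphism(const std::vector<std::vector<SiteCounts>>& data) {
    if (countPolymorphic(data) == 0)
        throw std::runtime_error(
            "PoMo: the input contains no polymorphic sites, so the level of "
            "polymorphism cannot be estimated from the data.");
}
```

A caller could catch the exception and either abort or fall back to a fixed level of polymorphism, which is the other option mentioned in the item.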