Cibiv / IQ-TREE

Efficient phylogenomic software by maximum likelihood
http://www.iqtree.org
GNU General Public License v2.0

Implement covarion models #68

Open · bqminh opened this issue 6 years ago

bqminh commented 6 years ago

Pers. comm. with Jeremy Brown:

Here are a few of the papers I mentioned when we chatted last week:

https://www.sciencedirect.com/science/article/pii/S0025556497000813

https://academic.oup.com/mbe/article/18/5/866/1018678

https://academic.oup.com/mbe/article/19/5/698/1067820

Figure taken from MrBayes manual:

[Figure: screenshot from the MrBayes manual illustrating the Tuffley-Steel covarion model; image not reproduced]

Stationary frequencies:

$$A_{\mathrm{off}} = \frac{s_{10}}{s_{10}+s_{01}}\,\pi_A, \qquad A_{\mathrm{on}} = \frac{s_{01}}{s_{10}+s_{01}}\,\pi_A$$
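The following minimal Python sketch builds the 8-state Tuffley-Steel rate matrix and checks these stationary frequencies numerically. It assumes that s_01 is the off-to-on switching rate and s_10 the on-to-off rate, which is the reading under which the formulas above are stationary. For brevity it also assumes an F81 substitution process in the "on" class; the result holds for any base matrix whose stationary distribution is pi. The names `tuffley_steel_q` and `stationary` are hypothetical and are not IQ-TREE or MrBayes functions.

```python
NUCS = "ACGT"
# Hidden states ordered (A,off), (A,on), (C,off), (C,on), ...
HIDDEN = [(nuc, sw) for nuc in NUCS for sw in ("off", "on")]

def tuffley_steel_q(pi, s01, s10, mu=1.0):
    """8x8 covarion rate matrix (assumption: F81 substitutions while 'on').
    Substitutions x -> y happen only in the 'on' class, at rate mu * pi[y].
    The switch turns on at rate s01 and off at rate s10, independent of x."""
    n = len(HIDDEN)
    q = [[0.0] * n for _ in range(n)]
    for a, (x, sx) in enumerate(HIDDEN):
        for b, (y, sy) in enumerate(HIDDEN):
            if a == b:
                continue
            if x == y and (sx, sy) == ("off", "on"):
                q[a][b] = s01
            elif x == y and (sx, sy) == ("on", "off"):
                q[a][b] = s10
            elif x != y and sx == sy == "on":
                q[a][b] = mu * pi[y]
        # q[a][a] is still 0 here, so this is minus the off-diagonal row sum
        q[a][a] = -sum(q[a])
    return q

def stationary(pi, s01, s10):
    """pi(x, off) = s10/(s10+s01) * pi_x and pi(x, on) = s01/(s10+s01) * pi_x."""
    total = s10 + s01
    return [pi[x] * (s01 if sw == "on" else s10) / total for x, sw in HIDDEN]

pi = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
q = tuffley_steel_q(pi, s01=0.5, s10=1.5)
p = stationary(pi, s01=0.5, s10=1.5)
# Stationarity check: (p Q)_b should vanish for every hidden state b.
residual = max(abs(sum(p[a] * q[a][b] for a in range(8))) for b in range(8))
print(p[:2])     # A_off, A_on: approximately [0.075, 0.025]
print(residual)  # zero up to floating-point rounding
```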

bredelings commented 10 months ago

Hi, are covarion / Markov-modulated substitution models still missing from IQ-TREE? These are present in BAli-Phy and RevBayes, and I know that Nicolas Lartillot has also implemented them. If they are still missing from IQ-TREE, I think they are an important model dimension that should be added.

The slide above describes the Tuffley-Steel '98 model, but it is not the only one. There is also the Galtier 2001 model, which allows dynamic switching between the categories of a Gamma rate mixture (or a FreeRate mixture); the Huelsenbeck 2002 model, which combines the Gamma rate mixture with Tuffley-Steel; and the Wang et al. 2007 model, which contains the Galtier, Tuffley-Steel, and Huelsenbeck models as submodels. (I can provide citations if it's helpful.)
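One way to see how these models relate is to treat each as a Markov-modulated process. The hidden state is a pair (rate class, observed state), and the generator can be written as Q = diag(r) ⊗ Q_obs + S ⊗ I, where r holds the class rates and S is the switching generator between classes. The Python sketch below assumes this Kronecker form. The rate values and the uniform switching rule in the Galtier-style example are illustrative assumptions, not the published parameterizations, and all function names are hypothetical.

```python
def markov_modulated_q(q_obs, rates, switch):
    """Hidden state index = c * n + i (rate class c, observed state i).
    Q[(c,i),(d,j)] = [c == d] * rates[c] * q_obs[i][j] + [i == j] * switch[c][d],
    i.e. Q = diag(rates) (x) q_obs + switch (x) I_n.
    Rows sum to zero because the rows of q_obs and of switch do."""
    k, n = len(rates), len(q_obs)
    q = [[0.0] * (k * n) for _ in range(k * n)]
    for c in range(k):
        for d in range(k):
            for i in range(n):
                for j in range(n):
                    v = switch[c][d] if i == j else 0.0
                    if c == d:
                        v += rates[c] * q_obs[i][j]
                    q[c * n + i][d * n + j] = v
    return q

def jc_q():
    """Jukes-Cantor rate matrix, one expected substitution per unit time."""
    return [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]

def uniform_switch(k, nu):
    """Hypothetical switching rule: leave the current class at rate nu and
    jump to each of the other k-1 classes with equal probability."""
    return [[-nu if c == d else nu / (k - 1) for d in range(k)] for c in range(k)]

# Galtier-style sketch: 4 illustrative rate classes (NOT discrete-Gamma quantiles).
galtier = markov_modulated_q(jc_q(), [0.2, 0.6, 1.2, 2.0], uniform_switch(4, nu=0.3))

# Tuffley-Steel as a special case: class 0 = off (rate 0), class 1 = on (rate 1).
s01, s10 = 0.5, 1.5
tuffley_steel = markov_modulated_q(jc_q(), [0.0, 1.0], [[-s01, s01], [s10, -s10]])

print(len(galtier), len(tuffley_steel))  # 16 8
```

In this form only r and S change between models, which suggests that much of the likelihood machinery could be shared across them.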

If these models are still missing, I'm curious what the main implementation difficulty might be. I suspect it is conceptually separating the Markov chain states from the observed character states. For example, the Tuffley-Steel model above has 8 Markov chain states but only 4 observed states.
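The state-separation point can be made concrete at the tips. In a pruning-style likelihood calculation, an observed nucleotide is compatible with several hidden states, so its tip partial-likelihood vector puts 1.0 on each of them. The following minimal sketch assumes the 8 hidden states are ordered by nucleotide, with "off" before "on". Names such as `tip_vector` and `COMPATIBLE` are hypothetical, and only a few IUPAC codes are covered.

```python
# Observed character -> compatible nucleotides (subset of IUPAC; gaps as missing).
COMPATIBLE = {"A": "A", "C": "C", "G": "G", "T": "T",
              "R": "AG", "Y": "CT", "N": "ACGT", "-": "ACGT"}

# Hidden Markov chain states of the Tuffley-Steel model: (nucleotide, switch).
HIDDEN = [(nuc, sw) for nuc in "ACGT" for sw in ("off", "on")]

def tip_vector(observed):
    """Tip partial likelihoods over the 8 hidden states: 1.0 if the hidden
    state's nucleotide is compatible with the observation, whatever the switch."""
    ok = COMPATIBLE[observed.upper()]
    return [1.0 if nuc in ok else 0.0 for nuc, _ in HIDDEN]

def stationary(pi, s01, s10):
    """Stationary frequencies of the hidden states (s01: off->on, s10: on->off)."""
    total = s01 + s10
    return [pi[nuc] * (s01 if sw == "on" else s10) / total for nuc, sw in HIDDEN]

def single_site_likelihood(observed, pi, s01, s10):
    """Likelihood of one observed character on a one-leaf tree:
    sum over hidden states of stationary probability times tip partial."""
    return sum(p * t for p, t in zip(stationary(pi, s01, s10), tip_vector(observed)))

pi = {"A": 0.1, "C": 0.2, "G": 0.3, "T": 0.4}
print(tip_vector("A"))  # [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
print(single_site_likelihood("A", pi, s01=0.5, s10=1.5))  # approximately 0.1 (= pi_A)
```

With this mapping in place, the rest of the pruning recursion can work on the 8 hidden states without further changes. Summing over the hidden switch state at the root recovers pi_A for a lone observed A.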

bredelings commented 10 months ago

BTW, I'm currently investigating these models, and data sets in which a site's rate category differs between subtrees are not hard to find. All the literature suggests that these models fit substantially better than models that keep each site's rate fixed across the tree, so I think the main reason they are not used very often is probably that they are not implemented in software like IQ-TREE.