NBISweden / MrBayes

MrBayes is a program for Bayesian inference and model choice across a wide range of phylogenetic and evolutionary models. For documentation and to download the program, please see the home page:
http://NBISweden.github.io/MrBayes/
GNU General Public License v3.0

How to set the "TVM+F+I+R3" model? #282

Open: liamxg opened this issue 11 months ago

liamxg commented 11 months ago

@pontus @viklund @olas @eryl @msuchard

Zshuyun commented 11 months ago

Hello, I have also encountered the same problem. Have you solved it? How should I set up the “JTT+I+G+F” model?

liamxg commented 11 months ago

@Zshuyun Sorry, no one has replied to me.

Zshuyun commented 11 months ago

Okay, thank you

nylander commented 5 months ago

Dear @liamxg

As for the "TVM"

From the help on lset:

Nst -- Sets the number of substitution types: "1" constrains all of
       the rates to be the same (e.g., a JC69 or F81 model); "2" allows
       transitions and transversions to have potentially different
       rates (e.g., a K80 or HKY85 model); "6" allows all rates to
       be different, subject to the constraint of time-reversibility
       (e.g., a GTR model). Finally, 'nst' can be set to 'mixed', which
       results in the Markov chain sampling over the space of all
       possible reversible substitution models, including the GTR model
       and all models that can be derived from it by grouping the six
       rates in various combinations. This includes all the named models
       above and a large number of others, with or without name.

For a nucleotide "4-by-4" setup, you specify the number of substitution types with lset nst=, choosing one of the options 1, 2, 6, or mixed. Setting nst=1 means AC=AG=AT=CG=CT=GT; nst=2 sets AC=AT=CG=GT and AG=CT; nst=6 lets all six rates (AC, AG, AT, CG, CT, GT) differ. "TVM" would have AC, AT, CG, and GT free, with AG=CT, but you cannot specify this particular rate configuration in MrBayes (there is no nst=5, for example).
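To make these groupings concrete, the Python sketch below (illustrative only, not MrBayes code) writes each model as a labelling of the six rates in the revmatpr order A<->C, A<->G, A<->T, C<->G, C<->T, G<->T, where rates sharing a label are equal. Enumerating every such grouping gives the space the help text describes for nst=mixed, and TVM turns out to be one of those groupings even though no nst value selects it on its own. The names RATES, NAMED, rate_partitions, and describe are hypothetical.

```python
# Order of the six GTR rates as listed in the MrBayes help for revmatpr.
RATES = ("AC", "AG", "AT", "CG", "CT", "GT")


def rate_partitions(n=len(RATES)):
    """Yield every way of grouping n rates, as restricted growth strings.

    Position i gets a label no larger than 1 + the largest label used
    before it, so each grouping is produced exactly once.
    """
    def extend(prefix, max_label):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for label in range(max_label + 2):
            yield from extend(prefix + [label], max(max_label, label))

    yield from extend([0], 0)


def describe(pattern):
    """Render a grouping such as (0, 1, 2, 3, 1, 4) as 'AC, AG=CT, AT, CG, GT'."""
    groups = {}
    for rate, label in zip(RATES, pattern):
        groups.setdefault(label, []).append(rate)
    return ", ".join("=".join(members) for members in groups.values())


# Illustrative labels; only nst=1, 2 and 6 can be requested directly.
NAMED = {
    "nst=1 (JC69/F81)": (0, 0, 0, 0, 0, 0),
    "nst=2 (K80/HKY85)": (0, 1, 0, 0, 1, 0),
    "nst=6 (GTR)": (0, 1, 2, 3, 4, 5),
    "TVM (no nst value)": (0, 1, 2, 3, 1, 4),
}

ALL_MODELS = set(rate_partitions())
print(len(ALL_MODELS), "reversible rate groupings")
for name, pattern in NAMED.items():
    print(f"{name:20} {describe(pattern):24} in nst=mixed space: {pattern in ALL_MODELS}")
```

Restricted growth strings are a standard way to list set partitions without duplicates; for six rates their number is the Bell number B6 = 203.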

However, one may try to "emulate" a TVM model by setting lset nst=6 and then using the prset command to place a highly informative prior on the substitution rates (Revmatpr). From the help on prset:

Revmatpr -- This parameter sets the prior for the substitution rates
            of the GTR model for nucleotide data. The options are:
              prset revmatpr = dirichlet(<number>,<number>,...,<number>)
              prset revmatpr = fixed(<number>,<number>,...,<number>)

               The program assumes that the six substitution rates
               are independent gamma-distributed random variables with the
               same scale parameter when dirichlet is selected. The six
               numbers in brackets each correspond to a particular
               substitution type. Together, they determine the shape of the
               prior. The six rates are in the order A<->C, A<->G, A<->T,
               C<->G, C<->T, and G<->T. If you want an uninformative prior
               you can use dirichlet(1,1,1,1,1,1), also referred to as a
               'flat' Dirichlet. This is the default setting. If you wish a
               prior where the C<->T rate is 5 times and the A<->G rate 2
               times higher, on average, than the transversion rates, which
               are all the same, then you should use a prior of the form
               dirichlet(x,2x,x,x,5x,x), where x determines how much the
               prior is focused on these particular rates. For more info,
               see tratiopr. The fixed option allows you to fix the
               substitution rates to particular values.
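The effect of x can be checked with the standard Dirichlet moments: component i has mean a_i/a0 and variance a_i(a0 - a_i)/(a0^2(a0 + 1)), where a0 is the sum of all six parameters. The Python sketch below treats the six rates as proportions that sum to one and prints these moments for the help-text example dirichlet(x,2x,x,x,5x,x) at a few values of x, together with a matching prset line. The helper names dirichlet_summary and revmatpr_line are hypothetical.

```python
import math

ORDER = ("AC", "AG", "AT", "CG", "CT", "GT")  # order used by revmatpr


def dirichlet_summary(alphas):
    """Mean and standard deviation of each rate proportion under Dirichlet(alphas)."""
    a0 = sum(alphas)
    summary = []
    for a in alphas:
        mean = a / a0
        var = a * (a0 - a) / (a0 ** 2 * (a0 + 1))
        summary.append((mean, math.sqrt(var)))
    return summary


def revmatpr_line(weights, x):
    """Hypothetical helper: format 'prset revmatpr=dirichlet(x*w1,...,x*w6);'."""
    values = ",".join(f"{x * w:g}" for w in weights)
    return f"prset revmatpr=dirichlet({values});"


# Help-text example: C<->T 5x and A<->G 2x the (equal) transversion rates.
WEIGHTS = (1, 2, 1, 1, 5, 1)

for x in (1, 10, 100):
    print(revmatpr_line(WEIGHTS, x))
    for rate, (mean, sd) in zip(ORDER, dirichlet_summary([x * w for w in WEIGHTS])):
        print(f"  {rate}: mean {mean:.3f}, sd {sd:.3f}")
```

Increasing x leaves the expected ratios unchanged and only shrinks the spread, which is what makes the prior "highly informative". Note that a concentrated Dirichlet pulls all six rates toward their prior means, not only A<->G toward C<->T, so as a TVM emulation it constrains more than TVM itself does, and the chosen weights become a modelling assumption in their own right.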

As for the "+F" and "+R3"

"+F" is probably the iqtree2 syntax for using "empirically counted frequencies from alignment" as the state frequencies. By default, MrBayes instead uses MCMC to integrate over all possible state frequencies; the prior for this can be changed with prset statefreqpr (see the output from help prset).
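For reference, "empirically counted frequencies" are simply pooled state counts over the whole alignment. The Python sketch below (toy data, hypothetical helper name empirical_frequencies; gaps and ambiguity codes are skipped, which real programs may handle differently) computes them and formats them as a prset statefreqpr=fixed(...) line. MrBayes' help on prset also lists a statefreqpr=fixed(empirical) option, which fixes the frequencies at the values observed in the data instead of integrating over them; it is worth double-checking with help prset in your version.

```python
from collections import Counter


def empirical_frequencies(sequences, states="ACGT"):
    """'+F'-style state frequencies: pooled counts over all sequences.

    Simplification: gaps and ambiguity codes (N, R, Y, ...) are skipped.
    """
    counts = Counter()
    for seq in sequences:
        counts.update(ch for ch in seq.upper() if ch in states)
    total = sum(counts.values())
    return {s: counts[s] / total for s in states}


# Toy alignment, made up for illustration.
ALIGNMENT = ["ACGTAC", "ACG-AC", "ATGTNC"]

freqs = empirical_frequencies(ALIGNMENT)
print(freqs)
print("prset statefreqpr=fixed(" + ",".join(f"{freqs[s]:.4f}" for s in "ACGT") + ");")
```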

"+R3" is probably the iqtree2 syntax for "the FreeRate model with 3 categories", used to model rate heterogeneity among sites. In MrBayes (v3.2.7a), a FreeRate-type model can be applied with lset rates=kmixture (see the output from help lset).
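Assuming kmixture follows the usual FreeRate parameterization (an assumption; check help lset for the exact details in MrBayes), "+R3" means three rate categories with free weights w_k and free relative rates r_k, constrained so that the weights sum to 1 and the weighted mean rate is 1. The Python sketch below (hypothetical helper names, a toy JC69 two-sequence likelihood, arbitrary numbers) shows that normalization and how one site's likelihood is averaged over the categories.

```python
import math


def normalize_freerate(weights, rates):
    """Rescale a FreeRate-style mixture so weights sum to 1 and the mean rate is 1.

    Unlike discrete +G, where the categories have equal weights and rates set
    by one gamma shape parameter, both w_k and r_k are free parameters here.
    """
    wsum = sum(weights)
    w = [wk / wsum for wk in weights]
    mean_rate = sum(wk * rk for wk, rk in zip(w, rates))
    r = [rk / mean_rate for rk in rates]
    return w, r


def jc69_pair_likelihood(rate, t=0.1):
    """Toy JC69 likelihood of two sequences sharing a base, branch length t scaled by rate."""
    p_same = 0.25 + 0.75 * math.exp(-4.0 / 3.0 * rate * t)
    return 0.25 * p_same


def mixture_site_likelihood(weights, rates, site_likelihood):
    """Site likelihood under a rate mixture: sum over k of w_k * L(site | r_k)."""
    return sum(w * site_likelihood(r) for w, r in zip(weights, rates))


# Three categories ("R3"); the numbers are arbitrary illustrations.
# The weighted mean of the input rates is 2.0, so they are halved.
W3, R3 = normalize_freerate([0.5, 0.3, 0.2], [0.4, 2.0, 6.0])
print(W3, R3)
print(mixture_site_likelihood(W3, R3, jc69_pair_likelihood))
```

An "+I" component would amount to one more category with rate 0, which is the kind of combination with +Rn that MrBayes (v3.2.7a) is not set up to handle easily.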

Currently, MrBayes (v3.2.7a) is not set up to (easily) combine +I (or +G) with +Rn, so a model such as TVM+F+I+R3 cannot be reproduced exactly.

A related comment

Because different programs implement different sets of models, some model-selection tools offer program-specific subsets of candidate models for easier comparison (e.g., MrModeltest2, ModelTest-NG, IQ-TREE with -mset mrbayes, ...). These can be useful for many purposes.

Yours

Johan