abacus-gene / paml

PAML is a program package for model fitting and phylogenetic tree reconstruction using DNA and protein sequence data. Please report only **technical issues** on this repository (e.g., compiling, programs abort or do not run at all, etc.). Problems with input data and general questions should be posted at https://groups.google.com/g/pamlsoftware?pli
GNU General Public License v3.0

set model for amino acid sequence in mcmctree #19

Closed Mirror1211 closed 1 year ago

Mirror1211 commented 2 years ago

Hello, I tried to use mcmctree to estimate divergence times among several species, but I couldn't find a tutorial on setting the model for amino acid sequences. The best-fit model for my data is JTT+G4+F, as selected by modeltest-ng. How should I set the model in the ctl file?

      seed = -1
   seqfile = 125_single_copy_gene.phy
  treefile = input_tree.txt
  mcmcfile = mcmc.txt
    outfile = representative_approx

     ndata = 1
   seqtype = 2  * 0: nucleotides; 1:codons; 2:AAs
   usedata = 3    * 0: no data; 1:seq like; 2:use in.BV; 3: out.BV
     clock = 2    * 1: global clock; 2: independent rates; 3: correlated rates
   RootAge = <2000  * safe constraint on root age, used if no fossil for root.

      model = 0    * 0:JC69, 1:K80, 2:F81, 3:F84, 4:HKY85   **(How should I set this model?)**
     alpha = 0.5    * alpha for gamma rates at sites
     ncatG = 5    * No. categories in discrete gamma

 cleandata = 0    * remove sites with ambiguity data (1:yes, 0:no)?

   BDparas = 0.1 0.1 0.1    * birth, death, sampling

kappa_gamma = 6 2    * gamma prior for kappa
alpha_gamma = 1 1    * gamma prior for alpha
rgene_gamma = 2 2000 1    * gamma prior for overall rates for genes
sigma2_gamma = 1 10 1    * gamma prior for sigma^2 (for clock=2 or 3)

   finetune = 1: 0.1 0.1 0.1 0.1 0.1 0.1    * auto (0 or 1) : times, musigma2, rates, mixing, paras, FossilErr

      print = 1
     burnin = 3000000
   sampfreq = 100
    nsample = 100000

*** Note: Make your window wider (100 columns) before running the program.

Thanks.

Mirror1211 commented 2 years ago

Furthermore, when using the approximate likelihood method for protein data, which files should I remove before the second step: only out.BV and rst, or out.BV, rst, rst1 and rst2?

sabifo4 commented 1 year ago

Hi there,

You can follow the fourth tutorial (section "Tutorial 4") available in the MCMCtree tutorial. Please make sure that you include option aaRatefile with the (absolute/relative) path to the file that has the JTT matrix -- you can find all these matrices in the dat directory. In addition, please make sure that the settings you have specified in the control file do indeed reflect your knowledge about the data (i.e., do not use default values used by other researchers with other datasets). Also, please note that this type of question should be posted in the PAML discussion group.
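As a rough illustration of the advice above, here is a minimal Python sketch that adds or updates an option such as aaRatefile in the text of a PAML-style control file while keeping any trailing `*` comment. The helper name `set_ctl_option` is hypothetical and not part of PAML, and the path `dat/jones.dat` is an assumption: it presumes the JTT matrix is the `jones.dat` file in your local copy of the dat directory, so please check the actual file name and location. Which other options (e.g., model) are needed for amino acid data is described in the tutorial and is not covered here.

```python
import re


def set_ctl_option(ctl_text: str, key: str, value: str) -> str:
    """Set `key = value` in PAML-style control-file text (hypothetical helper).

    If a line for `key` already exists, its value is replaced and any trailing
    `*` comment on that line is kept. Otherwise a new line is appended.
    """
    pattern = re.compile(
        rf"^([ \t]*){re.escape(key)}[ \t]*=[ \t]*([^*\n]*?)([ \t]*\*.*)?$",
        re.MULTILINE,
    )

    def _replace(match: re.Match) -> str:
        indent = match.group(1)
        comment = match.group(3) or ""
        return f"{indent}{key} = {value}{comment}"

    new_text, count = pattern.subn(_replace, ctl_text, count=1)
    if count == 0:
        # Key not present yet: append it as a new line at the end.
        if new_text and not new_text.endswith("\n"):
            new_text += "\n"
        new_text += f"{key} = {value}\n"
    return new_text


if __name__ == "__main__":
    ctl = (
        "   seqtype = 2  * 0: nucleotides; 1:codons; 2:AAs\n"
        "     alpha = 0.5    * alpha for gamma rates at sites\n"
    )
    # Assumed path to the JTT matrix; adjust to your own PAML installation.
    print(set_ctl_option(ctl, "aaRatefile", "dat/jones.dat"))
```

Editing the control file by hand works just as well; a helper like this is only convenient when the same change has to be applied to many control files.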

Closing this issue now, as there are no technical problems with MCMCtree.