abacus-gene / paml

PAML is a program package for model fitting and phylogenetic tree reconstruction using DNA and protein sequence data. Please report only **technical issues** on this repository (e.g., compilation errors, or programs that abort or do not run at all). Problems with input data and general questions should be posted at https://groups.google.com/g/pamlsoftware?pli
GNU General Public License v3.0

dS > 1 #16

lalalagartija closed this issue 11 months ago.

lalalagartija commented 2 years ago

Hi, I am getting weird results: a dS of 26 under a neutral model. How is this possible? Isn't dS supposed to be pS divided by the number of positions where a synonymous mutation is possible?

Thanks,
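The question treats dS as if it were a proportion, but pS (the proportion of synonymous sites that differ) can never exceed 1, whereas dS is a distance corrected for multiple substitutions at the same site. A minimal sketch, assuming the simple Jukes-Cantor correction applied in counting methods such as Nei-Gojobori (1986), shows how that distance passes 1 once pS exceeds about 0.55 and diverges as pS approaches 0.75. codeml's maximum-likelihood dS is model-based and will not match this formula; the function name here is illustrative only.

```python
import math


def jc69_distance(p: float) -> float:
    """Jukes-Cantor corrected distance from a proportion p of differing sites.

    Returns the expected number of substitutions per site. The correction is
    undefined once p reaches 3/4 (the proportion expected between unrelated
    sequences), so math.inf is returned in that case.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


if __name__ == "__main__":
    # Observed proportions stay below 1, but the corrected distance does not.
    for p_s in (0.10, 0.40, 0.60, 0.70, 0.74):
        print(f"pS = {p_s:.2f} -> dS = {jc69_distance(p_s):.3f}")
```

Under this simplification, a dS far above 1 means the synonymous sites are effectively saturated: most of them are predicted to have changed more than once.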

sabifo4 commented 11 months ago

Hi there,

Please use the PAML Google group to discuss and ask questions about PAML programs given that this repository is meant to deal with technical problems users may encounter when running such programs.

With regard to your question, note the following from Ziheng's message on the Google group in 2022 (see the full discussion thread):

t: the branch length, i.e., the expected number of nucleotide substitutions per codon.
N: the number of nonsynonymous sites.
S: the number of synonymous sites. N and S are calculated using the sequence length, kappa, and the codon frequencies. See chapter 2 in Yang (2014).
dN/dS: the w ratio. This is the MLE of the parameter in the model. I think you are using a branch model, so a few ratios (w0, w1, w2, ...) are estimated and some branches share the same ratio.
dN: the expected number of nonsynonymous substitutions per nonsynonymous site.
dS: the expected number of synonymous substitutions per synonymous site. dN and dS are calculated using the branch length, kappa, w, and the codon frequencies. Again, see chapter 2 in Yang (2014).
N*dN: the (predicted) number of nonsynonymous substitutions for the whole gene sequence along the branch.
S*dS: the (predicted) number of synonymous substitutions for the whole gene sequence along the branch.
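The last two definitions imply a bookkeeping identity. Assuming, as in codeml's output, that N + S equals the sequence length in nucleotides (three times the number of codons), and since t counts nucleotide substitutions per codon, N*dN + S*dS should add up to t times the number of codons. The sketch below also assumes that the fraction of substitutions that are synonymous (`rho_s`) is already known; in codeml that fraction follows from kappa, w, and the codon frequencies, which are not modelled here. The helper name and all numbers are hypothetical.

```python
def split_branch_length(t: float, n_codons: int, n_sites: float,
                        s_sites: float, rho_s: float) -> tuple[float, float]:
    """Hypothetical helper: turn branch length t into (dN, dS).

    t        -- expected nucleotide substitutions per codon on the branch
    n_codons -- number of codons in the alignment
    n_sites  -- N, nonsynonymous sites for the whole gene
    s_sites  -- S, synonymous sites for the whole gene
    rho_s    -- assumed fraction of substitutions that are synonymous
    """
    # Assumption checked here: N + S equals the sequence length in nucleotides.
    if abs((n_sites + s_sites) - 3 * n_codons) > 1e-6:
        raise ValueError("expected N + S == 3 * n_codons")
    total_subs = t * n_codons                # substitutions on the branch, whole gene
    syn_subs = total_subs * rho_s            # this is S*dS
    nonsyn_subs = total_subs - syn_subs      # this is N*dN
    return nonsyn_subs / n_sites, syn_subs / s_sites


if __name__ == "__main__":
    # Hypothetical gene: 300 codons, N = 690, S = 210, t = 2 per codon.
    d_n, d_s = split_branch_length(t=2.0, n_codons=300, n_sites=690.0,
                                   s_sites=210.0, rho_s=0.4)
    print(f"dN = {d_n:.4f}, dS = {d_s:.4f}, w = {d_n / d_s:.4f}")
```

Because synonymous sites are a minority of all sites, even a moderate branch length (here two substitutions per codon) spreads enough synonymous changes over them to push dS above 1.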

Without the input datasets, control files, or output files you obtained, it is hard to guess what may have happened when you ran the software. Please make sure you attach these files in the future to help us troubleshoot :)

Cheers!