adamallo / SimPhy

SimPhy: A comprehensive simulator of gene family evolution
GNU General Public License v2.0

Rate heterogeneity #6

Closed dschrempf closed 6 years ago

dschrempf commented 6 years ago

Hi, sorry, I was unable to send email to the Google group simphy@googlegroups.com, so I am posting my issue here.

I would like to simulate gene trees for a specific species tree (no duplication or loss, just one locus tree) with substitution rate heterogeneity. That is, different genes should evolve at different rates.

I have trouble interpreting the flags determining rate heterogeneity. For example,

-hl f:1.0

This would set the alpha parameter of the gamma distribution that is used to pick rates for the locus trees to 1.0. Is this correct? However, this is not what I want, because then all my gene trees would have the same rate modifier, as I only use one species tree and one locus tree.

-hg f:1.0

According to the paper, this is gene-by-lineage-specific substitution rate heterogeneity, where rates for different branches of the gene trees are drawn from a gamma distribution with alpha=1.0. This is also not what I need.

I would need a flag with which the substitution rates, and hence the coalescent rates, for different gene trees are drawn from a gamma distribution. This is important because the probability of incomplete lineage sorting (ILS) is affected by the mutation rate, and it is not sufficient to scale the gene trees alone, because this would not affect ILS.

I also thought about simulating, let's say, 1000 locus trees without duplication and loss but with the -hl flag.

I also saw that there is a -hh flag, but I do not understand what it is used for.

Thank you very much for your help!

Kind regards, Dominik

adamallo commented 6 years ago

Dear Dominik,

If I understand your problem correctly, the easiest way of doing this in SimPhy is just to simulate 1000 locus trees (-RL f:1000) and then use -HL to specify the gamma you want. If all locus-tree event rates are fixed to 0 (the default), these 1000 locus trees will be identical copies of the species tree, but gene-family (locus tree) parameters are still sampled for each of them (e.g., HL to modify the gene-wide mutation rate). What made you discard this option?
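To make the idea concrete, here is a minimal Python sketch of what "one gamma-distributed rate modifier per locus tree" means. It is not SimPhy code: the function name is hypothetical, and it assumes the modifiers follow a gamma distribution with shape alpha and mean 1 (scale 1/alpha), which should be checked against the SimPhy manual.

```python
import random
import statistics


def sample_rate_multipliers(n_loci, alpha, seed=None):
    """Draw one rate multiplier per locus tree.

    Assumption (not taken from SimPhy's source): multipliers are
    gamma distributed with shape `alpha` and scale 1/alpha, so that
    their expected value is 1.
    """
    rng = random.Random(seed)
    # random.gammavariate takes (shape, scale); mean = shape * scale = 1.
    return [rng.gammavariate(alpha, 1.0 / alpha) for _ in range(n_loci)]


# 1000 identical locus trees, each with its own gene-wide rate multiplier.
multipliers = sample_rate_multipliers(1000, alpha=1.0, seed=42)
mean_multiplier = statistics.fmean(multipliers)
print(f"n = {len(multipliers)}, mean multiplier = {mean_multiplier:.3f}")
```

Smaller values of alpha give more variation among loci; larger values concentrate the multipliers around 1.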

I hope this helps, Diego M.

dschrempf commented 6 years ago

Hi Diego,

thank you for your reply.

Diego Mallo notifications@github.com writes:

> Dear Dominik,
>
> If I understand your problem correctly, the easiest way of doing this in SimPhy is just simulating 1000 locus trees (-RL f:1000) and then using -HL to specify the gamma you want. If all locus-tree event rates are fixed to 0 (by default), these 1000 locus trees will be identical copies of the species tree but for which gene-family (locus tree) parameters are sampled

Thank you, this is what I most likely need to do (see below).

> (e.g., HL to modify the gene-wide mutation rate). What made you discard this option?

I thought that -hl means one rate per locus tree. I need to sample one rate for each gene family (but in my case all gene families share the same species tree and the same locus tree).

Best, Dominik


adamallo commented 6 years ago

Just to clarify: in SimPhy, "locus tree" and "gene family" are synonyms. Please let me know whether the RL/HL combination works for your purpose!

Best, Diego M.

dschrempf commented 6 years ago

Hi, thank you. Now I am using

-rl f:1000 and -hl f:alpha

I am aware that gene family trees are locus trees. I thought I could get different rate modifiers for my gene families without having to use, e.g., 1000 locus trees (because all of my locus trees are the same). This produces some overhead, but it is still very fast!
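For anyone scripting this, the two flags above could be assembled as in the following sketch. The helper name is hypothetical, and only the -rl and -hl flags from this thread are used; the species-tree, output, and other options that a real run needs are deliberately left out.

```python
def simphy_rate_args(n_locus_trees, alpha):
    """Build the argument fragment discussed in this thread.

    Hypothetical helper: it only formats the -rl and -hl flags with
    fixed ("f:") values and is not a complete SimPhy command line.
    """
    if n_locus_trees < 1:
        raise ValueError("need at least one locus tree")
    if alpha <= 0:
        raise ValueError("the gamma shape parameter must be positive")
    return ["-rl", f"f:{n_locus_trees}", "-hl", f"f:{alpha}"]


args = simphy_rate_args(1000, 1.0)
print(" ".join(args))
```

The resulting list can be appended to the rest of the command when launching the simulator, for example via `subprocess.run`.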

Thanks, Dominik