emmanuelparadis / psmcr

R Port of psmc
Other
12 stars 4 forks source link

Clarification on mutation rate, generation time, and what is output on the x-axis of a PSMC plot #11

Closed sjfleck closed 6 months ago

sjfleck commented 1 year ago

I noticed that the plot command has some text that reads:

The value given to ‘mutation.rate’ may be either the mutation rate per year, in which case ‘g’ may need to be changed, or the mutation rate per generation, in which case ‘g’ may be left equal to one.

For the species I'm working with, I'm using a per-generation mutation rate of 7.5*10^(-9) and a generation time of 15 years

Just to be clear, when I'm plotting PSMCR, I can use: mutation.rate = 0.5e-9, g = 15

or: mutation.rate = 7.5e-9, g = 1

and in this case, is the x-axis in generations or in years. When I look at different publications that report PSMC, mostly they report years, but sometimes they report generations and I'm having a difficult time finding exactly what they're doing differently in their methods. It's typically something like 'A mutation rate of X nucleotide/year and a generation time of Y years were used to convert the demographic time to years for both species'

Is there an extra conversion that I need to do, or are they just converting nucleotide/year to nucleotide/generation using the generation time for input into PSMC?

I was under the impression that you entered the per-generation mutation rate and the generation time, which is different from what is written in the usage I shared above. Any help with this would be greatly appreciated

emmanuelparadis commented 1 year ago

The basic formula for theta is: theta = 2 mu N where mu is the mutation rate and N is the number of alleles in the population. It makes no assumption about the demography so mu is per generation (i.e., the rate at which new alleles are introduced in the population). The PSMC deals only with polyploids, so we may multiply this expression by 2 if N is the number of individuals in the population. The transformation used in psmcr under scaled = FALSE is: N = theta / (4 mu g) so if mu is expressed by year (yr^-1), g must be expressed in years to have consistent units.

sjfleck commented 1 year ago

I'm trying to make sure that I'm following. I'm just worried I entered the mutation rate incorrectly. When I ran PSMCR, I was using the default option, "scaled = false." It seems like I was using the first method of rescaling.

For those earlier runs, I was using a per-generation mutation rate and inputting the generation time in years. For example: mutation.rate = 7.5e-9, g = 15

From my understanding of PSMC, specifying the per-generation mutation rate and generation interval in years were required for a an accurate x-axis of "time before present."

From your explanation above, have I misunderstood the inputs for PSMC or are they different for PSMCR? Or is it the same with PSMCR, but depends on whether scaled = FALSE or scaled = TRUE?

Thank you for any clarification you have on this.

emmanuelparadis commented 1 year ago

The present R-package (psmcr) is based on the code of PSMC (written in C), so there is no expected difference between them at this level. Besides, the explanations given in ?plot.psmc were copied from PSMC's manual (Heng Li is co-author of this help page and of the package of course).

AFAIK, the quantities computed by PSMC are in terms of theta (= 2 mu Ne) which are independent of the demography, hence the generation time does not enter here (i.e., these quantities are scaled). If scaled = FALSE, the output will not be scaled and the time-scale will be in number of generations (not years). So you can interpret this scale in years if you know the generation time.

The coalescent model requires the mutation rate to be per generation, hence the need to input the generation time if the mutation rate is per year.

Note that if scaled = TRUE, the arguments mutation.rate and g are ignored.