delt0r / msms

A coalsecent simulator with selection
www.mabs.at/ewing/msms
17 stars 7 forks source link

Unexpected behavior with -es and selection #41

Open BenjaminPeter opened 9 years ago

BenjaminPeter commented 9 years ago

When using the es flag to split a population, the allele frequencies of the trajectories are not averaged as I would expect. I.e for the command at the end, A is fixed in population 2 and absent from population 1. Then A receives 1% of immigrants (-es), and the allele frequency becomes ~0.5, when most people would probably expect a weighted mean, i.e an allele frequency of around 0.01

msms 1 1 -r 0 -t 2 -I 1 1 0 
-es 0.01 1 0.1 
-ej 1. 2 1
-SI 0.011 2 0 1
-N 1000 -oTrace > bug                                                                                                     

Edit: OK, I found that weighting is done using population sizes, not split proportions. So, using a split flag of

-es 0.01 1 0.1 -en 0.01 2 0.1

solves the issue, but the behavior is still somewhat surprising, and perhaps could be mentioned in the manual.

delt0r commented 9 years ago

Not sure i follow. In pastward time a split (-es) with a 0.1 chance of lineages moving to the new deme means in forward time 10% of the current population is made up from that deme. This is sampled as well. A join in pastward time (-ej) is a split in forward time and the new population samples from the parent population. In pastward time all lineages are moved out of the deme joined.

On Tue, Jun 9, 2015 at 6:25 PM, BenjaminPeter notifications@github.com wrote:

When using the es flag to split a population, the allele frequencies of the trajectories are not averaged as I would expect. I.e for the command at the end, A is fixed in population 2 and absent from population 1. Then A receives 1% of immigrants (-es), and the allele frequency becomes ~0.5, when most people would probably expect a weighted mean, i.e an allele frequency of around 0.01

msms 1 1 -r 0 -t 2 -I 1 1 0 -es 0.01 1 0.1 -ej 1. 2 1 -SI 0.011 2 0 1 -SA 1000 -N 1000 -Sc 0 1 0 -oTrace -SFC > bug

— Reply to this email directly or view it on GitHub https://github.com/delt0r/msms/issues/41.

I have no special talents. I am only passionately curious. --Albert Einstein

BenjaminPeter commented 9 years ago

OK, I try to spell out my expectation for the split (-es), to hopefully make things clear:

Backward in time:

Population 1 splits into two population X and Y. Lineages in pop 1 at the split should pick deme X with probability 0.1, deme Y with probability 0.9. So if the current frequency at the split in population 1 is p_1, the initial frequencies in populations X and Y should also be p_1, since we expect lineages to pick their ancestral deme independent of their genotype.

Forward in time

Populations X and Y merge into population 1, and population 1 is made up of 90% individuals from Y and 10% individuals from X. Now, at the split time, the frequency of the selected allele in X is p_x=1 and in Y it is p_y=0. So if we pick a random individual right after the split, I would expect it to be with probability p_x * 0.1 + p_y * 0.9 a carrier of the selected allele, so the allele frequency should be 10%.

Now, obviously these models for the frequency differ from each other, however as trajectories are simulated forward in time, I was expecting the mixing I described. Currently, the mixing proportion is p_x * N_x + p_y * N_y, where N are the sizes of X and Y, which was not my first guess. Changing the population sizes immediately before the -es event to the mixture proportions is a workaround, but it might be nice to document that behavior somewhere in the docs.