delt0r / msms

A coalescent simulator with selection
www.mabs.at/ewing/msms

Unexpected non-binary allelic state at selected locus #34

Closed ghost closed 10 years ago

ghost commented 10 years ago

Hi!

If I run a simple model with strong selection that starts 400 generations ago and is limited to population 2 (of 3):

-ms 80 1 -I 3 0 80 0 0 -t 1489.200000 -r 876.000000 100000 -N 7300.000000 -SI 0.013699 3 0 0.001000 0 -Sc 0.000000 2 1460.000000 730.000000 0 -Sp 0.5 -Smark -oTrace -seed 0x68f45cc0e5682c94 -oOC

I get allelic states '1'/'2'/'3' at the selected position 0.5 rather than the expected all '1's (the trace shows the selected allele fixing). -oOC gives 3 origins, and the haplotypes defined by variant sites around position 0.5 depend strongly on the allele state at 0.5, which supports this. Recurrent mutation appears to be on, but I can't see where I'm enabling it, so I thought I'd report it in case anything unusual is going on behind the scenes (or for advice if it is a simple error on my part!).

Otherwise, thanks for the nice program, it has been very useful!

Best,

Guy

ghost commented 10 years ago

I think I partly understand this now... I had probably misunderstood how -SI works. Increasing the starting frequency of the selected allele to e.g. -SI 0.013 3 0 0.2 0 leads to an origin count in the hundreds, and a correspondingly large number of alleles at the selected site. I was thinking that the -SI flag acted by conditioning on a frequency and time in the past, with neutral behaviour pastward at the site from that time. But I now wonder if it's doing something different: maybe introducing the allele at the stated frequency by adding it randomly to chromosomes in the population at the stated time? Or something more complicated...
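To make the -SI numbers in the two commands above concrete, here is a minimal Python sketch of the arithmetic. It assumes the usual ms convention that times are given in units of 4N generations (which is consistent with 0.013699 corresponding to "400 generations ago" at -N 7300) and a diploid deme of N individuals, i.e. 2N chromosomes. The function names are hypothetical and are not part of msms.

```python
def si_time_in_generations(t_scaled, n_e):
    """Convert a time given in units of 4*N generations into generations.

    Assumption: ms-style time scaling (units of 4*N generations).
    """
    return 4.0 * n_e * t_scaled


def expected_allele_copies(freq, n_e):
    """Expected number of chromosomes carrying the allele in one deme.

    Assumption: a diploid deme of n_e individuals, so 2*n_e chromosomes.
    """
    return 2.0 * n_e * freq


N_E = 7300  # value passed to -N in the command above

print(si_time_in_generations(0.013699, N_E))  # start time of selection, in generations
print(expected_allele_copies(0.001, N_E))     # carrier chromosomes at frequency 0.001
print(expected_allele_copies(0.2, N_E))       # carrier chromosomes at frequency 0.2
```

Under these assumptions the start time comes out at roughly 400 generations, and the starting frequency corresponds to about 14.6 carrier chromosomes at 0.001 versus 2920 at 0.2, so a higher starting frequency leaves far more distinct copies available at the moment selection begins.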

Anyway, I've left this open for now as I'm still interested in how -SI works, and it might be useful for people to be aware that interference between selected haplotypes could happen with this option. But feel free to close if this is expected behaviour!

delt0r commented 10 years ago

The -SI option sets the "initial" conditions, that is, the frequency of the allele at the point where we start to consider selection. Before this point the allele is not tracked, so every copy present at that point is an "origin".

So if 10 individuals carry the allele at 200 generations in the past, then I can have up to 10 "origins". Beyond the 200th generation (further into the past), we ignore the allele and just do neutral simulations. Unfortunately this is the only way to do this without running the simulation further back, and then you can't really condition on the frequency of the allele at a particular time, at least with this version. Generally, conditioning on the frequency at a given time is not easy except for a few edge cases.
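The upper bound described above can be written down as a small Python sketch. This is only an illustration of the counting argument, not of msms internals: it assumes that each carrier copy present at the -SI time counts as at most one origin, and that the sample cannot reveal more origins than it has allele-carrying lineages. The function name `max_origins` is hypothetical.

```python
def max_origins(carrier_copies, sampled_carriers):
    """Upper bound on the number of distinct origins seen in a sample.

    carrier_copies: copies of the allele present at the -SI time.
    sampled_carriers: sampled chromosomes that carry the allele.
    Assumption: one origin per carrier copy, and at most one per sampled lineage.
    """
    if carrier_copies < 0 or sampled_carriers < 0:
        raise ValueError("counts must be non-negative")
    return min(carrier_copies, sampled_carriers)


# 10 carriers at the start of selection, 80 sampled carriers: up to 10 origins.
print(max_origins(10, 80))

# 2920 carriers, 80 sampled carriers: the sample size becomes the limit.
print(max_origins(2920, 80))
```

The actual count in a given run is usually lower than this bound, because sampled lineages coalesce within the selected class on the way back to the -SI time.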

Hope this helps.