delt0r / msms

A coalsecent simulator with selection
www.mabs.at/ewing/msms
17 stars 7 forks source link

-ej events disturb starting frequency set by -SI ... -Smark issue? #26

Closed leslem closed 10 years ago

leslem commented 11 years ago

I've been trying to simulate a model of selection based on African, Asian, and European populations (populations 1, 2, and 3), including bottlenecks and population expansions. However, I've been having trouble introducing selection in population 2. As long as I introduce selection before (pastwards) the population join (-ej 0.005 2 3), the model works fine and produces higher derived allele frequencies at the selected site in the asian population, as expected.

This command works just fine:

good-model:

java -Xmx3G -Xms3G -jar msms.jar 300 20000 -t 0.600000 -r 0.895104 1000 -I 3 100 100 100 -ej 0.005 2 3 -ej 0.00875 3 1 -en 0.0005 1 0.24 -en 0.000875 3 0.077 -en 0.001 2 0.077 -en 0.001 3 0.0125 -en 0.001125 3 0.077 -en 0.00475 2 0.00373 -en 0.004875 2 0.077 -en 0.0075 1 0.03125 -en 0.007625 1 0.24 -en 0.0085 3 0.00294 -en 0.008625 3 0.077 -en 0.0425 1 0.12 -Smark -N 10000 -SI 0.004650 3 0.250000 0.250000 0.250000 -Sc 0 2 2000.000000 1000.000000 0.000000 -Sp 0.500000 -oOC -Smu 0.000600 -oFP 0.0000

But when I try to introduce selection in any population after (pastwards) the population join, the allele frequencies at the selected site don't turn out right. Even though I specify that all three populations should have a starting frequency of 0.25, population 2 ends up having no copies of the beneficial allele. This is the case even when I introduce selection in population 2 itself.

Considering the model going forward in time, when population 3 splits into populations 2 and 3, population 2's initial frequency of the beneficial allele should be the same as that of population 3 at the splitting time. But it appears that population 2's beneficial allele frequency is instead starting at 0. This seems like it might be related to the (closed) issue https://github.com/delt0r/msms/issues/5, and probably also to the ongoing issues with -Smark.

Here are the three models that demonstrate the issue:

model1

java -Xmx3G -Xms3G -jar msms.jar 300 20000 -t 0.600000 -r 0.895104 1000 -I 3 100 100 100 -ej 0.005 2 3 -ej 0.00875 3 1 -en 0.0005 1 0.24 -en 0.000875 3 0.077 -en 0.001 2 0.077 -en 0.001 3 0.0125 -en 0.001125 3 0.077 -en 0.00475 2 0.00373 -en 0.004875 2 0.077 -en 0.0075 1 0.03125 -en 0.007625 1 0.24 -en 0.0085 3 0.00294 -en 0.008625 3 0.077 -en 0.0425 1 0.12 -Smark -N 10000 -SI 0.0084 3 0.250000 0.250000 0.250000 -Sc 0 2 2000.000000 1000.000000 0.000000 -Sc 0.00051 3 2000.000000 1000.000000 0.000000 -Sp 0.500000 -oOC -Smu 0.000600 -oFP 0.0000 -oFP 0.0000

model2

java -Xmx3G -Xms3G -jar msms.jar 300 20000 -t 0.600000 -r 0.895104 1000 -I 3 100 100 100 -ej 0.005 2 3 -ej 0.00875 3 1 -en 0.0005 1 0.24 -en 0.000875 3 0.077 -en 0.001 2 0.077 -en 0.001 3 0.0125 -en 0.001125 3 0.077 -en 0.00475 2 0.00373 -en 0.004875 2 0.077 -en 0.0075 1 0.03125 -en 0.007625 1 0.24 -en 0.0085 3 0.00294 -en 0.008625 3 0.077 -en 0.0425 1 0.12 -Smark -N 10000 -SI 0.008400 3 0.250000 0.250000 0.250000 -Sc 0 3 2000.000000 1000.000000 0.000000 -Sp 0.500000 -oOC -Smu 0.000600 -oFP 0.0000

model3

java -Xmx3G -Xms3G -jar msms.jar 300 20000 -t 0.600000 -r 0.895104 1000 -I 3 100 100 100 -ej 0.005 2 3 -ej 0.00875 3 1 -en 0.0005 1 0.24 -en 0.000875 3 0.077 -en 0.001 2 0.077 -en 0.001 3 0.0125 -en 0.001125 3 0.077 -en 0.00475 2 0.00373 -en 0.004875 2 0.077 -en 0.0075 1 0.03125 -en 0.007625 1 0.24 -en 0.0085 3 0.00294 -en 0.008625 3 0.077 -en 0.0425 1 0.12 -Smark -N 10000 -SI 0.008400 3 0.250000 0.250000 0.250000 -Sc 0 1 2000.000000 1000.000000 0.000000 -Sp 0.500000 -oOC -Smu 0.000600 -oFP 0.0000

I've posted ms format output files for each of these models for your viewing here: https://www.dropbox.com/sh/kgv26cqkd5cor8k/hesmu0CnAn

You will also find .png files that contain boxplots showing the distribution of beneficial allele frequencies in each population for each model.

delt0r commented 10 years ago

A fix is in the pipeline.

Please not that there is few errors in the command lines. Namely you specify frequency of the selected alleles in populations that don't exist at that time. It should warn or raise an error, but currently does not.