delt0r / msms

A coalescent simulator with selection
www.mabs.at/ewing/msms

Difficulty with msms result when using reduced subpopulation size #32

Closed StevenVB12 closed 10 years ago

StevenVB12 commented 10 years ago

Dear Dr. Ewing,

Please allow me the following question. I’m working on sequence variation of a gene involved in a dispersal adaptation. To better understand the evolutionary history of this locus, we are using your program MSMS. However, we have come across a result that we find difficult to explain and understand.

Basically, we assume two populations in which an allele evolves and is selected in opposing directions as follows:

./msms 10 1 -N 100000 -t 15 -I 2 5 5 -n 1 1 -ma x 100 100 x -Sc 0 1 1000 -500 -1000 -Sc 0 2 -1000 -500 1000 -Smu 0.01 -SI 1 2 0 0 -Smark

(./msms 10 1 -N 100000 -t 15 -I 2 5 5 -n 1 0.01 -ma x 100 100 x -Sc 0 1 1000 -500 -1000 -Sc 0 2 -1000 -500 1000 -Smu 0.01 -SI 1 2 0 0 -Smark)

In these commands we vary both the selection time (-SI) and the subpopulation size (-n). I wrote a script that parses the ancestral and derived (selected) sequences and calculates, among other things, the pairwise differences (Pd) between sequences. Some results are shown in the attached graph. As expected, Pd in the derived (selected) allele increases with selection time, unless the size of the subpopulation in which the derived allele is selected is reduced. What I do not understand is that a small size for that subpopulation also affects the Pd of the ancestral allele: when I set -n 1 0.01, variation in the ancestral allele is strongly reduced. This is also visible simply in the shorter output sequences and the short coalescence time. Do you have any idea what the reason for this could be? We would be very grateful for any help.
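For readers who want to reproduce the Pd statistic, here is a minimal sketch of how it could be computed; it is not the script used for the attached graph. It assumes the haplotypes have already been read from the msms output as equal-length 0/1 strings, that the column index of the selected site (`selected_col`, a hypothetical parameter) is known, and that `1` marks the derived allele at that column. The function names are illustrative only.

```python
from itertools import combinations


def mean_pairwise_differences(haplotypes):
    """Average number of differing sites over all pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        # Fewer than two sequences: no pair to compare.
        return 0.0
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs)


def split_by_allele(haplotypes, selected_col):
    """Split haplotypes into (ancestral, derived) by the selected column.

    Assumption: '1' at selected_col is the derived allele, '0' the ancestral.
    The selected column itself is dropped so it does not count towards Pd.
    """
    ancestral, derived = [], []
    for hap in haplotypes:
        rest = hap[:selected_col] + hap[selected_col + 1:]
        if hap[selected_col] == "1":
            derived.append(rest)
        else:
            ancestral.append(rest)
    return ancestral, derived


# Toy data, not simulation output: column 0 plays the role of the selected site.
sample = ["00101", "00111", "01001", "10000", "11000"]
ancestral, derived = split_by_allele(sample, selected_col=0)
pd_ancestral = mean_pairwise_differences(ancestral)
pd_derived = mean_pairwise_differences(derived)
print(pd_ancestral, pd_derived)
```

Computing Pd separately per allelic class, as here, is what makes the reduction in the ancestral class visible on its own.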

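[image: pd_sim] https://f.cloud.github.com/assets/6349171/1868198/121fe9e6-7864-11e3-9c46-960511a3f125.jpg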

Best regards, Steven

delt0r commented 10 years ago

Have been away. Will look at this sometime this week. Sorry.

On Wed, Jan 8, 2014 at 1:56 PM, StevenVB12 notifications@github.com wrote:

delt0r commented 10 years ago

I was hoping a new version I am working on would make this all easier to deal with. However, it's taking a little long. Sorry. But you are not forgotten.

On Mon, Jan 13, 2014 at 12:21 PM, Greg Ewing greg.ewing@gmail.com wrote:

delt0r commented 10 years ago

Sorry, I did indeed miss this for a while.

I don't see any issue here. I would expect tree height to be reduced and hence Pd to be reduced. Any pair of lineages in the smaller deme will coalesce much more rapidly. Even more so with selection.
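To put rough numbers on this, the sketch below computes expected pairwise coalescence times in a neutral two-deme model by first-step analysis. It is an illustration under stated assumptions, not part of msms: it ignores selection entirely, assumes ms-style scaling (time in units of 4*N0 generations, so two lineages in a deme of relative size x coalesce at rate 2/x), and treats each migration parameter as the backward migration rate of a single lineage. All function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a small dense system a*x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        rest = sum(m[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (m[r][n] - rest) / m[r][r]
    return x


def expected_pairwise_times(x1, x2, m12, m21):
    """Expected coalescence times (T11, T12, T22) for two neutral lineages.

    T11: both lineages start in deme 1, T12: one in each, T22: both in deme 2.
    x1, x2 are relative deme sizes; m12 (m21) is the assumed backward rate at
    which a lineage in deme 1 (2) moves to deme 2 (1).
    """
    c1, c2 = 2.0 / x1, 2.0 / x2  # pairwise coalescence rates within each deme
    a = [
        [c1 + 2 * m12, -2 * m12, 0.0],
        [-m21, m12 + m21, -m12],
        [0.0, -2 * m21, c2 + 2 * m21],
    ]
    return solve_linear(a, [1.0, 1.0, 1.0])


equal = expected_pairwise_times(1.0, 1.0, 100.0, 100.0)
reduced = expected_pairwise_times(0.01, 1.0, 100.0, 100.0)
print("equal sizes  :", equal)
print("deme 1 = 0.01:", reduced)
```

With symmetric migration and equal deme sizes, the within-deme expectation is 1 (in units of 4*N0 generations) regardless of the migration rate. Shrinking deme 1 to 0.01 while keeping migration at 100 reduces the expectation by more than an order of magnitude even for a pair sampled in the large deme, because under this neutral approximation such lineages keep migrating into the small deme, where they coalesce quickly.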

Also note that some simulations may not contain any selected allele at all, while in others the allele may fix in the selected deme very soon after its introduction.

Please reopen if I am missing the point.