gchen98 / macs

Automatically exported from code.google.com/p/macs
16 stars 6 forks source link

Disagreement between ms and macs when models involve migration #17

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

- Assume the per base theta is 0.001 (Ne=1e4, mutation rate=2.5x10^-8 per 
generation), simulate 1000 sequences with 1e5 bases given a simple 
two-populations (20 samples for each population) split model with/without gene 
flow between them.

1. A model with migration (, where the macs output does NOT agree with the one 
of ms)
    ms: 
          seed: 641499925
          ms 40 1000 -t 100 -I 2 20 20 -n 1 1 -n 2 1 -ma x 2 2 x -ej 0.1 2 1 -r 0.0004 100000
    MaCS:
          macs 40 100000 -s 1371511838 -i 1000 -t 0.001 -I 2 20 20 -h 100 -n 1 1 -n 2 1 -ma x 2 2 x -ej 0.1 2 1 -r 0.0004

2. A model without migration (, where the macs output does agree with the one 
of ms)
    ms: 
          seed:993080870
          ms 40 1000 -t 100 -I 2 20 20 -n 1 1 -n 2 1 -ma x 0 0 x -ej 0.1 2 1 -r 0.0004 100000
    MaCS:
          macs 40 100000 -s 1371512336 -i 1000 -t 0.001 -I 2 20 20 -h 100 -n 1 1 -n 2 1 -ma x 0 0 x -ej 0.1 2 1 -r 0.0004

What is the expected output? What do you see instead?
- The expected theta and/or the distributions of the number of segregating 
sites over the 1000 loci that result from running ms and MaCS should be 
compatible. Yet it seems that MaCS produces much more segregating sites than ms 
does when gene flow is present in the model of interest. The uploaded boxplots 
illustrate the difference (Fig1: with migration; Fig2: without migration).

What version of the product are you using? On what operating system?
- I tested the above MaCS commands in the version 0.5c on a Mac OS X 10.6 
system.

Please provide any additional information below.
- I also gave the same demographic model (with symmetric gene flow between the 
two populations) to the forward simulator, DaDi, to estimate the theta of the 
simulated data sets produced by ms and MaCS as its inputs, separately. The 
averaged estimated theta of the ms data set is ~100, which is expected given 
the above command; however, the averaged estimated theta of the MaCS data set 
is >150, which is 50% more than the expected value. Thus this reproduced the 
difference observed between the two plots I uploaded. I also tested a 
two-populations model with asymmetric migration between them using both ms and 
MaCS, and MaCS also produced much more mutations than ms did. 

I wonder what makes MaCS produce much more mutations. Please help to identify 
if any mistakes I might have made/overlooked.

Thanks,
PingHsun(Benson)

Original issue reported on code.google.com by hsie...@email.arizona.edu on 18 Jun 2013 at 1:38

Attachments:

GoogleCodeExporter commented 9 years ago
Hello Gary,

After doing many tedious tests, I believe that I have identified the cause of 
the elevating segregating sites observed in MaCS simulated data sets. It seems 
that the cause of what I observed matches to an old bug of ms (for ms prior to 
May 2007, Page 11 in the last ms manual) - that is, when a model is set to have 
gene flow, ms produced more segregating sites if some of the migration rates 
were not set to zero after a population merge event (backward in time) -ej took 
place. This then makes additional foreign lineages move into the other 
populations from a 'ghost' population, which is similar to increasing mutation 
rate.

To make MaCS work properly, as in the command below, I manually set the 
migration rate to zero after the -ej event, and the resulting data set has the 
right amount of mutations, which is compatible to that of ms'.

macs 40 100000 -s 1371511838 -i 1000 -t 0.001 -I 2 20 20 -h 100 -n 1 1 -n 2 1 
-ma x 2 2 x -ej 0.1 2 1 -ema 0.100001 1 0 -r 0.0004

I also tried models with more merge events, and found that as long as the 
migration rate is set to be zero after the last merge event, the amount of 
mutations generated by MaCS looks good. Thus I suspect what I observed might be 
caused by the non-zero migration rate after the last merge event. That being 
said, I hope this piece of information can help you to identify and fix the 
possible cause.

Best,
PingHsun(Benson)

Original comment by pinghsun...@gmail.com on 22 Jun 2013 at 1:03

GoogleCodeExporter commented 9 years ago
Thanks PingHsun,

I'm sorry I've been busy.  Thanks for your investigation.  I'll see what I can 
do once I have some time.

Gary

Original comment by gche...@gmail.com on 22 Jun 2013 at 3:47

GoogleCodeExporter commented 9 years ago
Hi PingHsun,

Can you try checking out the latest revision using SVN?  I have updated code 
and so far I do see ~5-600 sites which should be more compatible with your box 
plot for ms.

Thanks

Gary

Original comment by gche...@gmail.com on 23 Jun 2013 at 10:11

GoogleCodeExporter commented 9 years ago
Hi Gary,

The latest version of MaCS worked great for the case I provided earlier!

However, when I tested a model involving changing migration rates between 
populations using the -ema switch to assign new rates for all populations, MaCS 
returned segfaults; interestingly, in the case of assigning rates using the -eM 
switch, MaCS produced the desired output. The same codes (both -ema and -eM) 
using ms produce the desired output as well.

Here are the commands I used to assign new migration rates.

- codes that cause MaCS crashed
 macs 60 100000 -s 1372028638 -i 100 -h 100 -I 3 20 20 20 -t 0.001 -n 1 2 -n 2 1 -n 3 1 -ma x 2 2 2 x 0 2 0 x -ej 0.005 3 2 -ema 0.1 2 x 1 1 x -ej 0.2 2 1 -en 0.200001 1 1 -r 0.0004

- codes that MaCS worked properly
 macs 60 100000 -s 1372028638 -i 100 -h 100 -I 3 20 20 20 -t 0.001 -n 1 2 -n 2 1 -n 3 1 -ma x 2 2 2 x 0 2 0 x -ej 0.005 3 2 -eM 0.1 1 -ej 0.2 2 1 -en 0.200001 1 1 -r 0.0004

I hope this is useful if you have not yet noticed these cases.

Thanks,
PingHsun(Benson)

Original comment by pinghsun...@gmail.com on 23 Jun 2013 at 11:05

GoogleCodeExporter commented 9 years ago
Hi PingHsun,

The reason this is segfaulting is because as macs traverses up the arg from the 
current time backwards, it grows the migration matrix and the other data 
structures are new populations are encountered (e.g. pop split).  But when pops 
are joined, these data structures are not shrunk.  We just set the number of 
existing chromosomes to zero and migration rates to zero for defunct 
populations.  Hence, if you could change your statement

macs 60 100000 -s 1372028638 -i 100 -h 100 -I 3 20 20 20 -t 0.001 -n 1 2 -n 2 1 
-n 3 1 -ma x 2 2 2 x 0 2 0 x -ej 0.005 3 2 -ema 0.1 2 x 1 1 x -ej 0.2 2 1 -en 
0.200001 1 1 -r 0.0004

to

macs 60 100000 -s 1372028638 -i 100 -h 100 -I 3 20 20 20 -t 0.001 -n 1 2 -n 2 1 
-n 3 1 -ma x 2 2 2 x 0 2 0 x -ej 0.005 3 2 -ema 0.1 3 x 1 1 1 x 1 1 1 x -ej 0.2 
2 1 -en 0.200001 1 1 -r 0.0004

the program should work.

Original comment by gche...@gmail.com on 23 Jun 2013 at 11:25

GoogleCodeExporter commented 9 years ago

Thanks, Gary for pointing out this. Just to be clear, in the modified code you 
provided, you also assigned new migration rates for population 3, which no 
longer exists, after the merge event between the population 2 and population 3. 
Does this mean that, as you stated, because MaCS now sets the number of 
existing chromosomes to zero for population 3, this new migration array here 
won't have the same 'ghost population' effect as we observed previously (as no 
lineages can be moved)?

Thanks for all of your helps!
PingHsun(Benson)

Original comment by pinghsun...@gmail.com on 23 Jun 2013 at 11:43

GoogleCodeExporter commented 9 years ago
That is correct.  At the join event, the migration matrix is adjusted so that 
nothing can flow out of the ghost population (appropriate columns), and nothing 
can flow into it (the row of the matrix) by zeroing out entries.

Original comment by gche...@gmail.com on 24 Jun 2013 at 12:40

GoogleCodeExporter commented 9 years ago
By the way, I just checked in a new revision just now that throws an 
informative error message if you misspecify the matrix size in the event 
instead of segfaulting.  Please close this issue if you are satisfied with the 
resolution.  Thanks.

Original comment by gche...@gmail.com on 24 Jun 2013 at 1:12

GoogleCodeExporter commented 9 years ago
Hi Gary, 

Thank you again for taking care of this issue. I am glad that everything has 
been sorted out well.
As I am not sure how to close this thread, will you please do so for me?

Thanks, 
PingHsun(Benson)

Original comment by hsie...@email.arizona.edu on 24 Jun 2013 at 2:42

GoogleCodeExporter commented 9 years ago

Original comment by gche...@gmail.com on 24 Jun 2013 at 2:48