gchen98 / macs

Automatically exported from code.google.com/p/macs
16 stars 6 forks source link

smaller variance of summary statistic in macs compared to that in ms data #18

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
- Assume the per base theta is 0.001 (Ne=1e4, mutation rate=2.5x10^-8 per 
generation), simulate 1000 sequences with 1e5 bases given a simple 
two-populations (20 samples for each population) split model with gene flow 
between them (same as listed in issue 17).

1. ms command: 
          seed: 641499925
          ms 40 1000 -t 100 -I 2 20 20 -n 1 1 -n 2 1 -ma x 2 2 x -ej 0.1 2 1 -r 0.0004 100000
2. MaCS command:
          macs 40 100000 -s 1371511838 -i 1000 -t 0.001 -I 2 20 20 -h 100 -n 1 1 -n 2 1 -ma x 2 2 x -ej 0.1 2 1 -r 0.0004

What is the expected output? What do you see instead?
- The variance of the number of segregating sites (S) over the 1000 loci that 
result from running ms and MaCS should be compatible, as it been shown in the 
supplementary information of the MaCS original paper. Yet it seems that MaCS 
produces a much smaller variance of S than ms does. In my case, the standard 
deviation for S in the MaCS data set is only half of that in the ms data set. 
The same pattern is also observed under a model without migration and/or with a 
variety of values for the -h switch. Please refer to the Fig2 attached in the 
issue 17 for an idea of the difference in variance.

What version of the product are you using? On what operating system?
- I tested the above MaCS commands in the last version (via svn repository) on 
a Mac OS X 10.6 system.

I wonder what makes MaCS produce much less variation. Please help identify 
things that I might have made/overlooked.

Thanks,
PingHsun(Benson)

Original issue reported on code.google.com by hsie...@email.arizona.edu on 19 Jul 2013 at 1:15

GoogleCodeExporter commented 9 years ago
Hello Gary,

I just noticed that I did not scale the recombination rate for the ms 
simulation correctly. The correct ms command should be "ms 40 1000 -t 100 -I 2 
20 20 -n 1 1 -n 2 1 -ma x 2 2 x -ej 0.1 2 1 -r 40 100000". This gave me about 
the same level of variance.

In addition to this, I wonder what the value for the switch -r I should use 
when the macs command incorporates a recombination map. How does macs relate 
the per base recombination rate to the rates specified in the recombination map?

Thanks,
PingHsun(Benson)

Original comment by hsie...@email.arizona.edu on 19 Jul 2013 at 11:33

GoogleCodeExporter commented 9 years ago
Hi Benson,

Thanks for following up on this.  Sorry I've just been busy.  For a 
recombination map you'll want to use the -R (case sensitive) switch.  The map 
just says how much to scale the recombination rate specified in -r for a 
specified genomic interval.  

So in the example:
0       0.3     0.570688491664533
0.3     0.5     1.45639998785967

from 0 to .3 of the interval per site recombination is .57 of the value in -r 

Original comment by gche...@gmail.com on 20 Jul 2013 at 12:44

GoogleCodeExporter commented 9 years ago
Thanks, Gary! Please help to close this issue for me.

PingHsun(Benson)

Original comment by hsie...@email.arizona.edu on 20 Jul 2013 at 12:49

GoogleCodeExporter commented 9 years ago

Original comment by gche...@gmail.com on 20 Jul 2013 at 12:51