gchen98 / macs

Automatically exported from code.google.com/p/macs
16 stars 6 forks source link

sequence length reported by Stephen Schiffels #3

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago

Hi Gary,

my name is Stephan Schiffels, I am a postdoc in Richard Durbin's lab at the 
Wellcome Trust Sanger Institute in Cambridge, UK.

I noticed a problem with the MaCS simulator. The lengths of all the output 
trees do not add up to the given length of the chromosome. Also, many 
segregating sites that are output between two trees are found to have a 
position that actually falls outside the given tree segment.

I attached a simple awk-script, that shows the error. You can check it with - 
for example - this command in a bash shell

M=3;L=1000000;mu=0.002732;rho=0.000452;
macs $M $L -t $mu -r $rho -T 2>/dev/null | awk -v L=$L -f checkMacs.awk

Related to these problems (and how I actually found it) is the fact that the 
density of mutations and the given local tMRCA decorrelate when you go further 
down the chromosome.

What I could imagine as a possible source of error is a discretization problem 
in the output lengths of the trees. So for example, is it possible, that you 
actually generate the tree lengths from an underlying continuous model, and 
then you discretize them by just converting numbers to ints? That would explain 
why the total lengths of all trees adds up to a number that is roughly the 
lenght of the chromosome minus half the number of trees. And it also explains 
why sites are more and more decorrelated with the actual trees the further you 
go down the chromosome (this is not shown in the script).

Maybe I am just doing something fundamentally wrong, in any case I would be 
very happy to get comments from you. MaCS has been an extremely useful program 
in our group, and the error I described doesn't change anything fundamental 
about the distribution of segregating sites. But it does play an important role 
once you do an analysis that connects positions of trees and sites.

Thanks a lot and best regards,

Stephan

Original issue reported on code.google.com by gche...@gmail.com on 13 Aug 2012 at 5:05

GoogleCodeExporter commented 9 years ago
I have found the problem, it is indeed what I suspected. Here is the fix I did:

in algorithm.cpp:

after line 1179, insert 
    unsigned int iLastCumulativePos = 0;

after line 1271 (of the new file), replace the line with:
            unsigned int iSegLength = curPos*pConfig->dSeqLength - iLastCumulativePos;
            iLastCumulativePos += iSegLength;

This fixes the problem I described.

I told some colleagues about this, let me know whether you plan to update the 
version on your website, so I can tell them.

Best,

Stephan

Original comment by gche...@gmail.com on 13 Aug 2012 at 7:45