Closed ashander closed 6 years ago
This is failing with my test (see diff) to see if number of trees are within 3 standard deviations of the sample mean number of trees produced in msprime. Where we fail, we have a larger number of trees than msprime.
std = math.sqrt(1 / (len(tree_counts) - 1) * sse)
assert ts.sample_size == mspts.sample_size
# TODO fix expected error
> assert abs(mean - ts.num_trees) < 3 * std
E assert 113.47999999999999 < (3 * 25.60953105404472)
E + where 113.47999999999999 = abs((134.52 - 248))
E + where 248 = <msprime.trees.TreeSequence object at 0x7f77f2ad0fd0>.num_trees
--
std = math.sqrt(1 / (len(tree_counts) - 1) * sse)
assert ts.sample_size == mspts.sample_size
# TODO fix expected error
> assert abs(mean - ts.num_trees) < 3 * std
E assert 156.26 < (3 * 39.79584770989928)
E + where 156.26 = abs((265.74 - 422))
E + where 422 = <msprime.trees.TreeSequence object at 0x7f77f2bdec50>.num_trees
@petrelharp thanks for earlier feedback . we still have this issue (see two examples above) where our number of trees exceeds that from the supposedly equivalent msprime sims. this is the opposite to what we'd expect if simuPOP were skipping recombinations
can you
Since recombs are produced by simuPOP; I'm not sure this should be a ftprime issue. If there is an issue here, either (a) simuPOP is not producing the number of recombs we expect, or (b) we are not properly recording them. I think we adequately test (b) elsewhere, so a test of this would just involve simuPOP. It could be done by running simuPOP with N diploid individuals for T generations, writing out recombinations to a file; then the number of recombinations in that file (which something like cut -f 4- recombs.txt | wc -w
) should be N * T * r * L
plus/minus 3 * sqrt(N * T * r * L)
, where r
is the recomb rate per locus and L
is the number of loci.
oh, right: ALSO, I am not sure that number of trees is a good check, because I forget if simplify merges adjacent identical edges.
Oh! I should check that before I use it any further
On Thu, Nov 9, 2017 at 9:15 AM, Peter Ralph notifications@github.com wrote:
oh, right: ALSO, I am not sure that number of trees is a good check, because I forget if simplify merges adjacent identical edges.
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ashander/ftprime/pull/54#issuecomment-343225288, or mute the thread https://github.com/notifications/unsubscribe-auth/AAfLOGSsvCY3ndMRVHfgOkQbUg0akjyIks5s0zMkgaJpZM4QXDqG .
and good point about this not being inherent to ftprime.recomb_collector. though I'd argue it's nice to have examples where the resulting treeseq matches our input recombination rate! (as you point out, num_trees may not be a good test of this).
It could be done by running simuPOP with N diploid individuals for T generations, writing out recombinations to a file; then the number of recombinations in that file (which something like cut -f 4- recombs.txt | wc -w) should be N T r L plus/minus 3 sqrt(N T r * L), where r is the recomb rate per locus and L is the number of loci.
but I'll do this
just for now testing the number of trees produced vs supposedly equivalent msprime simulation
fixing this or getting the scaling right should resolve (some) recombination issues in https://github.com/petrelharp/ftprime_ms/pull/11