Closed ashander closed 6 years ago
Merging #44 into master will decrease coverage by
0.81%
. The diff coverage is78.94%
.
@@ Coverage Diff @@
## master #44 +/- ##
=========================================
- Coverage 82.12% 81.3% -0.82%
=========================================
Files 3 3
Lines 207 214 +7
=========================================
+ Hits 170 174 +4
- Misses 37 40 +3
Impacted Files | Coverage Δ | |
---|---|---|
ftprime/argrecorder.py | 75.79% <78.94%> (-0.88%) |
:arrow_down: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 4088274...31f8879. Read the comment docs.
This looks like it passes but there is one skipped test which is to see if we get the same sequence from tree_sequence
and simplify
followed by tree_sequence
.
We aren't getting the same due to mismatching sequence length --- when using only ArgRecorder.tree_sequence
the sequence length is not pruned to the actual length spanned by the edges. As can be seen in the code the method that does not prune runs only msprime.TreeSequence.simplify()
while the one that does uses msprime.simplify_tables()
This is possibly an msprime
bug, or we should change our expectations
In this case we simplify this (using the old notation) tree sequence and simplify choosing samples 3,4
. (Note: something seems wrong about the simplify here -- it should include 3 on (0.5, 1.0) but ignore that for now as the msprime issue linked below shows that TreeSequence.simplify is not reducing sequence_length)
|| Nodes:
|| id flags population time
|| 0 0 -1 1.00000000000000
|| 1 1 -1 0.20000000000000
|| 2 1 -1 0.00000000000000
|| 3 1 2 2.00000000000000
|| 4 1 2 2.00000000000000
|| Edgesets:
|| id left right parent children
|| 0 0.00000000 1.00000000 0 1,2
|| 1 0.00000000 0.50000000 1 3,4
|| 2 0.50000000 1.00000000 1 3
to the following two sequences
(produced directly from ArgRecorder.tree_sequence
)
|| ---------------- length sequence a ----------- || 0.5
|| id flags population time
|| 0 1 2 0.00000000000000
|| 1 1 2 0.00000000000000
|| 2 0 -1 2.20000000000000
|| id left right parent children
|| 0 0.00000000 0.50000000 2 0,1
produced from first ArgRecorder.simplify
then ArgRecorder.tree_sequence
|| id flags population time
|| 0 1 2 0.00000000000000
|| 1 1 2 0.00000000000000
|| 2 0 -1 2.20000000000000
|| id left right parent children
|| 0 0.00000000 0.50000000 2 0,1
using earlier msprime code (jeromekelleher/msprime.git@63107256347f6a365309cadd6687640d11bb99db ) the sequence lenghts were both changed to 0.5.
I think it's now failing because recent changes (in https://github.com/jeromekelleher/msprime/commit/ff58e4658f725d19981c26a940a5ebb0f16f1c46 ) to the way the TreeSequence.sequence_length
attribute is treated during simplify but have not parsed the code through simplify_tables
vs TreeSequence.simplify()
to understand the difference.
This is correct behavior, because if the sequence_length
is not passed in, then it is determined to be the max right
coordinate of the edges. For the first tree sequence:
tsa = records.tree_sequence([4, 5])
what happens is that the entire tree sequence gets simplified; there are edges with right coordinate 1.0, so this is the sequence_length
. For the second:
records.simplify([4, 5])
tsb = records.tree_sequence([4, 5])
the first round of simplify
leaves only the single edge on [0.0, 0.5)
, so the second simplify that happens in .tree_sequence([4,5])
reduces the sequence length to this.
I think that the ARGrecorder should have a sequence_length
attribute: it can be optional, and determined from input if not provided; then this can be passed to simplify_tables
. That would eliminate this.
OK, I added it in, possibly correctly (but can't test on this computer...)
OK. I was missing this
if the sequence_length is not passed in, then it is determined to be the max right coordinate of the edges.
which happens somewhere in msprime code other than where I was looking.
But I still find this inconsistent seq length under simplify_tables vs TreeSequence.simplify (shown in https://github.com/jeromekelleher/msprime/issues/283 ) confusing.
Abstractly, your fix makes sense. The implementation has some issues: after fixing some obvious bugs, it's not working in 5dac744 But I may be missing something else, as one of the bugs I fixed was passing sequence_length
to TreeSequence.simplify()
, which doesn't work in current msprime. Let me know and/or I'll stare at it more tomorrow
Still confused that the entire tree seq simplified in
For the first tree sequence:
tsa = records.tree_sequence([4, 5])
what happens is that the entire tree sequence gets simplified; there are edges with right coordinate 1.0, so this is the
sequence_length
.
when we explicitly pass samples. However, this relates to general confusion about TreeSequence.simplify. See https://github.com/jeromekelleher/msprime/issues/283#issuecomment-337989900
OK! Made it work. Had to set it so providing ts or edges overrides provided seq len. Thoughts?
I think that sequence_length
if passed in explicitly should override ts.sequence_length
. And, I don't see why this fixed the problem - do you? By my reading, the only difference is that I had
if sequence_length is not None:
self.sequence_length = sequence_length
elif ts is not None:
self.sequence_length = ts.sequence_length
while you have
if ts is not None:
self.sequence_length = ts.sequence_length
elif sequence_length is not None:
self.sequence_length = sequence_length
and where we use this it should be called with ts
not None
and sequence_length=None
, so these should be identical?
When I make that change (deleting the extra and (ts is not None)
phrase), it still passes tests on my computer...
Also, sorry for the non-functional fixes...
Re: your residual confusion above
Still confused that the entire tree seq simplified in ... what I meant is that for
tsa
, the set of tables passed to simplify is everything specified (init_ts
plus those edges added in this test), whereas to createtsb
, since we first calledrecords.simplify([4,5])
, much of this has been dropped from the internal tables, including those edges going out to 1.0.
Re the residual confusion, I got it, mostly. It was mostly due to confusion/surprise at how msprime behaves. See https://github.com/jeromekelleher/msprime/issues/283#issuecomment-337994675
The key fix I made was adding sequence_length to load_tables and removing it from ts.simplify().
Various versions of the flow control stuff at the start should work, depending on how we want things to behave. I opened #45 to discuss exactly what we want to happen
To work with current msprime master.
Todo, maybe