Closed ashander closed 6 years ago
Merging #55 into master will increase coverage by
9.86%
. The diff coverage is94.05%
.
@@ Coverage Diff @@
## master #55 +/- ##
==========================================
+ Coverage 75.18% 85.04% +9.86%
==========================================
Files 4 4
Lines 270 301 +31
==========================================
+ Hits 203 256 +53
+ Misses 67 45 -22
Impacted Files | Coverage Δ | |
---|---|---|
ftprime/argrecorder.py | 87.3% <94.04%> (+13.71%) |
:arrow_up: |
ftprime/recomb_collector.py | 91.02% <94.11%> (+1.63%) |
:arrow_up: |
Continue to review full report at Codecov.
Legend - Click here to learn more
Δ = absolute <relative> (impact)
,ø = not affected
,? = missing data
Powered by Codecov. Last update 4046c18...911e859. Read the comment docs.
I'm curious: why rename self.edges
-> self.__edges
but then make it so you access it with self.edges
?
with the private variable + property it is harder to overwrite https://stackoverflow.com/a/1641236/4598520
I think it's a step toward making sure we are never changing the original ref to a node table while preserving the same UI for accessing the nodes
On Sun, Nov 19, 2017 at 5:59 PM, Peter Ralph notifications@github.com wrote:
I'm curious: why rename self.edges -> self.__edges but then make it so you access it with self.edges?
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ashander/ftprime/pull/55#issuecomment-345572136, or mute the thread https://github.com/notifications/unsubscribe-auth/AAfLOJ3x_U419qVCNWOSEZ75FMp-A74Gks5s4N0RgaJpZM4QgEBS .
actually @petrelharp hold off on looking at this just yet. I think I've tracked down the issue.
OK @petrelharp I'm not sure what's going on. It fails the simplest simplify test with MSP_ERR_EDGES_NOT_SORTED_PARENT_TIME at the line https://github.com/ashander/ftprime/pull/55/files#diff-31f00c08541e8f1ed735675895069f40R425 with internal state given below
internal state trying to simplify [4, 5]
---------
Max time so far:
2.0
Last update time:
2.0
Node IDs:
{0: 1, 1: 2, 4: 3, 5: 4}
Nodes:
id flags population time
0 0 -1 3.00000000000000
1 0 -1 2.20000000000000
2 0 -1 2.00000000000000
3 1 2 0.00000000000000
4 1 2 0.00000000000000
Edges:
id left right parent child
0 0.00000000 1.00000000 0 2
1 0.00000000 0.50000000 1 3
2 0.50000000 1.00000000 1 3
Sites:
id position ancestral_state
Mutations:
id site node derived_state parent
Migrations:
Well, I don't get that error after putting those tables in edges.txt
and nodes.txt
and running
import msprime
ts = msprime.load_text(edges=open("edges.txt", "r"), nodes=open("nodes.txt", "r"))
ts.simplify(samples=[3,4])
... so, I think that must not actually be the internal state when trying to simplify?
aha. so I checked that the tables msprime is getting are the same memory as we're printing. but then I repeated what @petrelharp did here
Well, I don't get that error after putting those tables in edges.txt and nodes.txt and running
import msprime ts = msprime.load_text(edges=open("edges.txt", "r"), nodes=open("nodes.txt", "r")) ts.simplify(samples=[3,4])
... so, I think that must not actually be the internal state when trying to simplify?
and I noticed: the table is reversed after it's loaded! (Sure enough tables are sorted by load_text ). So I'm clearly not actually sorting the edges in the correct direction...
In [25]: cat edges.txt
id left right parent child
0 0.00000000 1.00000000 0 2
1 0.00000000 0.50000000 1 3
2 0.50000000 1.00000000 1 3
In [26]: ts = msprime.load_text(edges=open('edges.txt'), nodes=open('nodes.txt'))
In [27]: print(ts.dump_tables())
############################################################
# Nodes #
############################################################
id flags population time
0 0 -1 3.00000000000000
1 0 -1 2.20000000000000
2 0 -1 2.00000000000000
3 1 2 0.00000000000000
4 1 2 0.00000000000000
############################################################
# Edges #
############################################################
id left right parent child
0 0.00000000 0.50000000 1 3
1 0.50000000 1.00000000 1 3
2 0.00000000 1.00000000 0 2
############################################################
# Sites #
############################################################
id position ancestral_state
############################################################
# Mutations #
############################################################
id site node derived_state parent
############################################################
# Migrations #
############################################################
id left right node source dest time
So, time is not being renumbered correctly here
I wasn't sending all edges. Because of a zip
:o
Reverted because it is actually ~ 2x as slow as previous approach on real sized examples. I assume it's in the overhead of constructing zips etc to fill these numpy arrays. That suggests that something like Peter's suggestion https://github.com/ashander/ftprime/pull/55#discussion_r152090473 would work better
Summary
The new methods to add a bunch of edges at a time work fine in arg_recorder, but I haven't yet tried to use them in recomcollector.
BUT for simplify in chunks, no luck so far. I've tried to adopt the approach taken in Kevin's repo: https://github.com/molpopgen/fwdpy11_arg_example/pull/8/commits/fef0ba55c46683d3f6165fdd516b34a6dba76718
ARGrecorder
add_individuals
,add_records
and useappend_columns
(per #41)__call__
interfaceRecombCollector