ashander commented 6 years ago

Summary

The new methods to add a bunch of edges at a time work fine in arg_recorder, but I haven't yet tried to use them in recomcollector.

BUT for simplify in chunks, no luck so far. I've tried to adopt the approach taken in Kevin's repo: https://github.com/molpopgen/fwdpy11_arg_example/pull/8/commits/fef0ba55c46683d3f6165fdd516b34a6dba76718

ARGrecorder

[x] interfaces that take iterators: add_individuals, add_records and use append_columns (per #41)
[x] tests that use new 'vectorized' interfaces
[x] __call__ interface
[ ] ~update simplify to use chunkwise sort ( as in https://github.com/molpopgen/fwdpy11_arg_example/pull/8/commits/fef0ba55c46683d3f6165fdd516b34a6dba76718 )~ this doesn't work with overlapping gens

RecombCollector

[x] construct iterators and use new interfaces

codecov[bot] commented 6 years ago

Codecov Report

Merging #55 into master will increase coverage by 9.86%. The diff coverage is 94.05%.

@@            Coverage Diff             @@
##           master      #55      +/-   ##
==========================================
+ Coverage   75.18%   85.04%   +9.86%     
==========================================
  Files           4        4              
  Lines         270      301      +31     
==========================================
+ Hits          203      256      +53     
+ Misses         67       45      -22

Impacted Files	Coverage Δ
ftprime/argrecorder.py	`87.3% <94.04%> (+13.71%)`	:arrow_up:
ftprime/recomb_collector.py	`91.02% <94.11%> (+1.63%)`	:arrow_up:

Continue to review full report at Codecov.

Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 4046c18...911e859. Read the comment docs.

petrelharp commented 6 years ago

I'm curious: why rename self.edges -> self.__edges but then make it so you access it with self.edges?

ashander commented 6 years ago

with the private variable + property it is harder to overwrite https://stackoverflow.com/a/1641236/4598520

I think it's a step toward making sure we are never changing the original ref to a node table while preserving the same UI for accessing the nodes

On Sun, Nov 19, 2017 at 5:59 PM, Peter Ralph notifications@github.com wrote:

I'm curious: why rename self.edges -> self.__edges but then make it so you access it with self.edges?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ashander/ftprime/pull/55#issuecomment-345572136, or mute the thread https://github.com/notifications/unsubscribe-auth/AAfLOJ3x_U419qVCNWOSEZ75FMp-A74Gks5s4N0RgaJpZM4QgEBS .

ashander commented 6 years ago

actually @petrelharp hold off on looking at this just yet. I think I've tracked down the issue.

ashander commented 6 years ago

OK @petrelharp I'm not sure what's going on. It fails the simplest simplify test with MSP_ERR_EDGES_NOT_SORTED_PARENT_TIME at the line https://github.com/ashander/ftprime/pull/55/files#diff-31f00c08541e8f1ed735675895069f40R425 with internal state given below

internal state trying to simplify [4, 5]

---------
Max time so far:
2.0
Last update time:
2.0
Node IDs:
{0: 1, 1: 2, 4: 3, 5: 4}
Nodes:
id      flags   population      time
0       0       -1              3.00000000000000
1       0       -1              2.20000000000000
2       0       -1              2.00000000000000
3       1       2               0.00000000000000
4       1       2               0.00000000000000
Edges:
id      left            right           parent  child
0       0.00000000      1.00000000      0       2
1       0.00000000      0.50000000      1       3
2       0.50000000      1.00000000      1       3
Sites:
id      position        ancestral_state
Mutations:
id      site    node    derived_state   parent
Migrations:

petrelharp commented 6 years ago

Well, I don't get that error after putting those tables in edges.txt and nodes.txt and running

import msprime
ts = msprime.load_text(edges=open("edges.txt", "r"), nodes=open("nodes.txt", "r"))
ts.simplify(samples=[3,4])

... so, I think that must not actually be the internal state when trying to simplify?

ashander commented 6 years ago

aha. so I checked that the tables msprime is getting are the same memory as we're printing. but then I repeated what @petrelharp did here

Well, I don't get that error after putting those tables in edges.txt and nodes.txt and running
import msprime
ts = msprime.load_text(edges=open("edges.txt", "r"), nodes=open("nodes.txt", "r"))
ts.simplify(samples=[3,4])
... so, I think that must not actually be the internal state when trying to simplify?

and I noticed: the table is reversed after it's loaded! (Sure enough tables are sorted by load_text ). So I'm clearly not actually sorting the edges in the correct direction...

In [25]: cat edges.txt
id      left            right           parent  child
0       0.00000000      1.00000000      0       2
1       0.00000000      0.50000000      1       3
2       0.50000000      1.00000000      1       3

In [26]: ts = msprime.load_text(edges=open('edges.txt'), nodes=open('nodes.txt'))

In [27]: print(ts.dump_tables())
############################################################
#   Nodes                                                  #
############################################################
id      flags   population      time
0       0       -1              3.00000000000000
1       0       -1              2.20000000000000
2       0       -1              2.00000000000000
3       1       2               0.00000000000000
4       1       2               0.00000000000000
############################################################
#   Edges                                                  #
############################################################
id      left            right           parent  child
0       0.00000000      0.50000000      1       3
1       0.50000000      1.00000000      1       3
2       0.00000000      1.00000000      0       2
############################################################
#   Sites                                                  #
############################################################
id      position        ancestral_state
############################################################
#   Mutations                                              #
############################################################
id      site    node    derived_state   parent
############################################################
#   Migrations                                             #
############################################################
id      left    right   node    source  dest    time

ashander commented 6 years ago

So, time is not being renumbered correctly here

ashander commented 6 years ago

I wasn't sending all edges. Because of a zip :o

ashander commented 6 years ago

Reverted because it is actually ~ 2x as slow as previous approach on real sized examples. I assume it's in the overhead of constructing zips etc to fill these numpy arrays. That suggests that something like Peter's suggestion https://github.com/ashander/ftprime/pull/55#discussion_r152090473 would work better

ashander / ftprime

change to numpy arrays #55

Summary

ARGrecorder

RecombCollector

Codecov Report