ashander / ftprime

Forward-time simulation of the msprime data structure (for development)
2 stars 1 forks source link

do example of retaining ancestral state #59

Open petrelharp opened 6 years ago

petrelharp commented 6 years ago

Sometimes (eg for the MK test) it'd be good to retain substitutions. To do this, we'd need to establish a root, mark it as a sample, and keep it in through simplification. I'm recording how to do this here; perhaps it should be a vignette somewhere.

ts = msprime.simulate(5,recombination_rate=2.0)
e = msprime.EdgeTable()
n = msprime.NodeTable()
ts.dump_tables(edges=e, nodes=n)
root_time = max(n.time) + 1.0
current_root = n.num_rows
n.add_row(flags=1, population=0, time=root_time)
for t in ts.trees():
  for r in t.roots:
    e.add_row(left=t.interval[0], right=t.interval[1], parent=current_root, child=r)

msprime.sort_tables(edges=e, nodes=n)
msprime.simplify_tables(samples=ts.samples() + [current_root], edges=e, nodes=n)
rooted_ts = msprime.load_tables(edges=e, nodes=n)
current_root = ts.num_samples
print("These should all be", current_root)
for t in rooted_ts.trees():
  print(t.roots)

... now, we append to the tables, and next time we simplify, we just need to add [current_root] to the list of samples. And repeat.

This doesn't deal with mutations, of course. If we add mutations after adding the root, then all is good. If we have already got mutations on the initial tree sequence, then we'd need to add mutations to the new edges we've just added that go back to that root.

petrelharp commented 6 years ago

this was from a question by @molpopgen; tagging @jeromekelleher for future thoughts - not sure if this is the right repo to put this in. =)

jeromekelleher commented 6 years ago

This looks good @petrelharp, and the code chunk is clear. I don't understand the motivation though --- what do you mean by retaining substitutions, and how does forcing a grand MRCA help you with this?

molpopgen commented 6 years ago

Some questions require tracking fixations, which occur prior to the MRCA of each marginal tree. Keeping the root as a sample will retain that extra branch from MRCA to the initial root.

On Mon, Jan 8, 2018, 1:21 AM Jerome Kelleher notifications@github.com wrote:

This looks good @petrelharp https://github.com/petrelharp, and the code chunk is clear. I don't understand the motivation though --- what do you mean by retaining substitutions, and how does forcing a grand MRCA help you with this?

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/ashander/ftprime/issues/59#issuecomment-355915751, or mute the thread https://github.com/notifications/unsubscribe-auth/AGHnH1zHUDKEToFyC5Rxv2yPjtsLKuUCks5tId3PgaJpZM4RVNp2 .

--

Kevin Thornton

Associate Professor

Ecology and Evolutionary Biology

UC Irvine

http://www.molpopgen.org

http://github.com/ThorntonLab

http://github.com/molpopgen

jeromekelleher commented 6 years ago

Some questions require tracking fixations, which occur prior to the MRCA of each marginal tree. Keeping the root as a sample will retain that extra branch from MRCA to the initial root.

I see... So the goal is to retain information about when the fixation happened (plus other information?) as well as keeping the site (even though it's fixed?). If all we want to do is keep the site, then we can do this by just adding the filter_zero_mutation_sites=False flag to simplify. Perhaps we could then add some metadata to the site recording when fixation occured, and whatever else that is of interest (I'm assuming the forward time simulator will be aware of this)?

I think I'm missing something though...

molpopgen commented 6 years ago

If all we want to do is keep the site, then we can do this by just adding the filter_zero_mutation_sites=False flag to simplify.

Wouldn't this result in retaining both fixed and extinct variants?

petrelharp commented 6 years ago

The motivation was to be able to do a MK test, which involves the number of substitutions. There could be a special flag for this, or way of recording it, but I think just keeping the root around as a "sample" would be a fairly efficient way to do it: wouldn't the only downside be an increasing number of rows in the Site and Mutation tables that need to get passed through simplify? Non-optimal, but totally do-able.

I don' tthink this is urgent, btw; it just came up and I wrote down how to do it.

jeromekelleher commented 6 years ago

Wouldn't this result in retaining both fixed and extinct variants?

Yes, yes it would...

I don' tthink this is urgent, btw; it just came up and I wrote down how to do it.

OK, cool. I think I'm just being slow. The code looks good to me anyway.

molpopgen commented 6 years ago

Yes, yes it would...

"The new algorithm efficiently fills all available RAM in linear time."