OpenTreeOfLife / treemachine

Source tree graph database
Other
16 stars 6 forks source link

edges corresponding to compatible phylogenetic statements considered conflict #159

Closed mtholder closed 8 years ago

mtholder commented 9 years ago

As feared, the simplistic fix (in commit https://github.com/OpenTreeOfLife/treemachine/commit/54fcad1e2bd72d20986f586cbab14d8e0483216 ) which fixes https://github.com/OpenTreeOfLife/treemachine/issues/158 and https://github.com/OpenTreeOfLife/treemachine/issues/157 does not alleviate issues with compatible groupings.

The test that is failing now is nontrivialaugmenting on https://github.com/OpenTreeOfLife/treemachine/tree/avoid-trivial-conf

When given a choice between node 9 (from tree1 but only containing 3 leaves) and node 13 (from tree2, but compatible with node9 and having another leaf), node9 is chosen. It is not a trivial edge, so my simplistic solution does not help.

Looks like a more elaborate search for edges that "dominate" a high ranking edge may be in order, but I suspect that this is just retracing the branch-and-bound ground that the Smith lab has explored.

I also noted that the mrca property for node 10 in this example is [3,3,4,4,5,7,8] which seems odd. I thought that property was a set (expressed as an array in Java in our impl. but not allowing duplication. I don't think that is related to this test failure though.

blackrim commented 9 years ago

Joseph and I spoke about this this morning and I am going to try a branch and bound like thing that may alleviate the compatible groupings bit.

As for the set that is interesting. That can't be causing the problem but it is wasting space (and should be not happening because of the set bit). Will check on where a set needs to be added to save some space.

On Tue, Feb 3, 2015 at 9:31 AM, Mark T. Holder notifications@github.com wrote:

As feared, the simplistic fix (in commit 54fcad1 https://github.com/OpenTreeOfLife/treemachine/commit/54fcad1e2bd72d20986f586cbab14d8e0483216 ) which fixes #158 https://github.com/OpenTreeOfLife/treemachine/issues/158 and #157 https://github.com/OpenTreeOfLife/treemachine/issues/157 does not alleviate issues with compatible groupings.

The test that is failing now is nontrivialaugmenting on https://github.com/OpenTreeOfLife/treemachine/tree/avoid-trivial-conf

When given a choice between node 9 (from tree1 but only containing 3 leaves) and node 13 (from tree2, but compatible with node9 and having another leaf), node9 is chosen. It is not a trivial edge, so my simplistic solution does not help.

Looks like a more elaborate search for edges that "dominate" a high ranking edge may be in order, but I suspect that this is just retracing the branch-and-bound ground that the Smith lab has explored.

I also noted that the mrca property for node 10 in this example is [3,3,4,4,5,7,8] which seems odd. I thought that property was a set (expressed as an array in Java in our impl. but not allowing duplication. I don't think that is related to this test failure though.

— Reply to this email directly or view it on GitHub https://github.com/OpenTreeOfLife/treemachine/issues/159.

mtholder commented 9 years ago

I should have pointed out that https://github.com/OpenTreeOfLife/treemachine/blob/avoid-trivial-conf/test-synth/nontrivialaugmenting/README.md describes the test.

The synthesis is returning: (F,D,(C,(B,(E,A))))

mtholder commented 9 years ago

I'm now thinking that viable approach might be:

incoming_edges = find_incoming_TAG_edges(node)
incoming_edges.sort(criterion=highest_ranked_trees_first)
pending_or_accepted = []
for edge in incoming_edges:
    if count_descendant_tips(edge) > 1:
        conf_edges = find_conflicting_edges(edge, pending_or_accepted)
        nc = len(conf_edges)
        if nc == 0:
            pending_or_accepted.append(edge)
        if nc == 1:
            ce = conf_edges[0]
            if subsumed(edge, ce):
                pending_or_accepted.remove(ce)
                pending_or_accepted.append(edge)
            else:
                log_rejection_of_edge(edge)
          else:
               log_rejection_of_edge(edge)
# now add the trivial edges (if they didn't come in from a lower ranked tree)
for edge in incoming_edges:
    if count_descendant_tips(edge) == 1:
          if not conflicts_with_accepted(edge, accepted):
               pending_or_accepted.append(edge)

where subsumes(edge_one, edge_two) is true if edge_one has all of the taxa that edge_two has. And edge_one lacks any other taxa from the tree that contributed edge_two. Basically, if there are 2 compatible edges, we'd like to traverse the one that has more taxa.

I'm hoping that the TAG alignment means that the nodeX will be encountered later. Where nodeX is the source of the high priority edge which is removed because it is subsumed.

blackrim commented 9 years ago

OK, yeah, this seems along the lines of what Joseph and I talked about this morning. Are you going to try for an implementation? I have a little more class prep to do today and then have some time to do things if you haven't already.

On Tue, Feb 3, 2015 at 11:32 AM, Mark T. Holder notifications@github.com wrote:

I'm now thinking that viable approach might be:

incoming_edges = find_incoming_TAG_edges(node) incoming_edges.sort(criterion=highest_ranked_trees_first) pending_or_accepted = [] for edge in incoming_edges: if count_descendant_tips(edge) > 1: conf_edges = find_conflicting_edges(edge, pending_or_accepted) nc = len(conf_edges) if nc == 0: pending_or_accepted.append(edge) if nc == 1: ce = conf_edges[0] if subsumed(edge, ce): pending_or_accepted.remove(ce) pending_or_accepted.append(edge) else: log_rejection_of_edge(edge) else: log_rejection_of_edge(edge)

now add the trivial edges (if they didn't come in from a lower ranked tree)

for edge in incoming_edges: if count_descendant_tips(edge) == 1: if not conflicts_with_accepted(edge, accepted): pending_or_accepted.append(edge)

where subsumes(edge_one, edge_two) is true if edge_one has all of the taxa that edge_two has. And edge_one lacks any other taxa from the tree that contributed edge_two. Basically, if there are 2 compatible edges, we'd like to traverse the one that has more taxa.

I'm hoping that the TAG alignment means that the nodeX will be encountered later. Where nodeX is the source of the high priority edge which is removed because it is subsumed.

— Reply to this email directly or view it on GitHub https://github.com/OpenTreeOfLife/treemachine/issues/159#issuecomment-72683157 .

mtholder commented 9 years ago

I have just barely started - but I'm really slow with neo4j stuff. I think that it makes sense for you to guys to push ahead and for me to try. duplicated effort is not a major concern, imho.

blackrim commented 9 years ago

sounds good. will do

On Tue, Feb 3, 2015 at 12:36 PM, Mark T. Holder notifications@github.com wrote:

I have just barely started - but I'm really slow with neo4j stuff. I think that it makes sense for you to guys to push ahead and for me to try. duplicated effort is not a major concern, imho.

— Reply to this email directly or view it on GitHub https://github.com/OpenTreeOfLife/treemachine/issues/159#issuecomment-72696369 .

mtholder commented 9 years ago

I think that some of the changes I made earlier today are responsible for the dups in the mrca array.

mtholder commented 9 years ago

I think that some of the changes I made earlier today are responsible for the dups in the mrca array. I'm about to call it an evening. I need to clean up the duplication, and I don't have a fix for this issue yet. But I think I'm fairly close to having one - not terribly pretty or efficient. but we might be able to turn it in to something usable. I'll push it as mth-broken in case someone wants to look at it.

blackrim commented 9 years ago

That would be great. I am going to be taking a look in about 30 min when I finish up something here.

On Tue, Feb 3, 2015 at 4:25 PM, Mark T. Holder notifications@github.com wrote:

I think that some of the changes I made earlier today are responsible for the dups in the mrca array. I'm about to call it an evening. I need to clean up the duplication, and I don't have a fix for this issue yet. But I think I'm fairly close to having one - not terribly pretty or efficient. but we might be able to turn it in to something usable. I'll push it as mth-broken in case someone wants to look at it.

— Reply to this email directly or view it on GitHub https://github.com/OpenTreeOfLife/treemachine/issues/159#issuecomment-72738084 .

mtholder commented 9 years ago

pushed. the commit message refers to some clean up from subsumed messages but I forgot to mention where. That would be in RankOrSubsumedResolutionMethod.java method addNonConflictingOrSubsumed which is called from resolveConflicts.

blackrim commented 9 years ago

OK, will check out this one and see what I can make of it and run some tests. thanks!

On Tue, Feb 3, 2015 at 4:30 PM, Mark T. Holder notifications@github.com wrote:

pushed. the commit message refers to some clean up from subsumed messages but I forgot to mention where. That would be in RankOrSubsumedResolutionMethod.java method addNonConflictingOrSubsumed which is called from resolveConflicts.

— Reply to this email directly or view it on GitHub https://github.com/OpenTreeOfLife/treemachine/issues/159#issuecomment-72739004 .

blackrim commented 9 years ago

Familiarizing myself with the code you wrote. Some of the vocabulary is tripping me up a bit (trivial and nontrivial...) but I added some System.outs to help me see where the test was failing. In commit 02bf29373f337506f2b78e3242570af2e7009991 I changed the if (tem.size() > 0) to if (tem.size() >= 0) in expander of SynthesisExpander. That fixes the test with the others still working. However, it would seem that that would just ignore tem. I am still going through it to determine what that impacts but determined this was the offending line when seeing that the necessary relationship was not being returned at this point (rel 41). I added a bunch of sys.outs that you can ignore/comment, etc.

Anyway, going to work some more tests and will push

On Tue, Feb 3, 2015 at 4:33 PM, Stephen Smith blackrim@gmail.com wrote:

OK, will check out this one and see what I can make of it and run some tests. thanks!

On Tue, Feb 3, 2015 at 4:30 PM, Mark T. Holder notifications@github.com wrote:

pushed. the commit message refers to some clean up from subsumed messages but I forgot to mention where. That would be in RankOrSubsumedResolutionMethod.java method addNonConflictingOrSubsumed which is called from resolveConflicts.

— Reply to this email directly or view it on GitHub https://github.com/OpenTreeOfLife/treemachine/issues/159#issuecomment-72739004 .

mtholder commented 9 years ago

I just pushed a version that reverts the tem.size() > 0 but culls the duplicate descendants at an earlier stage. So the test passes again. Now when run with asserts enabled it will crash if there are duplications of a taxon node from multiple relationships - rather than pruning the duplicates. The 3 tests in the test-synth dir are passing.

chinchliff commented 9 years ago

This issue--or at least the associated 'nontrivialaugmenting' test--is now passing (that is, when the synthesis maximizing criterion is more resolved rels/more nodes, not tree rankings). Here is the resulting tag and the test output. Please confirm and close if appropriate.

FAILING_TREEMACHINE_TEST=nontrivialaugmenting ./run_synth_tests.sh

# [snipped]

(((((A,E),B),F),C),D);
clade extracted: A, B, C, E, F
clade extracted: A, B, C, D, E, F
clade extracted: A, B, E, F
clade extracted: A, B, E
clade extracted: A, E
((C_ottC,((B_ottB,(E_ottE,A_ottA)),F_ottF)),D_ottD)life_ott805080;
clade extracted: A, B, C, E, F
clade extracted: A, B, C, D, E, F
clade extracted: A, B, E, F
clade extracted: A, B, E
clade extracted: A, E
SUCCESS! recovered expected tree.
Failed 0 out of 1 tests(s).

image

kcranston commented 9 years ago

Is it just me, or does that code snippet not indicate failure? Or is it that some of the tests in run_synth_tests.sh are failing, but just not nontrivialaugmenting?

blackrim commented 9 years ago

Why does it indicate failure? "

SUCCESS! recovered expected tree. Failed 0 out of 1 tests(s). "

On Mon, Feb 23, 2015 at 10:08 AM, Karen Cranston notifications@github.com wrote:

Is it just me, or does that code snippet not indicate failure? Or is it that some of the tests in run_synth_tests.sh are failing, but just not nontrivialaugmenting?

— Reply to this email directly or view it on GitHub https://github.com/OpenTreeOfLife/treemachine/issues/159#issuecomment-75558611 .

kcranston commented 9 years ago

Ah, misread that. Thanks.

chinchliff commented 9 years ago

I've only been running individual tests. Hence "failed 0 out of 1 tests". There are plenty of tests that fail, some for silly reasons like we haven't told them what to expect, and others because we are still working out other problems. I've been adding tests as well, for new issues as they come up. As far as I can tell, the initial concerns that gave rise these particular github issues that I've posted test output to are addressed by the new methods. That's why I've been cherry picking the tests.

On Monday, February 23, 2015, Karen Cranston notifications@github.com wrote:

Is it just me, or does that code snippet not indicate failure? Or is it that some of the tests in run_synth_tests.sh are failing, but just not nontrivialaugmenting?

— Reply to this email directly or view it on GitHub https://github.com/OpenTreeOfLife/treemachine/issues/159#issuecomment-75558611 .