OpenTreeOfLife / treemachine

Source tree graph database
Other
16 stars 6 forks source link

have ability to add non-taxon node labels to newicks #205

Open josephwb opened 8 years ago

josephwb commented 8 years ago

Currently only taxon labels are applied to newick nodes. e.g.

"newick" : "(Gavia_stellata_ott1057044,((Gavia_arctica_ott1085739,Gavia_pacifica_ott651474),(Gavia_immer_ott1057518,Gavia_adamsii_ott90560)))Gavia_ott803675;"
josephwb commented 8 years ago

I guess I should wait for @jar398 to give me the go ahead on this, since people have expressed that they want this, but none of the myriad docs explicitly discusses it.

jar398 commented 8 years ago

I think it's a good idea

-- apologies for brevity / using handheld gizmo --

On Feb 16, 2016, at 20:41, "Joseph W. Brown" notifications@github.com wrote:

I guess I should wait for @jar398 https://github.com/jar398 to give me the go ahead on this, since people have expressed that they want this, but none of the myriad docs explicitly discusses it.

— Reply to this email directly or view it on GitHub https://github.com/OpenTreeOfLife/treemachine/issues/205#issuecomment-184968488 .

kcranston commented 8 years ago

What methods does this affect?

josephwb commented 8 years ago

"subtree" and "induced_subtree" (only the latter is implemented at the moment; I can turn it off if you like).

josephwb commented 8 years ago

Old:

curl -X POST http://localhost:7474/db/data/ext/tree_of_life_v3/graphdb/induced_subtree -H "content-type:application/json" -d '{"node_ids":["ott501678","ott666104","ott316878"], "ott_ids":[102710,536234,810751,81461,501678]}'
{
  "node_ids_not_in_tree" : [ "ott501678" ],
  "ott_ids_not_in_tree" : [ 501678 ],
  "newick" : "(((Clangula_ott316878,Perdix_ott102710)Galloanserae_ott5839486,((Selasphorus_calliope_ott536234)Trochilidae_ott810751,Setophaga_ott666104))Neognathae_ott241846)Aves_ott81461;",
  "synth_id" : "opentree4.1"
}

New:

curl -X POST http://localhost:7474/db/data/ext/tree_of_life_v3/graphdb/induced_subtree -H "content-type:application/json" -d '{"node_ids":["ott501678","ott666104","ott316878"], "ott_ids":[102710,536234,810751,81461,501678]}'
{
  "node_ids_not_in_tree" : [ "ott501678" ],
  "ott_ids_not_in_tree" : [ 501678 ],
  "newick" : "(((Clangula_ott316878,Perdix_ott102710)Galloanserae_ott5839486,((Selasphorus_calliope_ott536234)Trochilidae_ott810751,Setophaga_ott666104)mrcaott246ott5481)Neognathae_ott241846)Aves_ott81461;",
  "synth_id" : "opentree4.1"
}
jimallman commented 8 years ago

Thanks for the "before" and "after" examples. I was unclear on the consequences of this change. Do I understand correctly that this isn't just a matter of labeling, but of including entirely new nodes (e.g. mrcaott246ott5481 above)?

jar398 commented 8 years ago

we could make it an option if there are worries about clutter or space or compatibility. {"include_nontaxon_labels" : true , ...} not sure how to decide this.

On Wed, Feb 17, 2016 at 4:51 PM, Joseph W. Brown notifications@github.com wrote:

"subtree" and "induced_subtree" (only the latter is implemented at the moment; I can turn it off if you like).

— Reply to this email directly or view it on GitHub https://github.com/OpenTreeOfLife/treemachine/issues/205#issuecomment-185423241 .

josephwb commented 8 years ago

@jimallman the trees are the exact same except for labelling.

kcranston commented 8 years ago

+1 for including them Ambivalent about option to exclude. Hold off until it becomes a problem? Would be a compatible change.

josephwb commented 8 years ago

"nontaxon" nodes could be part of the query; a reason for them to be returned.

jimallman commented 8 years ago

the trees are the exact same except for labelling

thanks, my Newick-fu is lacking :bowtie:

josephwb commented 8 years ago

Here is subtree. Before:

curl -X POST http://localhost:7474/db/data/ext/tree_of_life_v3/graphdb/subtree -H "content-type:application/json" -d '{"node_id":"ott803675"}'
{
  "newick" : "(Gavia_stellata_ott1057044,((Gavia_arctica_ott1085739,Gavia_pacifica_ott651474),(Gavia_immer_ott1057518,Gavia_adamsii_ott90560)))Gavia_ott803675;",
  "synth_id" : "opentree4.1"
}

After:

curl -X POST http://localhost:7474/db/data/ext/tree_of_life_v3/graphdb/subtree -H "content-type:application/json" -d '{"node_id":"ott803675"}'
{
  "newick" : "(Gavia_stellata_ott1057044,((Gavia_arctica_ott1085739,Gavia_pacifica_ott651474)mrcaott651474ott1085739,(Gavia_immer_ott1057518,Gavia_adamsii_ott90560)mrcaott90560ott1057518)mrcaott90560ott651474)Gavia_ott803675;",
  "synth_id" : "opentree4.1"
}
josephwb commented 8 years ago

Deployed on test for testing. Let me know if you want me to revert things.

jar398 commented 8 years ago

It would be compatible in the sense of having a /v3 URL instead of a /v2 URL. But it's easy to imagine clients used to the v2 Newicks barfing on the v3 Newicks. That's OK with me, just cautioning on "would be a compatible change"

But I have no strong feelings.

On Wed, Feb 17, 2016 at 5:02 PM, Joseph W. Brown notifications@github.com wrote:

"nontaxon" nodes could be part of the query; a reason for them to be returned.

— Reply to this email directly or view it on GitHub https://github.com/OpenTreeOfLife/treemachine/issues/205#issuecomment-185427642 .

josephwb commented 8 years ago

Please formulate some definite feelings, strong or no; otherwise how can this possibly move forward (or not)?

jimallman commented 8 years ago

Just FYI, I've imported both the old and new Newick strings from this comment above, on devtree:

https://devtree.opentreeoflife.org/curator/study/edit/tt_35/?tab=trees

jar398 commented 8 years ago

Look above for the strongest uncontradicted opinion. (It's "+1 for including them Ambivalent about option to exclude. Hold off [on making them optional] until it becomes a problem".)

A different, stronger, better supported opinion might come along later, e.g. from Mark or one of our users. So just write the code so it's easy to turn on or off. No big deal.

Getting this right immediately is not as important as getting things like 'tax_sources' right immediately because we don't have code inside the project that depends on it. That's why my attitude is different on this issue compared to #207.

josephwb commented 8 years ago

It is a big deal when I get lambasted for not doing exactly what is expected. The views expressed above are nowhere near the strong gatekeeper consensus you expressed during the hangout. I'll not be making any more decisions; tell me what you want, and I'll do it.

Be explicit:

  1. Do this or not?
  2. If yes, should there be a user argument to turn on/off?
  3. If yes, what is the exact name of the argument?
  4. If yes, what is the default value?
jar398 commented 8 years ago

Now that we're no longer pressing to get a stable API by Saturday, I suggest you not do anything until there has been proper review of a proposed spec.​

josephwb commented 8 years ago

Where does the spec come from? Me? Does 1-4 above, together withe examples, suffice? If not, what? How is the spec reviewed?

jar398 commented 8 years ago

I will prepare a spec​. It will become a github issue just like everything else. If you want to prepare a spec too, you may, and we can compare them when the time comes. But that is up to you.

Since you don't want to proceed without guidance, I have assigned this issue to me.

jar398 commented 8 years ago

Current behavior: curl -X POST https://devapi.opentreeoflife.org/v3/tree_of_life/subtree -H "content-type:application/json" -d '{"node_id":"ott803675"}' { "newick" : "(Gavia_stellata_ott1057044,((Gavia_arctica_ott1085739,Gavia_pacifica_ott651474)mrcaott651474ott1085739,(Gavia_immer_ott1057518,Gavia_adamsii_ott90560)mrcaott90560ott1057518)mrcaott90560ott651474)Gavia_ott803675;"}

The answer to #1 is yes. #2 is a good question. @kcranston what do you think, should we have a flag that controls whether to get v2-like behavior (i.e. suppress those long and mostly useless internal node labels)? Having seen some of these trees I think maybe we should.

kcranston commented 8 years ago

In response to @josephwb questions:

  1. Do this or not? yes
  2. If yes, should there be a user argument to turn on/off? yes, there should be an option
  3. If yes, what is the exact name of the argument? include_all_node_labels (if true, then ott and mrca labels, if false, then ott labels only)
  4. If yes, what is the default value? false (i.e. the default is the v2 behaviour)

(edited to change default from true to false and clarification on what the flag does)

jar398 commented 8 years ago

I think the default would be false, since in v2 we didn't have all the mrca labels.

I'm assuming the flag would only control the mrca labels, not the internal ott labels.

This way, with the flag false, the v3 behavior would be the same as the v2 behavior: yes ott internal labels, no mrca internal node labels.

josephwb commented 8 years ago

Is the goal to get this implemented for the current release? Seems low priority, as anyone getting trees is likely to throw out any internal node label. Plus, it feels awkward to have 2 label args:

  1. label_format
  2. include_all_node_labels

I'm inclined to just return all the labels and let the user dispose of them if they wish. Getting rid of the mrcaottXXXotYYY labels but not the ottWWW label isn't going to help anyone get anything done.

jar398 commented 8 years ago

There is code in the tree_of_life (v2) plugin in the v3 branch to throw away the mrca labels.

label_format only applies to OTT nodes. include_all_node_labels only applies to non-OTT nodes. So I see what you mean but technically they are orthogonal.

jar398 commented 8 years ago

include_all_node_labels is implemented (v3 branch on devapi) as @kcranston requested above.

I'm not too happy about the flag name. I'm groping for something short along the lines of 'include_id_when_unnamed' or 'use_id_for_label_when_node_is_unnamed' since the flag says whether the id should be used as the label when the node has no name (OTT id), instead of nothing at all. (For named nodes the label_format tells what label to use.) It's possible to put a spin on 'include_all_node_labels' that makes it a correct description, but it's hard: you have to say that the a priori label is according to the format, if named, or is the id, if unnamed; then you have to say "all labels" means use the a priori label in every case, while "not all labels" means use the a priori label only when the node is named.

There are 6 combinations of format + this flag, and they all make sense, but a couple of them are rather odd.

jar398 commented 8 years ago

@kcranston This issue just needs documentation and test cases and it will be done. Shall we (I) proceed?

jimallman commented 8 years ago

Do we want to expose this option in the webapps, perhaps as a checkbox? Or a drop-down, if there are more than two meaningful choices.

jar398 commented 8 years ago

I thought we would just leave it in the API. It's sort of an advanced feature

jimallman commented 8 years ago

Sounds good to me.