OpenTreeOfLife / otcetera

C++20 lib for manipulations of phylogenetic trees and supertree operations
Other
4 stars 3 forks source link

synth node mrca labels not returned #111

Open snacktavish opened 2 years ago

snacktavish commented 2 years ago

https://github.com/OpenTreeOfLife/germinator/wiki/Synthetic-tree-API-v3#subtree-tree-of-life

The expected return of this call curl -X POST https://api.opentreeoflife.org/v3/tree_of_life/subtree -H "content-type:application/json" -d '{"node_id":"ott803675"}'

shows the synth mrca node labels - but those don't seem to be returned anymore.

 "newick": "(((Gavia_adamsii_ott90560,Gavia_immer_ott1057518),(Gavia_pacifica_ott651474,Gavia_arctica_ott1085739)),Gavia_stellata_ott1057044,Gavia_egeriana_ott3595544,Gavia_fortis_ott3595545,Gavia_concinna_ott3595546,Gavia_howardae_ott3595548,Gavia_brodkorbi_ott6151844,Gavia_pusilla_ott6151845,Gavia_roseiventris_ott7659869)Gavia_ott803675;",
 "supporting_studies": [
  "ot_521@tree1",
  "ot_809@tree2",
  "ot_104@tree1"
 ]

I was planning to use them to link dates from studies to synth subtrees.

mtholder commented 2 years ago

I seem to remember this bug showing up once (at least) before. @bredelings do you recall that?

mtholder commented 2 years ago

I just logged on to the server and checked the raw otcetera call and the call through ws_wrapper. The bug is present in both, so I think this is an otcetera issue not a ws_wrapper issue.

mtholder commented 2 years ago

or perhaps those labels are from propinquity. That is another possibility...

bredelings commented 2 years ago

Or possibly those labels are only added to the tree if you ask for arguson?

mtholder commented 2 years ago

They are in the newick example of the docs https://github.com/OpenTreeOfLife/germinator/wiki/Synthetic-tree-API-v3#subtree_tree

bredelings commented 2 years ago

Looking at the source code, this functionality is disabled by default but you can get the internal node labels by adding "include_all_node_labels" : true:

curl -X POST https://api.opentreeoflife.org/v3/tree_of_life/subtree -H "content-type:application/json" -d '{"node_id":"ott803675","include_all_node_labels":true}'

See https://github.com/OpenTreeOfLife/otcetera/blob/2b5ff724094f768df9bc37b9f0ffb319abd03a20/ws/tolwsbooting.cpp#L354

Also, we have this fun comment:

    // FIXME: According to treemachine/ws-tests/tests.subtree, there is an "include_all_node_labels"
    //        argument.  Unless this is explicitly set to true, we are supposed to not write node labels
    //        for non-ottids.  At least in Newick.
bredelings commented 2 years ago

Perhaps the least intrusive change would be to document the "include_all_node_labels" argument and update the sample output?

bredelings commented 2 years ago

Here's a previous discussion of this: https://github.com/OpenTreeOfLife/treemachine/issues/205 It looks like the design consensus was that we should have the current behavior (default to no labels, but add them given a flag) to preserve compatibility with v2.

bredelings commented 2 years ago

And here is the code in ws-tests that checks that we do NOT include internal node labels by default.

https://github.com/OpenTreeOfLife/treemachine/blob/master/ws-tests/test_subtree.py

snacktavish commented 2 years ago

Ah super helpful! Thanks. Should we change the default to True, or update the docs to show argument and default is False and correct the example output? I added it as an option in the branch I'm working on in python-opentree, but haven't PR'd. https://github.com/OpenTreeOfLife/python-opentree/commit/b2a69353ff735b3f34a96a980cec35cadd41baca

mtholder commented 2 years ago

hmm. I'd say that we should definitely pass the "include_all_node_labels": true from the client when we want the labels (in case we change our mind in a future version of the API). My instinct is that we should make the implementation match the docs, but the fact that we have a previous issue indicating that we purposefully chose to make ""include_all_node_labels": false the default, but then forgot to document that is troubling. I'm not sure whether we should fix the docs or the impl. I'd vote updating the docs, but will defer if others have a strong preference.

snacktavish commented 2 years ago

I have a guess that it was changed to not include internal node labels because when label_format = 'name' a bunch of internal node labels have parens in them. Which, as discussed in various other issues I can't find right now, is legal newick, but that many treeviewers cannot cope with. (e.g. https://tree.opentreeoflife.org/taxonomy/browse?id=996421 for a label with several characters that can cause issues) .There are a few tips with that issue as well, but way fewer.

snacktavish commented 2 years ago

Which is to say, I think setting the default to False and updating the docs makes sense. I have been contemplating suggesting that we don't include parens in taxon labels, but that seems like a whole PITA. Any they should be allowed anyways! It is the tree viewers who are wrong!

snacktavish commented 2 years ago

I think my theory is wrong anyhow - because even when include_all_node_labels = False, named taxa are still included in internal node labels, just the mrca labels are not e.g. curl -X POST https://api.opentreeoflife.org/v3/tree_of_life/subtree -H "content-type:application/json" -d '{"node_id":"ott524059", "include_all_node_labels":false, "label_format":"name"}'

So the label string problems are not solved anyhow.

Edited to add: Example from user issue https://gitter.im/OpenTreeOfLife/public?at=61ca3aafe1a1264f0a2adde9

bredelings commented 2 years ago

We could also switch the default for v4. It seems like switching the default for v3 basically depends on whether we think any existing software that isn't expecting internal node names would break if we added them.

I have no reason to think there are newick parsers that can't handle internal node names... and yet the badness of newick parsers seems to be unbounded below. It kind of seems like a judgement call.

snacktavish commented 2 years ago

I put it on the agenda to discuss at our next dev meeting!

snacktavish commented 2 years ago

I will update documentation to reflect current behavior.

bredelings commented 2 years ago

OK, new branch with 1-line fix: default-include-node-labels