Closed kcranston closed 8 years ago
I don't like this idea. The mrcaottXXXottYYY
are terminals. Do we want something like "MRCA of Ambystoma tigrinum and Chauna torquata" for a node in the middle of Metazoa?
I think we want the current behaviour: the closest taxa that are in the tree.
I agree with @josephwb that generating a name from the node_id
will result in cryptic names. I believe that the rule used is something like "grab the lowest OTT ID assigned to a terminal from the first child and another lowest terminal OTT ID from the second child to create a node ID." So there is no reason to think that they would be particularly recognizable taxa.
It seems like adding descendant_name_list
to the return value of v3 arguson would be reasonable. Or we could just have the spec note that the listing of fields in an arguson node is not necessarily a complete listing - i.e. we could include this as bit of data that is not publicly documented.
I'm with @josephwb on this one. I understand why using more rootward names could be considered ugly or unprincipled, but the species names aren't going to help anyone and we so far have not come up with any labeling scheme of greater utility to the user.
The omission of that field from the spec was an oversight. The examples and description I was working from didn't contain the field.
A general disclaimer about the result possibly containing fields not defined by the spec is a good one to make, since in all API methods we want to have the option of compatibly adding fields without advancing the API version. We just have to be aware that people always reverse engineer these things, so someone might try to hold us to implemented features even if they're not documented. Therefore, once added, a result probably shouldn't be changed or removed without advancing to a new API version. And if this is the case, we might as well document it.
Is there a definite plan here? Seems like we need is:
name
field from the taxon_blob
within the node_blob
to get node labelFor 2, this would have to go within the node_blob
. So, some questions:
descendant_name_list
)?What about non-terminal arguson nodes (e.g. within lineage
)? Do these need labels other than mrcaottXXXottYYY
@jimallman?
My opinions 1. I don't know. descendant_name_list is not great, but it will do in a pinch. (It suggests value is names of all descendants, which it's not.) Maybe inverse_mrca or inverse_mrca_names ?
Why a a maximum length of 2? The old version did not have a limit; Jim just picked 2 (first and last?) to display.
Because clients only need two, and in some cases there are many thousands of such names. Seems a waste to transmit vast amount of information that's not going to be used.
Well, correction, I don't that I can prove there are such cases. There are taxonomy nodes with thousands of children, but I don't know there are non-taxon synthesis nodes with them. But I still don't see any reason to return more than are needed.
So, first and last, just first 2, something else?
FWIW, I'd prefer to return a label
field on node-blob (vs. name
to avoid confusion with taxon names), with a pre-constructed label (format TBD).
Because clients only need two, and in some cases there are many thousands of such names.
Note that the client does currently show a different node label with a final ellipsis if there are more than two descedant tips, e.g., [Apterygiformes + Rheiformes + ...]
.
What about non-terminal arguson nodes (e.g. within lineage)? Do these need labels other than mrcaottXXXottYYY @jimallman?
Yes, these nodes need some kind of label information, since they can be examined in the synth-tree viewer.
Ah, what do I know.
Maybe return two or three names, or maybe return a label. I just thought the name list gave more flexibility in case we wanted to change the notation on the client side. We really don't want people parsing these things. But I don't feel strongly.
I believe the order of the children list is random, so I don't think it matters how it is sampled. If you want to go way beyond the call of duty, I'd suggest names of large taxa, since those are the ones mostly likely to be recognized by the user.
To be clear, what I'm asking for is a label
field (or equivalent) on any node-blob that doesn't include a taxon. This way we'll always have a sensible way to represent otherwise "unnamed" nodes.
@jar398 points out that the label format for unnamed nodes should left to the arguson client to decide, so the earlier descendant_name_list
(with up to three names) makes sense to me. We'd still choose the names to include, so it might not suit every labeling preference, but it seems like a good place to start.
@kcranston would prefer that we add this only in arguson views, versus in any node-blob without a mapped taxon, since it's (currently) only useful in argus.
Do we want names returned here, or ottIDs? The latter would require @jimallman to make a taxonomy query.
Good point! Please return names if possible.
The evidence is that this is now fixed. Re-open if you think not. https://devtree.opentreeoflife.org/opentree/opentree4.1@mrcaott786ott112387/Dermoptera--Primates
oops, maybe too soon, double checking tests and spec...
test fixed. I put a comment in the spec saying what needs to be done to it.
The current version of arguson contains a field called
descendant_name_list
that contains a list of named descendant nodes that the client uses to build display names for nodes that are not taxa. For example, this node has the display label "[Asteraceae + Goodeniaceae + ...]".In the v3 spec, arguson does not return any such information for non-taxon nodes. The only information in a non-taxon node blob is the id, which will have the form mrcaott####ott####. Can we build a display name from that node_id? Get the taxon names from the two OTT ids in the node_id string and display "MRCA of Name 1 and Name 2"? (I actually like this better than the current "name + name + ..." string). Other suggestions?
See gitter conversation that started this.