OpenTreeOfLife / phylesystem-api

API access to Open Tree of Life treestore
BSD 2-Clause "Simplified" License

Method to return tree from within a study should return metadata for tree #135

Open kcranston opened 9 years ago

kcranston commented 9 years ago

The /study/{STUDY_ID}/tree/{TREE_ID} method returns only the tree element of the NeXSON, without any of the study-level information (publication information, study id, mapping of otu elements to names / identifiers). There is no obvious way to connect information in the retrieved JSON with any external source that would give you more context. If we are advertising this as a public method, then should it not return something more meaningful?

Example: {"tree1": {"^ot:rootNodeId": "node1", "^ot:unrootedTree": false, "^ot:messages": {"message": [{"@code": "SUPPORTING_FILE_INFO", "@severity": "INFO", "@humanMessageType": "NONE", "data": {"files": {"file": [{"description": {"$": "Source data for tree 'tree1'"}, "@sourceForTree": "tree1", "@filename": "<content provided as a string in a \"content\" rather than a file upload>", "@url": "/curator/default/to_nexson?output=input&uploadid=ua30e75a4-a593-485f-b863-f5c63c015478", "@type": "newick", "@size": 78}]}, "@movedToPermanentArchive": false}, "@id": "messageua30e75a4-a593-485f-b863-f5c63c015478", "@wasGeneratedBy": "opentree.2nexml"}]}, "^ot:curatedType": "Maximum parsimony ", "nodeById": {"node9": {"@otu": "otu5"}, "node8": {"@otu": "otu4"}, "node1": {"@root": true}, "node3": {}, "node2": {"@otu": "otu1"}, "node5": {"@otu": "otu2"}, "node4": {}, "node7": {}, "node6": {"@otu": "otu3"}}, "^ot:inGroupClade": "node1", "@xsi:type": "nex:FloatTree", "@label": "21 morphological characters", "edgeBySourceId": {"node1": {"edge1": {"@source": "node1", "@target": "node2"}, "edge2": {"@source": "node1", "@target": "node3"}}, "node3": {"edge3": {"@source": "node3", "@target": "node4"}, "edge6": {"@source": "node3", "@target": "node7"}}, "node4": {"edge4": {"@source": "node4", "@target": "node5"}, "edge5": {"@source": "node4", "@target": "node6"}}, "node7": {"edge8": {"@source": "node7", "@target": "node9"}, "edge7": {"@source": "node7", "@target": "node8"}}}, "^ot:specifiedRoot": "node1", "^ot:outGroupEdge": "", "^ot:branchLengthTimeUnit": "", "^ot:branchLengthDescription": "", "^ot:tag": []}}
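To make the problem concrete, here is a minimal Python sketch that walks the `nodeById` map of a tree slice like the one above and collects the OTU IDs its tips point to. For each tip, the slice offers only an opaque ID such as `otu5`. The labels and taxon mappings live in the study's OTU block, which this response omits. The helper name is hypothetical, and the example keeps only the `nodeById` part of the response.

```python
import json

# Trimmed-down copy of the example response above: only "nodeById" is kept.
TREE_SLICE = json.loads("""
{"tree1": {"nodeById": {
    "node9": {"@otu": "otu5"}, "node8": {"@otu": "otu4"},
    "node1": {"@root": true}, "node3": {}, "node2": {"@otu": "otu1"},
    "node5": {"@otu": "otu2"}, "node4": {}, "node7": {},
    "node6": {"@otu": "otu3"}}}}
""")


def referenced_otu_ids(tree_slice: dict, tree_id: str) -> list[str]:
    """Return the sorted OTU IDs referenced by the nodes of one tree.

    These IDs only mean something together with the study-level OTU
    block, which the tree-only response does not include.
    """
    nodes = tree_slice[tree_id]["nodeById"]
    return sorted(node["@otu"] for node in nodes.values() if "@otu" in node)


print(referenced_otu_ids(TREE_SLICE, "tree1"))
# ['otu1', 'otu2', 'otu3', 'otu4', 'otu5']
```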

jimallman commented 9 years ago

I think part of the RESTful style is to look to the URL for context. In this case, the tree is a "sub-resource" of its parent study, so it's trivial to reckon the full study URL and retrieve its NeXSON. That is, if we're keeping track of the tree's URL. Perhaps a minimal solution would be an ad-hoc property or annotation with this study URL as its value:

{
    "study": "http://api.opentreeoflife.org/phylesystem/v1/study/pg_2929?output_nexml2json=1.2",
    "tree1": {
       ...

I agree that the lack of OTU information is more of a problem, since it's likely we'll want this for operations on a single tree.
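The ad-hoc `study` property could be produced on the server side by a small Python helper like the one below: it wraps the tree slice alongside a link to the parent study. The base URL and the `output_nexml2json=1.2` query string are copied from the example above. The function name and its arguments are hypothetical, and this is not how phylesystem-api currently builds its responses.

```python
BASE_URL = "http://api.opentreeoflife.org/phylesystem/v1"


def tree_response_with_study_link(study_id: str, tree_id: str, tree: dict) -> dict:
    """Wrap a single tree slice with a link to its parent study.

    Hypothetical helper: the "study" key and the query string mirror the
    example in this thread, not a documented API.
    """
    study_url = f"{BASE_URL}/study/{study_id}?output_nexml2json=1.2"
    return {"study": study_url, tree_id: tree}


resp = tree_response_with_study_link("pg_2929", "tree1", {"^ot:rootNodeId": "node1"})
print(resp["study"])
# http://api.opentreeoflife.org/phylesystem/v1/study/pg_2929?output_nexml2json=1.2
```

A client can then follow `resp["study"]` directly instead of deriving the study URL by taking the tree URL apart.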

jar398 commented 9 years ago

On Mon, Dec 1, 2014 at 7:37 PM, Jim Allman notifications@github.com wrote:

> I think part of the RESTful style is to look to the URL for context.

I don't think so... see http://www.w3.org/TR/webarch/#uri-opacity and then note in section 8 that Tim Berners-Lee and Roy Fielding are authors.


jimallman commented 9 years ago

Thanks! I stand corrected.

What do you think of the ad-hoc study URL property? This seems to be a common solution for JSON resources: pointing to related resources (and likely objects of further inquiry) by URL. See, for example, the GitHub Issues API, which oddly doesn't include the URL of an issue's parent repo but does include URLs for related users, milestones, etc.

mtholder commented 9 years ago

I'm certainly open to changing the default response format.

Part of the motivation for the fine-grained access to parts of the NexSON was to allow a client to build up the entire NexSON with a series of requests, rather than getting the whole thing in one blob. So I'm not a fan of removing the "return a slice of the JSON" feature.
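As a rough illustration of that incremental style, the Python sketch below assembles several trees from one study by issuing one `/study/{STUDY_ID}/tree/{TREE_ID}` request per tree and merging the slices. The `fetch` callable is a hypothetical stand-in for an HTTP client, injected so the sketch runs offline. It assumes that each slice is shaped like the example response at the top of this issue (`{tree_id: {...}}`). A real client would also need the OTU and study-level slices, which are not shown here.

```python
from typing import Callable


def assemble_trees(fetch: Callable[[str], dict], study_id: str, tree_ids: list[str]) -> dict:
    """Merge per-tree slices, fetched one request at a time, into one dict.

    `fetch` maps a URL path to parsed JSON; here it is a hypothetical
    stand-in for a real HTTP call.
    """
    trees: dict = {}
    for tree_id in tree_ids:
        trees.update(fetch(f"/study/{study_id}/tree/{tree_id}"))
    return trees


# Offline usage with a fake "server" backed by a dict.
FAKE_RESPONSES = {
    "/study/pg_719/tree/tree1": {"tree1": {"@label": "first"}},
    "/study/pg_719/tree/tree2": {"tree2": {"@label": "second"}},
}
merged = assemble_trees(FAKE_RESPONSES.__getitem__, "pg_719", ["tree1", "tree2"])
```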

I expect that most users would want to use URLs that provide translation to a more useful format, e.g.: http://devapi.opentreeoflife.org/phylesystem/v1/study/pg_719/tree/tree1294?format=nexus (aka http://devapi.opentreeoflife.org/phylesystem/v1/study/pg_719/tree/tree1294.nex )

or http://devapi.opentreeoflife.org/phylesystem/v1/study/pg_719/tree/tree1294?format=newick (aka http://devapi.opentreeoflife.org/phylesystem/v1/study/pg_719/tree/tree1294.tre )
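The two spellings differ only in how the format is expressed. The short Python sketch below builds both forms for a given study and tree. The `FORMAT_EXTENSIONS` table covers only the two mappings shown above (`nexus` to `.nex`, `newick` to `.tre`), and the helper name is hypothetical.

```python
# Only the two format/extension pairs from the example URLs above.
FORMAT_EXTENSIONS = {"nexus": ".nex", "newick": ".tre"}

BASE_URL = "http://devapi.opentreeoflife.org/phylesystem/v1"


def tree_urls(study_id: str, tree_id: str, fmt: str) -> tuple[str, str]:
    """Return the (query-parameter, file-extension) spellings of a tree URL."""
    base = f"{BASE_URL}/study/{study_id}/tree/{tree_id}"
    return f"{base}?format={fmt}", base + FORMAT_EXTENSIONS[fmt]


query_form, extension_form = tree_urls("pg_719", "tree1294", "newick")
print(extension_form)
# http://devapi.opentreeoflife.org/phylesystem/v1/study/pg_719/tree/tree1294.tre
```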

FWIW, I think the docs at https://github.com/OpenTreeOfLife/phylesystem-api/blob/docv2/docs/README.md#fine-grained-access-via-get are pretty clear that the study is the resource and that we support methods for slicing it.

mtholder commented 9 years ago

Just noting that this request is (somewhat loosely) related to the request by the API team at the hackathon to include an envelope of metadata in each response: https://github.com/OpenTreeOfLife/opentree/issues/437

There is also a connection to https://github.com/OpenTreeOfLife/phylesystem-api/issues/104, in which @jar398 notes that it is confusing that the NexSON content is placed in a "data" property when we request the full file, but for fine-grained access the response is just the requested slice of data (with no context).

I like @jimallman's suggestion in this thread of putting the study link in the response.

So do we want the rule to be:

  1. When you request JSON, you get:
     A. the information that you requested, in a "data" property;
     B. other helpful info (study URL, SHA, ...) as additional top-level properties of the response, siblings of the "data" property.
  2. When you request other formats, you get just the information requested.
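The proposed rule can be sketched in Python as a single serialization helper. This is a minimal illustration, not the server's actual code. Only the `"data"` key comes from the proposal. The `"study_url"` and `"sha"` key names and the function itself are hypothetical.

```python
import json
from typing import Any


def build_response(requested: Any, fmt: str, study_url: str, sha: str) -> str:
    """Serialize a response body according to the proposed rule.

    JSON requests get an envelope: the requested slice under "data", with
    context (study URL, commit SHA) as sibling top-level properties.
    For any other format, `requested` is assumed to already be the
    serialized text (e.g. a Newick string) and is returned unwrapped.
    """
    if fmt == "json":
        return json.dumps({"data": requested, "study_url": study_url, "sha": sha})
    return requested


json_body = build_response({"tree1": {}}, "json", "http://example.org/study/pg_719", "abc123")
newick_body = build_response("(a,b);", "newick", "http://example.org/study/pg_719", "abc123")
```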

This goes against @jar398's preference in https://github.com/OpenTreeOfLife/phylesystem-api/issues/104. But I think that if I asked for the tree in NEXUS or Newick, I'd be a bit annoyed to have to unpack it from JSON.

Adding a boolean "include_context" parameter to the request could let us accommodate clients with either preference.

note edited for better markdown
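Reading the suggested `include_context` flag from a query string is straightforward with the standard library's `urllib.parse.parse_qs`. The Python sketch below is one hypothetical way to interpret it. The accepted truthy spellings and the default of `False` (so clients who never send the parameter see no change) are assumptions, not settled behavior.

```python
from urllib.parse import parse_qs


def wants_context(query_string: str, default: bool = False) -> bool:
    """Interpret a hypothetical boolean include_context query parameter.

    "1", "true" and "yes" (any case) mean True; any other value means
    False; a missing or empty parameter falls back to `default`.
    """
    values = parse_qs(query_string).get("include_context")
    if not values:
        return default
    return values[-1].strip().lower() in {"1", "true", "yes"}
```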

kcranston commented 9 years ago

I agree with @mtholder about not putting the Newick/NEXUS response inside JSON. With these formats, though, the tree is meaningful on its own because the labels are more than "otu7".

My specific use case is trying to put together the list of trees that make up the synthetic tree for inclusion in a Dryad data package. I want NeXSON because I want metadata, but without the metadata about publication, OTU mapping, etc., the trees themselves are useless.

I suppose what I want is "give me study X with only tree Y".

mtholder commented 9 years ago

It shouldn't be hard to implement a service to return a NexSON like @kcranston wants. The server can just remove the trees that don't match the IDs (assuming that we don't mind if there are extra OTUs in the response, or annotations to entities that only exist in the full study).

Do we want to change the behavior of the existing URL? (-1 vote from me on that.)

Or should we add a new parameter, like cull_nonmatching, to indicate that we want a full NexSON with some parts culled instead of a fragment of NexSON?
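A minimal Python sketch of the culling approach, including the proposed `cull_nonmatching` switch, follows. It assumes the NexSON 1.2 "by id" layout, where trees live under `nexml` → `treesById` → trees group → `treeById`, and an optional `^ot:treeElementOrder` list per trees group. Both layout details are assumptions to check against the NexSON spec. As described above, OTUs and annotations are left in place. `cull_to_tree` and `get_tree` are hypothetical names.

```python
import copy


def cull_to_tree(study: dict, keep_tree_id: str) -> dict:
    """Return a copy of a NexSON study containing only one tree.

    Assumes the NexSON 1.2 layout nexml -> treesById -> treeById.
    OTUs and other study-level content are deliberately left untouched.
    """
    culled = copy.deepcopy(study)
    for trees_group in culled["nexml"].get("treesById", {}).values():
        tree_by_id = trees_group.get("treeById", {})
        for tree_id in list(tree_by_id):
            if tree_id != keep_tree_id:
                del tree_by_id[tree_id]
        # Keep the group's ordering list (if present) in sync with treeById.
        if "^ot:treeElementOrder" in trees_group:
            trees_group["^ot:treeElementOrder"] = [
                t for t in trees_group["^ot:treeElementOrder"] if t in tree_by_id
            ]
    return culled


def get_tree(study: dict, tree_id: str, cull_nonmatching: bool = False) -> dict:
    """Hypothetical handler: a fragment by default, a culled study on request."""
    if cull_nonmatching:
        return cull_to_tree(study, tree_id)
    for trees_group in study["nexml"].get("treesById", {}).values():
        if tree_id in trees_group.get("treeById", {}):
            return {tree_id: trees_group["treeById"][tree_id]}
    raise KeyError(tree_id)


# Toy study with two trees in one trees group.
TOY_STUDY = {"nexml": {
    "otusById": {"otus1": {}},
    "treesById": {"trees1": {
        "@otus": "otus1",
        "^ot:treeElementOrder": ["tree1", "tree2"],
        "treeById": {"tree1": {"@label": "a"}, "tree2": {"@label": "b"}},
    }},
}}
culled = get_tree(TOY_STUDY, "tree2", cull_nonmatching=True)
```

Leaving the existing fragment behavior as the default keeps the current URL unchanged, which matches the preference stated above.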

I'll try to work on this tomorrow, er, later today. After sleep...