Closed mbaudis closed 5 years ago
Following the comment/request #157 from @mfiume, we should work on specifying the Handover structure with e.g. the DOS use case.
Working assumptions for the structure of the Handover protocol extension now are that:
handover: [
{
schema: "DOS",
access_key: "30822e80-8ef8-4ac9-af5d-304aa7f8c1dd"
}
],
The Handover implementation proposal addresses #157 and is related to #107.
As a reminder, a simplified implementation has been prototyped for the Beacon+ resource and is conceptually documented here, though the format of the Handover object is assumed to be an object instead of the callset_access_handle used in the demonstrator.
@mbaudis what if the object lives on a different server from the one that is generating the response?
Here, is the access key used to ID the object or does it comprise the authentication information required to fetch it, or both?
What about having a url in the handover struct to point to the payload?
Can you provide an example of how the authentication procedure would be provided? I agree that this would be very helpful to encode as a hint, just wondering how you'd approach it.
@mfiume I don't think that this would be part of Beacon, but the general idea would be that the "handoff" key would point to whatever action is then executed. It doesn't really matter which server the data resides on; this is resolved from data_access_handle and selected "action". The Beacon itself could expose a vocabulary of actions, so that a distributed query could e.g. be run over many nodes.
Sure, the handover object could be a url; but the url should not provide ids or such, just point to a resolver which can then work out which data objects are pointed to. Basically the same as above, with
url: "https://beacondeliver.mygenomecollection.org/handover/30822e80-8ef8-4ac9-af5d-304aa7f8c1dd"
instead of
access_key: "30822e80-8ef8-4ac9-af5d-304aa7f8c1dd"
Authentication could be provided in OAuth etc., and the resolver would match credentials to access rights. This would allow the layered access of public beacon query + limited data retrieval.
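To make the layered access idea a bit more concrete, here is a minimal sketch of how a client might prepare a call to such a resolver URL. The bearer-token header is only an assumption about how OAuth credentials could be passed, and `build_handover_request` is a hypothetical helper name; nothing here is part of the Beacon specification, and the request is only built, not sent.

```python
from typing import Optional
from urllib.request import Request


def build_handover_request(url: str, token: Optional[str] = None) -> Request:
    """Prepare (but do not send) a request to a handover resolver URL.

    Without a token the resolver would only see an anonymous caller;
    with a token it could match the credentials to access rights.
    """
    headers = {"Accept": "application/json"}
    if token:
        # Assumption: credentials travel as an OAuth-style bearer token.
        headers["Authorization"] = f"Bearer {token}"
    return Request(url, headers=headers)


resolver_url = (
    "https://beacondeliver.mygenomecollection.org/handover/"
    "30822e80-8ef8-4ac9-af5d-304aa7f8c1dd"
)
anonymous = build_handover_request(resolver_url)
authorised = build_handover_request(resolver_url, token="example-token")
```

The url itself stays free of ids beyond the opaque handle; what the caller gets back depends entirely on what the resolver decides for the presented credentials.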
In our current implementation, the callset_access_handle points to a document in a temporary DB, which then has the details: the _id values of the callsets are stored in a database, and the _id value of the storing document is returned as callset_access_handle (well, our name here); the document looks like:
{
"_id" : "966fc3c2-5a11-11e8-bf6d-8f10af00a547",
"query_coll" : "callsets",
"query_key" : "id",
"query_values" : [
"PGX_AM_CS_GSM511473",
"PGX_AM_CS_GSM1102907",
"PGX_AM_CS_GSM437026",
"PGX_AM_CS_GSM878881"
],
"query_db" : "arraymap_ga4gh"
}
Now data can be retrieved by creating different style queries from this.
1. Getting the callset ids:
db.querybuffer.findOne({_id:'966fc3c2-5a11-11e8-bf6d-8f10af00a547'})
... would deliver the document shown. This has its own:
* database and collection to query
- `"query_db" : "arraymap_ga4gh"`
- `"query_coll" : "callsets"`
* attribute name
- `"query_key" : "id"`
* attribute values
- `"query_values" : [ ... ]`
If you now follow the original GA4GH schema, you can retrieve e.g. all biosample ids by querying:
db.callsets.find({id:{$in:["PGX_AM_CS_GSM511473","PGX_AM_CS_GSM188255"]}},{biosample_id:1})
... etc., and then get the biosample data; similar for all variants from the matching callsets etc.
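The steps above can be sketched in Python: take the stored document and derive the follow-up query from its own attributes. This is only an illustration; `build_followup_query` is a hypothetical helper, and no database connection is made here.

```python
def build_followup_query(buffer_doc, projection=None):
    """Turn a stored handover document into (db, collection, filter, projection)."""
    mongo_filter = {
        buffer_doc["query_key"]: {"$in": list(buffer_doc["query_values"])}
    }
    return buffer_doc["query_db"], buffer_doc["query_coll"], mongo_filter, projection


# The document as stored in the temporary database (copied from above).
buffer_doc = {
    "_id": "966fc3c2-5a11-11e8-bf6d-8f10af00a547",
    "query_coll": "callsets",
    "query_key": "id",
    "query_values": [
        "PGX_AM_CS_GSM511473",
        "PGX_AM_CS_GSM1102907",
        "PGX_AM_CS_GSM437026",
        "PGX_AM_CS_GSM878881",
    ],
    "query_db": "arraymap_ga4gh",
}

# Ask only for the biosample ids of the matched callsets.
db_name, coll_name, query_filter, fields = build_followup_query(
    buffer_doc, projection={"biosample_id": 1}
)
# With a driver such as pymongo, this could then be run as
# client[db_name][coll_name].find(query_filter, fields).
```

The point is that the stored document carries everything needed to build the next query, so the handle alone is enough to continue.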
But this requires a standardised data structure in the `handover` delivery (here the GA4GH schema - which we use); or one starts to define other endpoints (and provides this with the Beacon response's handover info).
It is all rather trivial, if keeping to the basic principles of a schema which had been developed over years, without enforcing some of the more esoteric "recapitulate VCF column format" ideas of it.
Oh well...
We have now implemented this scenario, for "one click" actions, based on the variants/callsets/samples identified in the Beacon query.
Example (this is the excerpt from the Beacon response):
"datasetAlleleResponses": [
{
"callCount": 163,
"datasetId": "arraymap",
"error": null,
"exists": true,
"externalUrl": "https://beacon.progenetix.org/beacon/info/",
"frequency": 0.157,
"handover": [
{
"action": "create CNV histogram from matched callsets",
"label": "Histogram",
"url": "/beaconplus-server/beacondeliver.cgi?do=histogram&accessid=2a0136df-dc49-11e8-a927-8d34da1c5bc0"
},
{
"action": "export all biosample data of matched callsets",
"label": "Biosamples",
"url": "https://beacon.progenetix.org/beaconplus-server/beacondeliver.cgi?do=biosamples&accessid=2a0136df-dc49-11e8-a927-8d34da1c5bc0"
},
{
"action": "export all variants of matched callsets",
"label": "Callsets",
"url": "/beaconplus-server/beacondeliver.cgi?do=variants&accessid=2a0136df-dc49-11e8-a927-8d34da1c5bc0"
},
{
"action": "retrieve matching variants",
"label": "Variants",
"url": "/beaconplus-server/beacondeliver.cgi?do=variants&accessid=2a01d0bc-dc49-11e8-a927-a8c3673772cb"
}
],
"info": {
"callset_access_handle": "2a0136df-dc49-11e8-a927-8d34da1c5bc0",
"description": "The query was against database \"arraymap\", variant collection \"variants\". 163 matched callsets for 152 distinct variants. Out of 51820 biosamples in the database, 1038 matched the biosample query; of those, 163 had the variant.",
"payload": null
},
"sampleCount": 163,
"variantCount": 152
}
],
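For a client consuming such a response, a minimal sketch of collecting the handover actions could look like the following. Since some of the urls in the excerpt are relative, the sketch resolves them against a base; taking `https://beacon.progenetix.org` as that base is an assumption drawn from the one absolute url in the example, and `handover_links` is a hypothetical helper name.

```python
from urllib.parse import urljoin

# Assumption: relative handover urls are relative to the Beacon host.
BEACON_BASE = "https://beacon.progenetix.org"


def handover_links(dataset_response, base_url=BEACON_BASE):
    """Map each handover label to an absolute url."""
    entries = dataset_response.get("handover") or []
    return {entry["label"]: urljoin(base_url, entry["url"]) for entry in entries}


# Shortened copy of the response excerpt above.
dataset_response = {
    "datasetId": "arraymap",
    "exists": True,
    "handover": [
        {
            "action": "create CNV histogram from matched callsets",
            "label": "Histogram",
            "url": "/beaconplus-server/beacondeliver.cgi?do=histogram&accessid=2a0136df-dc49-11e8-a927-8d34da1c5bc0",
        },
        {
            "action": "export all biosample data of matched callsets",
            "label": "Biosamples",
            "url": "https://beacon.progenetix.org/beaconplus-server/beacondeliver.cgi?do=biosamples&accessid=2a0136df-dc49-11e8-a927-8d34da1c5bc0",
        },
    ],
}

links = handover_links(dataset_response)
```

A front end could then render one button per label, which is roughly what the "one click" actions amount to.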
@sdelatorrep I would suggest also adding a label attribute to the handover object.
Reasoning:
"type" : {
"id" : "ncit:C40078",
"label": "Ovarian clear cell adenocarcinoma"
}
Also, this would be an interesting scenario in which we have to decide if we should implement the general OntologyClass concept, which finds its way into other parts of GA4GH schemas.
So, the schema could then look like:
Handover:
  type: object
  required:
    - type
    - url
  properties:
    type:
      type: object
      required:
        - id
      properties:
        id:
          type: string
          description: The use of an ontology term, in CURIE syntax, is strongly recommended. Use “CUSTOM” when no ontology is available.
          default: CUSTOM
        label:
          type: string
          description: A short label for the handover action. In the case of an ontology, this would be the "preferred Label".
    url:
      type: string
      description: URL endpoint to where the handover process could progress (in RFC 3986 format).
    note:
      type: string
      description: Additional human readable information or description about the handover.
(The type here is a bit confusing, both as attribute name and as keyword... Alas, this is just for discussion.)
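To see what the draft would accept, here is a small hand-written check in Python that mirrors the required fields and string types listed above. It is only a sketch for discussion, not a schema validator, and `validate_handover` is a hypothetical name.

```python
def validate_handover(handover):
    """Return a list of problems; an empty list means the object fits the draft."""
    if not isinstance(handover, dict):
        return ["handover must be an object"]
    problems = []
    # "type" and "url" are the two required properties of the draft.
    for key in ("type", "url"):
        if key not in handover:
            problems.append(f"missing required property: {key}")
    htype = handover.get("type")
    if htype is not None:
        if not isinstance(htype, dict):
            problems.append("type must be an object")
        else:
            if "id" not in htype:
                problems.append("type.id is required")
            elif not isinstance(htype["id"], str):
                problems.append("type.id must be a string")
            if "label" in htype and not isinstance(htype["label"], str):
                problems.append("type.label must be a string")
    for key in ("url", "note"):
        if key in handover and not isinstance(handover[key], str):
            problems.append(f"{key} must be a string")
    return problems


example = {
    "type": {"id": "CUSTOM", "label": "Histogram"},
    "url": "/beaconplus-server/beacondeliver.cgi?do=histogram&accessid=2a0136df-dc49-11e8-a927-8d34da1c5bc0",
    "note": "create CNV histogram from matched callsets",
}
ok = validate_handover(example)
bad = validate_handover({"url": 1})
```

Here `ok` is an empty list, while `bad` reports the missing `type` and the non-string `url`.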
Hi @mbaudis, looks good! Though we think it's not necessary to create an object for the field type. Check our proposal in PR #230, please.
PR #230 merged.
As was discussed previously and with some assigned future spot on the Beacon roadmap, there is general consensus about the need to implement a specification for a handoff protocol. The arguments supporting this development can be summarized as:
As starting point for discussions on the merits of this concept and how to implement it, we have prototyped a very basic version of a handoff concept (without use of a proper authentication procedure):
Beacon+ query => internal matched variants ==> internal retrieved callset ids ===> internal storage of callset ids in record in tmp database ====> external delivery of BeaconDatasetAlleleResponse + info.callset_access_handle
Data access => callset_access_handle is submitted to authentication system ==> authentication procedure + fwd of callset_access_handle ===> data retrieval options based on authentication status
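The two flows above can be sketched with an in-memory stand-in for the temporary database. This is a toy illustration only: the dict replaces the tmp database, the handle is a random UUID, and the split into public and authenticated actions is an assumption made for the example. `store_callsets` and `retrieval_options` are hypothetical names.

```python
import uuid

# Stand-in for the temporary database holding the stored callset ids.
QUERY_BUFFER = {}


def store_callsets(callset_ids):
    """Store matched callset ids and return the access handle for the response."""
    handle = str(uuid.uuid4())
    QUERY_BUFFER[handle] = {
        "_id": handle,
        "query_coll": "callsets",
        "query_key": "id",
        "query_values": list(callset_ids),
    }
    return handle


def retrieval_options(handle, authenticated):
    """List the data retrieval options available for a handle."""
    if handle not in QUERY_BUFFER:
        return []  # unknown or expired handle
    options = ["histogram"]  # assumption: aggregate view is public
    if authenticated:
        # Assumption: record-level exports need authentication.
        options += ["biosamples", "variants"]
    return options


# Beacon+ query side: matched callsets are stored, the handle goes into info.
handle = store_callsets(["PGX_AM_CS_GSM511473", "PGX_AM_CS_GSM1102907"])
info = {"callset_access_handle": handle}

# Data access side: options depend on the authentication status.
public_options = retrieval_options(handle, authenticated=False)
full_options = retrieval_options(handle, authenticated=True)
```

Only the handle crosses the boundary to the client; the callset ids themselves never leave the server unless the retrieval step allows it.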
As part of the Beacon specifications, it should probably be sufficient to define an attribute name/format for the access handle; authentication etc. would be for demonstrators, "discovery" product ... but probably out of scope for the Beacon protocol itself (?).
See Beacon+ => CNV example => handover in response table => ...; the current concept is detailed in these slides.