Closed by andrewsu 2 years ago
[updated 7/21 to reflect the discussion in the 7/20 lab call; rearranged to put what we're doing first at the top]
BTE has to add all the source-related information to the edge attributes array:
Current (the source-related attribute objects for an edge):
"attributes": [
{
"attribute_type_id": "api",
"value": [
"BioLink API"
],
"value_type_id": "bts:api"
},
{
"attribute_type_id": "provided_by",
"value": [
"Monarch Initiative"
],
"value_type_id": "biolink:provided_by"
},
{
"attribute_type_id": "source",
"value": [
"https://archive.monarchinitiative.org/#omim"
],
"value_type_id": "bts:source"
},
......
]
Desired (comments as //):
{ // add this
"attribute_type_id": "biolink:aggregator_knowledge_source",
"value": ["infores:translator-biothings-explorer"],
"value_type_id": "biolink:InformationResource"
},
{ // corresponds to the "api" object above
"attribute_type_id": "biolink:aggregator_knowledge_source",
"value": ["infores:biolink-api"],
"value_type_id": "biolink:InformationResource"
},
{ // corresponds to the "provided_by" object above
"attribute_type_id": "biolink:primary_knowledge_source",
"value": ["infores:monarchinitiative"],
"value_type_id": "biolink:InformationResource"
},
{ // no change to the "source" object above
"attribute_type_id": "source",
"value": [
"https://archive.monarchinitiative.org/#omim"
],
"value_type_id": "bts:source"
},
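The mapping from the current objects to the desired ones could be sketched in Python roughly as below. This is only an illustration, not BTE's actual implementation: the function name `to_biolink_provenance` and the `INFORES_IDS` lookup table are hypothetical, and the table only covers the two names that appear in this example.

```python
import copy

# Hypothetical lookup from the names BTE emits today to infores IDs.
# Only the two entries from the example above are included.
INFORES_IDS = {
    "BioLink API": "infores:biolink-api",
    "Monarch Initiative": "infores:monarchinitiative",
}

BTE_AGGREGATOR = {
    "attribute_type_id": "biolink:aggregator_knowledge_source",
    "value": ["infores:translator-biothings-explorer"],
    "value_type_id": "biolink:InformationResource",
}

# Which new attribute_type_id each old one maps to in this example.
NEW_TYPE = {
    "api": "biolink:aggregator_knowledge_source",
    "provided_by": "biolink:primary_knowledge_source",
}


def to_biolink_provenance(attributes):
    """Return a new list with the source-related attributes rewritten."""
    result = [copy.deepcopy(BTE_AGGREGATOR)]  # BTE's own object comes first
    for attr in attributes:
        new_type = NEW_TYPE.get(attr["attribute_type_id"])
        if new_type is None:
            result.append(attr)  # e.g. the "source" object: no change
            continue
        result.append({
            "attribute_type_id": new_type,
            # Raises KeyError for a name missing from the lookup table.
            "value": [INFORES_IDS[name] for name in attr["value"]],
            "value_type_id": "biolink:InformationResource",
        })
    return result
```

Calling `to_biolink_provenance` on the three "Current" objects should give the four "Desired" objects in the same order.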
Here are some thoughts on how to update provenance. The situations below are based on which API BTE called to get that edge.
Important notes to read first:
currently BTE ingests these TRAPI APIs:
{
"attribute_type_id": "biolink:aggregator_knowledge_source",
"value": "infores:translator-biothings-explorer",
"value_type_id": "biolink:InformationResource"
}
This is the situation for APIs from the multiomics and text mining providers, since they create knowledge from their own analysis of data/publications, and perhaps for some external APIs that we bring in.
The APIs BTE ingests right now that fit this are:
Other APIs that fit this (but BTE doesn't ingest right now):
BTE has to add all the source-related information to the edge attributes array:
Ideal output from the Clinical Risk KP API (the source-related attribute objects for an edge); this doesn't exist right now:
"attributes": [
{
"attribute_type_id": "api",
"value": [
"Clinical Risk KP API"
],
"value_type_id": "bts:api"
},
{
"attribute_type_id": "provided_by",
"value": [
"clinical-records-washington-2018"
],
"value_type_id": "biolink:provided_by"
},
{
"attribute_type_id": "provenance",
"value": "https://github.com/NCATSTranslator/Translator-All/wiki/EHR-Risk-KP",
"value_type_id": "bts:provenance"
}
......
]
Desired (comments as //): Notice that the URL the Clinical Risk KP API gave was moved under the primary knowledge source. Also, I made up the supporting data source below since I don't know what it is; it's not in the info above.
{ // added
"attribute_type_id": "biolink:aggregator_knowledge_source",
"value": ["infores:translator-biothings-explorer"],
"value_type_id": "biolink:InformationResource"
},
{ // was "api" object above
"attribute_type_id": "biolink:primary_knowledge_source",
"value": ["infores:biothings-multiomics-clinical-risk"],
"value_url": "https://github.com/NCATSTranslator/Translator-All/wiki/EHR-Risk-KP",
"value_type_id": "biolink:InformationResource"
},
{ // was "provided_by" object above
"attribute_type_id": "biolink:supporting_data_source",
"value": ["infores:clinical-records-washington-2018"],
"value_type_id": "biolink:InformationResource"
},
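For this case, where the API itself is the primary knowledge source, the rewrite could look something like the Python sketch below. It is an assumption-laden illustration rather than BTE code: the function name and the `api_infores` parameter are hypothetical, and turning the `provided_by` name into an ID by prefixing `infores:` simply mirrors the made-up supporting data source in the example.

```python
def to_primary_source_provenance(attributes, api_infores):
    """Rewrite attributes for an API that creates its own knowledge.

    `api_infores` is the infores ID of the API that BTE called; how BTE
    would look it up is outside this sketch.
    """
    by_type = {a["attribute_type_id"]: a for a in attributes}

    primary = {
        "attribute_type_id": "biolink:primary_knowledge_source",
        "value": [api_infores],
        "value_type_id": "biolink:InformationResource",
    }
    # The URL the API reported under "provenance" moves onto the primary source.
    if "provenance" in by_type:
        primary["value_url"] = by_type["provenance"]["value"]

    result = [
        {
            "attribute_type_id": "biolink:aggregator_knowledge_source",
            "value": ["infores:translator-biothings-explorer"],
            "value_type_id": "biolink:InformationResource",
        },
        primary,
    ]
    if "provided_by" in by_type:
        result.append({
            "attribute_type_id": "biolink:supporting_data_source",
            # Assumption: the ID is the provided_by name with an "infores:" prefix.
            "value": ["infores:" + name for name in by_type["provided_by"]["value"]],
            "value_type_id": "biolink:InformationResource",
        })

    # Anything that is not source-related is passed through untouched.
    handled = {"api", "provided_by", "provenance"}
    result.extend(a for a in attributes if a["attribute_type_id"] not in handled)
    return result
```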
As a very quick recap of today's discussion, @ariutta will take the lead on modifying the structure of the JSON output in the edge attributes, and @colleenXu will take the lead on updating the SmartAPI records for where most of those values are drawn. There undoubtedly will be other details and edge cases to fix later, but let's start with that...
I have edited my post above to reflect today's call. @andrewsu and @ariutta, please review at minimum the section under "Scenario B" and confirm whether these tasks/decisions correctly reflect today's decisions.
Quick note that the ARAX results viewer for Translator now has a nice visualization for the edge provenance info. For example, from https://arax.ncats.io/?source=ARS&id=a7af1e97-eae3-430d-b570-4da271ea56c7
@ariutta All APIs with yamls in registry update here are updated to address this issue. Note that 3 APIs don't have the "hard-coded" source field anymore; this is fine - they just won't have the corresponding attribute object in their attributes array.
Once the PRs for the 2 multiomics API yamls are merged and their SmartAPI registry entries are updated, they may also no longer have the "hard-coded" source field.
Note that Provenance situation A may be dealt with once this PR is merged.
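To illustrate the "no source field, no attribute object" behavior, here is a small Python sketch. The helper name `source_attribute` and the idea of passing the URL in as an optional argument are assumptions for illustration; the real value would come from wherever BTE reads the SmartAPI record.

```python
def source_attribute(source_url=None):
    """Return a one-item list holding the "source" attribute, or [] if absent."""
    if not source_url:
        return []  # API has no hard-coded source: emit nothing for it
    return [{
        "attribute_type_id": "source",
        "value": [source_url],
        "value_type_id": "bts:source",
    }]
```

The result can be added with `attributes.extend(source_attribute(url))`, so the caller does not need a separate branch for APIs without a source.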
I notice that this PR seems to add the BTE provenance object mentioned above and included below:
{ // add this
"attribute_type_id": "biolink:aggregator_knowledge_source",
"value": ["infores:translator-biothings-explorer"],
"value_type_id": "biolink:InformationResource"
},
I'm okay with closing this issue for now, and opening it again to deal with Provenance situation C related issues as that comes up...
This is going to happen with text mining targeted association soon where the plan is to ingest the edge attributes field from records and preserve its structure...
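A minimal Python sketch of that idea: keep the ingested attributes exactly as they arrive and only append the BTE object if it is not already there. This assumes BTE would still list itself as an aggregator in this situation, which has not been decided in this thread; the function name is hypothetical. The check accepts `value` as either a string or a list, since both forms appear in the examples above.

```python
import copy

BTE_INFORES = "infores:translator-biothings-explorer"


def add_bte_aggregator(ingested_attributes):
    """Copy the ingested attributes and append BTE as an aggregator once."""
    attributes = copy.deepcopy(ingested_attributes)  # leave the record untouched
    for attr in attributes:
        if attr.get("attribute_type_id") != "biolink:aggregator_knowledge_source":
            continue
        value = attr.get("value")
        values = value if isinstance(value, list) else [value]
        if BTE_INFORES in values:
            return attributes  # already present, nothing to add
    attributes.append({
        "attribute_type_id": "biolink:aggregator_knowledge_source",
        "value": [BTE_INFORES],
        "value_type_id": "biolink:InformationResource",
    })
    return attributes
```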
The parent ticket is here: https://github.com/NCATSTranslator/TranslatorArchitecture/issues/48
This is an example edge with provenance:
These are the desired edge properties (copied from the parent ticket):