Open saramsey opened 2 years ago
Tagging @edeutsch
Great, thanks! Here is also an example in TRAPI: https://arax.ncats.io/?r=41651
One question we pondered: in the TRAPI attributes, is it better to represent multiple knowledge sources in a single attribute with a list as the value, or as multiple attributes?
Currently it is the latter (see link above). But maybe the former is better? Thoughts?
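To make the two options concrete, here is a minimal sketch of both shapes as plain Python dicts. It assumes only the two TRAPI attribute fields `attribute_type_id` and `value`. The `infores:` identifiers and the helper names are hypothetical placeholders, not anything taken from KG2 or ARAX.

```python
# Hypothetical source identifiers, for illustration only.
SOURCES = ["infores:source-a", "infores:source-b", "infores:source-c"]


def as_single_attribute(sources, type_id="biolink:knowledge_source"):
    """Option 1: one attribute whose value is the list of all sources."""
    return [{"attribute_type_id": type_id, "value": list(sources)}]


def as_multiple_attributes(sources, type_id="biolink:knowledge_source"):
    """Option 2: one attribute per source, each with a scalar value."""
    return [{"attribute_type_id": type_id, "value": s} for s in sources]


def collect_sources(attributes, type_id="biolink:knowledge_source"):
    """Read the sources back out of either shape."""
    found = []
    for attr in attributes:
        if attr["attribute_type_id"] != type_id:
            continue
        value = attr["value"]
        # Accept both a list value and a scalar value.
        found.extend(value if isinstance(value, list) else [value])
    return found


single = as_single_attribute(SOURCES)
multiple = as_multiple_attributes(SOURCES)
```

A consumer written like `collect_sources` gets the same answer from either shape, so the choice mostly affects how many attribute objects a client has to walk.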
@edeutsch does the TRAPI spec give an indication which way we should go, in this case? (list type attribute value or multiple attributes?)
It does not. But it could. Any reason not to recommend that they be combined (unlike what we're currently doing)?
the only reason I'm aware of is if the attribute_type_id differs for different knowledge_sources. since it's kind of difficult to decide whether each source is an aggregator vs. original source or whatever, I think we just decided to call all of them biolink:knowledge_source for KG2 for now, and wait to see if it became important to get more fine-grained. so far I don't think we've heard any complaints?
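As a rough illustration of that constraint: sources can only share one list-valued attribute when they share an attribute_type_id, so a combined representation would need one attribute per type. The sketch below assumes each source arrives already tagged with its type. The `infores:` identifiers and the function name are hypothetical.

```python
from collections import defaultdict


def group_by_type(tagged_sources):
    """Build one list-valued attribute per distinct attribute_type_id.

    tagged_sources is an iterable of (attribute_type_id, source) pairs.
    """
    grouped = defaultdict(list)
    for type_id, source in tagged_sources:
        grouped[type_id].append(source)
    # Dicts preserve insertion order, so types appear in first-seen order.
    return [{"attribute_type_id": t, "value": v} for t, v in grouped.items()]


# Hypothetical tagging, for illustration only.
tagged = [
    ("biolink:aggregator_knowledge_source", "infores:source-a"),
    ("biolink:primary_knowledge_source", "infores:source-b"),
    ("biolink:aggregator_knowledge_source", "infores:source-c"),
]
attributes = group_by_type(tagged)
```

If every source is labeled with the same type, as KG2 does here, this collapses to a single attribute.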
Use of biolink:knowledge_source is discouraged: "In practice, implementers should use one of the more specific subtypes of this generic property."
There were no complaints because no one is really looking carefully, I suspect.
Probably biolink:primary_knowledge_source is what we should be using.
Here are the docs: https://biolink.github.io/biolink-model/docs/knowledge_source.html
biolink:knowledge_source An Information Resource from which the knowledge expressed in an Association was retrieved, directly or indirectly. This can be any resource through which the knowledge passed on its way to its currently serialized form. In practice, implementers should use one of the more specific subtypes of this generic property.
biolink:aggregator_knowledge_source An intermediate aggregator resource from which knowledge expressed in an Association was retrieved downstream of the original source, on its path to its current serialized form.
biolink:primary_knowledge_source The most upstream source of the knowledge expressed in an Association that an implementer can identify (may or may not be the ‘original’ source).
biolink:original_knowledge_source The Information Resource that created the original record of the knowledge expressed in an Association (e.g. via curation of the knowledge from the literature, or generation of the knowledge de novo through computation, reasoning, inference over data).
ongoing discussion of this in the Architecture call..
The emerging consensus on the Architecture call is that RTX-KG2 should NOT be doing this semantic merging, and that this scenario should be represented as 3 different edges, each with a SINGLE biolink:primary_knowledge_source.
See architecture PR 73 to make your thoughts known (update there not yet made while I'm writing this, but is planned)
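For reference, un-merging along those lines could look roughly like the sketch below. It is not how KG2 is built. It assumes a merged edge is a dict with an `attributes` list, that every listed source can be treated as a primary source, and that all other attributes apply equally to each resulting edge. The CURIEs, `infores:` identifiers, and PMID are hypothetical placeholders.

```python
import copy

# Attribute types treated as carrying source information in this sketch.
SOURCE_TYPES = {"biolink:knowledge_source", "biolink:primary_knowledge_source"}
PRIMARY = "biolink:primary_knowledge_source"


def split_by_source(edge):
    """Return one edge per source, each with a single primary source."""
    sources, other = [], []
    for attr in edge["attributes"]:
        if attr["attribute_type_id"] in SOURCE_TYPES:
            value = attr["value"]
            # Handle both list-valued and scalar-valued source attributes.
            sources.extend(value if isinstance(value, list) else [value])
        else:
            other.append(attr)

    edges = []
    for source in sources:
        new_edge = {k: v for k, v in edge.items() if k != "attributes"}
        # Copy the non-source attributes so the new edges share no state.
        new_edge["attributes"] = copy.deepcopy(other)
        new_edge["attributes"].append({"attribute_type_id": PRIMARY, "value": source})
        edges.append(new_edge)
    return edges


# Hypothetical merged edge, for illustration only.
merged_edge = {
    "subject": "EX:0001",
    "predicate": "biolink:related_to",
    "object": "EX:0002",
    "attributes": [
        {"attribute_type_id": "biolink:publications", "value": ["PMID:0000000"]},
        {"attribute_type_id": "biolink:knowledge_source",
         "value": ["infores:source-a", "infores:source-b", "infores:source-c"]},
    ],
}
split_edges = split_by_source(merged_edge)
```

The assumption that non-source attributes carry over unchanged is the weak point: an attribute such as a publication list may really belong to only one of the original sources, which the merged edge no longer records.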
At today's AHM, it was asked if there could be an example provided of an edge in RTX-KG2 that is the result of merging more than one source triple. Using the Neo4j endpoint for KG2.7.6pre, kg2endpoint3.rtx.ai, the following Cypher query produces an example:
Here is the example triple:
This question came up in a discussion of agenda item "New ask from Architecture" which is summarized in this PR in the NCATSTranslator/TranslatorArchitecture project.