Open bill-baumgartner opened 3 years ago
For comparison, the alternative approach shown below does not nest Attributes and instead uses arrays to hold attribute values. For a given EPC packet, the sentence, score, subject and object spans, and PMID are connected only by sharing the same array index.
Note: This is the current output format used by the Service Provider to serve up the Text Mining Provider text-mined Biolink association KG.
edges:
  - id: 9445e98f72ada21aa572559e303e4d5ac414650f
    predicate: biolink:negatively_regulates
    subject: CHEBI:3215 # bupivacaine
    object: PR:000031567 # LRRC3B
    attributes:
      - type: biolink:provided_by
        name: provided_by
        value: Text Mining KP
      - type: bts:api
        name: api
        value: Text Mining Targeted Association API
      - type: bts:score
        name: score
        value:
          - 0.99956816
          - 0.876
      - type: bts:sentence
        name: sentence
        value:
          - "The administration of 50 µg/ml bupivacaine promoted maximum breast cancer cell invasion, and suppressed LRRC3B mRNA expression in cells."
          - "This is a second sentence indicating that bupivacaine negatively regulates LRRC3B."
      - type: bts:subject_spans
        name: subject_spans
        value:
          - "31|42"
          - "42|53"
      - type: bts:object_spans
        name: object_spans
        value:
          - "104|110"
          - "75|81"
      - type: bts:publications
        name: publications
        value:
          - PMID:29085514
          - PMID:12345678
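Because the packet boundaries in this format are implicit, a consumer has to rebuild each EPC packet by reading the same index from every array. A minimal Python sketch of that step follows; the attribute values are copied from the example above, while the helper name `to_packets` and the `PACKET_FIELDS` tuple are hypothetical and not part of any Service Provider or TRAPI API.

```python
from typing import Any

# Array-valued attributes copied from the example edge above.
ATTRIBUTES: list[dict[str, Any]] = [
    {"type": "bts:score", "name": "score", "value": [0.99956816, 0.876]},
    {"type": "bts:sentence", "name": "sentence", "value": [
        "The administration of 50 µg/ml bupivacaine promoted maximum breast "
        "cancer cell invasion, and suppressed LRRC3B mRNA expression in cells.",
        "This is a second sentence indicating that bupivacaine negatively "
        "regulates LRRC3B.",
    ]},
    {"type": "bts:subject_spans", "name": "subject_spans", "value": ["31|42", "42|53"]},
    {"type": "bts:object_spans", "name": "object_spans", "value": ["104|110", "75|81"]},
    {"type": "bts:publications", "name": "publications",
     "value": ["PMID:29085514", "PMID:12345678"]},
]

# Hypothetical list of the attribute names that together form one EPC packet.
PACKET_FIELDS = ("score", "sentence", "subject_spans", "object_spans", "publications")


def to_packets(attributes: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Regroup parallel arrays into one dict per EPC packet, matched by index."""
    by_name = {attr["name"]: attr["value"] for attr in attributes}
    columns = [by_name[field] for field in PACKET_FIELDS]
    # The index-based link only holds if every array has the same length.
    if len({len(column) for column in columns}) != 1:
        raise ValueError("parallel attribute arrays differ in length")
    return [dict(zip(PACKET_FIELDS, row)) for row in zip(*columns)]


packets = to_packets(ATTRIBUTES)
for packet in packets:
    print(packet["publications"], packet["score"], packet["subject_spans"])
```

The length check points at the main fragility of this layout: if one array gains or loses an entry, every later packet is silently misaligned unless the consumer validates the lengths.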
For each text-mined Biolink association, we would like to provide relevant EPC data, including the sentence, a score, the PMID, and the spans of the subject and object of the assertion.
The goal of this issue is to discuss how to represent the EPC data using the Attribute object that is defined in the TRAPI specification. An initial proposal for Attribute representation is available in this document.
The proposal in this issue builds off of the original, and specifically addresses a need to group EPC into individual packets that contain the sentence and other relevant information so that multiple EPC packets can be associated with a single assertion.
Data for a text-mined assertion
Proposed Attribute representation
The proposed Attribute representation models this assertion as a single edge between bupivacaine and LRRC3B with two accompanying Attributes representing the EPC data. Nested Attributes are used to allow each packet of sentence information to be self-contained. Also demonstrated are attributes representing a confidence score for the concept recognition of each node (concept), and an aggregate confidence score computed for each edge.