The 'At-A-Glance' (AAG) idea refers to a set of 4-5 edge properties that provide a high-level summary of evidence, provenance, and confidence (EPC), allowing users to make a first-pass assessment of confidence and relevance for a given KG Edge (or a 'Result' that maps to a single asserted or predicted KG Edge).
There is a long history of proposals for this kind of summary, coming from different perspectives and stakeholders (summarized here). These proposals have been aligned and refined over the past year. IMO we are at a point where we need to move toward implementation.
This issue proposes an initial set of AAG properties to implement, and can serve as a place to discuss how to move this from idea to practice. Separate tickets will be created for proposals/discussion around developing and implementing each proposed property.
Initial discussions focused on the following five types of information that AAG properties could provide:
Knowledge Level/Type: the level/type of knowledge that is reported in an edge, based on how the knowledge was produced, the strength of evidence supporting it, or our confidence in its validity. (see #11)
a. e.g. ‘Knowledge Assertion’, ‘Logical Entailment’, ‘Prediction’, ‘Statistical Association’, etc.
Agent Type: the type of agent that generated the statement expressed in an edge (see #12)
a. e.g. 'Manual Agent', 'Automated Agent', 'Computational Model', 'Text-Mining Agent', etc.
Supporting Evidence Type(s): the types of information / data that were used as evidence in generating the statement expressed in an Edge
a. e.g. ‘experimental data’, ‘clinical data’, ‘sequence similarity data’, ‘mutant phenotype data’, etc.
Supporting Methodologies: reasoning, analytical, or experimental methodologies that were applied in generating the stated knowledge and/or the evidence supporting it. t.b.d. whether we want to report these at the type level, at the instance level, or as linkouts to free-text descriptions.
a. examples of type level method info: 'rule-based graph inference', 'unsupervised machine learning', 'chi-squared analysis', 'hidden Markov model', 'electron microscopy', 'yeast-two-hybrid assay', etc.
b. examples of instance level method info: 2015 ACMG Variant Interpretation Guidelines, ClinGen SOP for Gene Validity Curation, ARAGORN Rule-Mining Prediction algorithm, ICEES correlation analysis pipeline
c. examples of descriptions: see content of Translator Resource Wiki Pages, e.g. for Improving Agent
Edge Confidence Score(s): qualitative terms and/or quantitative values reflecting how confident an agent is in the veracity of the specific statement expressed in an Edge
a. qualitative scores may include things like 'definitive', 'possible', 'unlikely', or 'high confidence', 'medium confidence', 'low confidence'
b. quantitative scores will likely be scaled between 0 and 1 (e.g. '0.998', '0.032')
c. t.b.d. if/how we will normalize confidence scores, and whether scores for Statements of different Knowledge Types will be directly comparable, evaluated on separate scales, or compared only with other statements in the same category.
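To make the five categories above more concrete, here is a minimal Python sketch of how they might sit together on one record. This is only an illustration, not a proposed schema: the class and field names (`AtAGlance`, `knowledge_level`, `evidence_types`, etc.) are hypothetical, the enum members are just the example values listed above, and the 0-1 range check assumes the "likely" scaling mentioned in item (b).

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class KnowledgeLevel(Enum):
    # Example values taken from the list above; not a final enumeration.
    KNOWLEDGE_ASSERTION = "Knowledge Assertion"
    LOGICAL_ENTAILMENT = "Logical Entailment"
    PREDICTION = "Prediction"
    STATISTICAL_ASSOCIATION = "Statistical Association"


class AgentType(Enum):
    # Example values taken from the list above; not a final enumeration.
    MANUAL_AGENT = "Manual Agent"
    AUTOMATED_AGENT = "Automated Agent"
    COMPUTATIONAL_MODEL = "Computational Model"
    TEXT_MINING_AGENT = "Text-Mining Agent"


@dataclass
class AtAGlance:
    """Hypothetical container for the five AAG categories on one Edge."""

    knowledge_level: KnowledgeLevel
    agent_type: AgentType
    # Free-text for now; whether these become enums is still t.b.d.
    evidence_types: List[str] = field(default_factory=list)
    methodologies: List[str] = field(default_factory=list)
    qualitative_confidence: Optional[str] = None
    quantitative_confidence: Optional[float] = None

    def __post_init__(self) -> None:
        # Assumes quantitative scores are scaled between 0 and 1.
        score = self.quantitative_confidence
        if score is not None and not 0.0 <= score <= 1.0:
            raise ValueError(f"confidence score {score} is outside [0, 1]")


example = AtAGlance(
    knowledge_level=KnowledgeLevel.PREDICTION,
    agent_type=AgentType.COMPUTATIONAL_MODEL,
    evidence_types=["clinical data"],
    methodologies=["unsupervised machine learning"],
    quantitative_confidence=0.998,
)
```

Keeping knowledge level and agent type as enums while leaving evidence and methodology as free text mirrors where the discussion currently stands: the first two have short candidate value lists, the others are still open.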
Notes:
These AAG properties would be implemented as Association Slots (aka Edge Properties) in the Biolink Model.
Where relevant, enumerations would be created in Biolink as well, to constrain permissible values for consistent data entry, and provide a central location to clearly define each value.
More detailed representation of EPC metadata will also be supported by the Biolink model - to complement / extend the superficial view provided by the AAG fields.
In TRAPI, this information could be captured using Edge Attributes keyed on these Biolink edge properties, alongside other Edge metadata. However, we might consider defining dedicated named properties that hang directly from an Edge object in the TRAPI schema - which would reflect their importance, and promote their visibility / parsability.
In the UI, these AAG properties would be prominently displayed for each Edge or Result returned to users, providing a high-level understanding of the supporting EPC and allowing filtering / navigation down to the Results of most interest/relevance. More detailed EPC would also be reported where provided by sources, and accessible to users upon deeper exploration of selected answers.
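As a rough illustration of the TRAPI and UI notes above, the sketch below encodes AAG values as Edge Attributes (using the TRAPI `attribute_type_id` / `value` fields) and filters edges on them. The Biolink slot CURIEs (`biolink:knowledge_level`, `biolink:agent_type`, `biolink:confidence_score`), the node and edge identifiers, and the helper names are all placeholders assumed for this example; none of them are settled by this proposal.

```python
from typing import Any, Dict, Iterable, Optional

# Placeholder slot CURIEs; the actual Biolink property names are t.b.d.
KNOWLEDGE_LEVEL = "biolink:knowledge_level"
AGENT_TYPE = "biolink:agent_type"
CONFIDENCE = "biolink:confidence_score"

Edge = Dict[str, Any]


def make_edge(level: str, agent: str, score: Optional[float] = None) -> Edge:
    """Build a TRAPI-style edge carrying AAG values as Edge Attributes."""
    attributes = [
        {"attribute_type_id": KNOWLEDGE_LEVEL, "value": level},
        {"attribute_type_id": AGENT_TYPE, "value": agent},
    ]
    if score is not None:
        attributes.append({"attribute_type_id": CONFIDENCE, "value": score})
    return {
        "subject": "EXAMPLE:1",  # made-up identifiers
        "predicate": "biolink:related_to",
        "object": "EXAMPLE:2",
        "attributes": attributes,
    }


def get_attribute(edge: Edge, type_id: str) -> Any:
    """Return the value of the first attribute with this type id, else None."""
    for attr in edge.get("attributes") or []:
        if attr.get("attribute_type_id") == type_id:
            return attr.get("value")
    return None


def filter_edges(
    edges: Dict[str, Edge],
    knowledge_levels: Iterable[str],
    min_confidence: float = 0.0,
) -> Dict[str, Edge]:
    """Keep edges whose knowledge level is allowed and whose score, if any,
    meets the threshold. Edges without a score are kept (a design choice)."""
    allowed = set(knowledge_levels)
    kept = {}
    for edge_id, edge in edges.items():
        if get_attribute(edge, KNOWLEDGE_LEVEL) not in allowed:
            continue
        score = get_attribute(edge, CONFIDENCE)
        if score is not None and score < min_confidence:
            continue
        kept[edge_id] = edge
    return kept


example_edges = {
    "e1": make_edge("Knowledge Assertion", "Manual Agent"),
    "e2": make_edge("Prediction", "Computational Model", 0.032),
    "e3": make_edge("Prediction", "Computational Model", 0.998),
}
confident_predictions = filter_edges(example_edges, ["Prediction"], 0.5)
```

If dedicated named properties were added to the Edge object instead, `get_attribute` would reduce to a plain key lookup, which is the visibility / parsability benefit mentioned above.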