NCATSTranslator / Evidence-Provenance-Confidence-Working-Group

'At-A-Glance' (AAG) Edge Annotations for high-level EPC information #10

Open mbrush opened 1 year ago

mbrush commented 1 year ago

The 'At-A-Glance' (AAG) idea refers to a set of 4-5 edge properties that provide a high level EPC summary, allowing users to make a first pass assessment of confidence and relevance for a given KG Edge (or a 'Result' that maps to a single asserted or predicted KG Edge).

There is a long history of proposals for this type of thing, coming from different perspectives and stakeholders (summarized here). These proposals have been aligned and refined over the past year. IMO we are at a point where we need to move toward implementing it.

This issue proposes an initial set of AAG properties to implement, and can serve as a place to discuss how to move this from idea to practice. Separate tickets will be created for proposals/discussion around developing and implementing each proposed property.

Initial discussions focused on the following five types of information that AAG properties could provide:

  1. Knowledge Level/Type: the level/type of knowledge that is reported in an edge, based on how the knowledge was produced, the strength of evidence supporting it, or our confidence in its validity (see #11).
     a. e.g. ‘Knowledge Assertion’, ‘Logical Entailment’, ‘Prediction’, ‘Statistical Association’, etc.

  2. Agent Type: the type of agent that generated the statement expressed in an edge (see #12).
     a. e.g. 'Manual Agent', 'Automated Agent', 'Computational Model', 'Text-Mining Agent', etc.

  3. Supporting Evidence Type(s): the types of information / data that were used as evidence in generating the statement expressed in an Edge.
     a. e.g. ‘experimental data’, ‘clinical data’, ‘sequence similarity data’, ‘mutant phenotype data’, etc.

  4. Supporting Methodologies: reasoning, analytical, or experimental methodologies that were applied in generating the stated knowledge and/or the evidence supporting it. It is t.b.d. whether we want to report these at the type level, at the instance level, or as linkouts to free-text descriptions.
     a. examples of type level method info: 'rule-based graph inference', 'unsupervised machine learning', 'chi-squared analysis', 'hidden markov model', 'electron microscopy', 'yeast-two-hybrid assay', etc.
     b. examples of instance level method info: 2015 ACMG Variant Interpretation Guidelines, ClinGen SOP for Gene Validity Curation, ARAGORN Rule-Mining Prediction algorithm, ICEES correlation analysis pipeline
     c. examples of descriptions: see content of Translator Resource Wiki Pages, e.g. for Improving Agent

  5. Edge Confidence Score(s): qualitative terms and/or quantitative values reflecting how confident an agent is in the veracity of the specific statement expressed in an Edge.
     a. qualitative scores may include things like 'definitive', 'possible', 'unlikely', or 'high confidence', 'medium confidence', 'low confidence'
     b. quantitative scores will likely be scaled between 0 and 1 (e.g. '0.998', '0.032')
     c. t.b.d. if/how we will normalize confidence scores, and whether scores for Statements of different Knowledge Types will be directly comparable, evaluated on separate scales, or compared only to other statements in the same category.
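
To make the discussion more concrete, here is a rough Python sketch of how the five proposed properties might sit together on a single Edge. Everything in it is an assumption for illustration: the class name `AAGAnnotation`, the field names, and the `summary()` helper are hypothetical and are not taken from any agreed Translator or Biolink schema. The only constraint it encodes from the proposal is that a quantitative score, if present, falls between 0 and 1.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class AAGAnnotation:
    """Hypothetical container for the five proposed AAG properties."""

    knowledge_level: str                                     # e.g. 'Prediction'
    agent_type: str                                          # e.g. 'Computational Model'
    evidence_types: List[str] = field(default_factory=list)  # e.g. ['clinical data']
    methodologies: List[str] = field(default_factory=list)   # type- or instance-level
    confidence_label: Optional[str] = None                   # qualitative, e.g. 'possible'
    confidence_score: Optional[float] = None                 # quantitative, assumed 0..1

    def __post_init__(self) -> None:
        # Assumption: quantitative scores are scaled to [0, 1], as suggested above.
        score = self.confidence_score
        if score is not None and not 0.0 <= score <= 1.0:
            raise ValueError("confidence_score must be between 0 and 1")

    def summary(self) -> str:
        """Return a one-line, human-readable summary of the annotations."""
        parts = [self.knowledge_level, f"by {self.agent_type}"]
        if self.evidence_types:
            parts.append("evidence: " + ", ".join(self.evidence_types))
        if self.methodologies:
            parts.append("methods: " + ", ".join(self.methodologies))
        if self.confidence_label is not None:
            parts.append(f"confidence: {self.confidence_label}")
        if self.confidence_score is not None:
            parts.append(f"score: {self.confidence_score}")
        return "; ".join(parts)


# Example values are drawn from the lists above; the combination is made up.
edge = AAGAnnotation(
    knowledge_level="Prediction",
    agent_type="Computational Model",
    evidence_types=["clinical data"],
    methodologies=["unsupervised machine learning"],
    confidence_score=0.998,
)
print(edge.summary())
```

The sketch keeps the qualitative label and the quantitative score as separate optional fields, since it is still t.b.d. whether and how scores will be normalized across Knowledge Types.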


Notes:

mbrush commented 7 months ago

Examples of how AAG properties can tell a high level story about a given Edge: https://docs.google.com/document/d/1ESnpiPx_J2EmpsR8K6Q8IVROGfahbFOPF1XLTaM1YNQ/edit#heading=h.90lu494u3a11