NCATSTranslator / Evidence-Provenance-Confidence-Working-Group

MIT License

Define the 'Knowledge Level/Type' AAG Property #11

Open mbrush opened 1 year ago

mbrush commented 1 year ago

Building on the proposal outlined in #10, this issue outlines a set of categories for describing the level of knowledge that is reported in an edge, based on how the knowledge was produced, the strength of evidence supporting it, or our confidence in its validity.

Below are brief definitions of 6 categories that would be enumerated as values of such a property. Additional detail and insight re: the specific use and relevance of each in Translator can be found in the document here and slide deck here.

  1. Knowledge Assertion: A statement of purported fact that is put forth by an agent as true, based on assessment of direct evidence. Assertions are likely but not definitively true.
  2. Logical Entailment: A statement reporting a conclusion that follows logically from premises that are established facts or knowledge assertions, i.e. a deductive inference.
  3. Prediction: A statement of a possible fact based on probabilistic forms of reasoning over more indirect forms of evidence that lead to more speculative conclusions.
  4. Statistical Association: A statement that reports concepts representing variables in a dataset to be statistically associated in a particular cohort (e.g. “Metformin Treatment (variable 1) is correlated with Diabetes Diagnosis (variable 2) in EHR dataset X”).
  5. Observation: A statement reporting (and possibly quantifying) a phenomenon that was observed to occur - absent any analysis or review that generates a statistical association or supports a stronger/broader conclusion or inference.
  6. Unspecified: The knowledge level/type cannot be determined from available information.
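
As a rough illustration, the six categories above could be enumerated along these lines. This is only a sketch: the class name `KnowledgeLevel`, the snake_case values, and the helper `parse_knowledge_level` are assumptions made for the example and are not part of the proposal. Falling back to 'Unspecified' for unrecognized labels is likewise just one possible design choice.

```python
from enum import Enum


class KnowledgeLevel(Enum):
    """Hypothetical enumeration of the six proposed categories."""
    KNOWLEDGE_ASSERTION = "knowledge_assertion"
    LOGICAL_ENTAILMENT = "logical_entailment"
    PREDICTION = "prediction"
    STATISTICAL_ASSOCIATION = "statistical_association"
    OBSERVATION = "observation"
    UNSPECIFIED = "unspecified"


def parse_knowledge_level(label: str) -> KnowledgeLevel:
    """Map a human-readable label to a category (assumed helper).

    Labels that do not match one of the six categories fall back to
    UNSPECIFIED rather than raising an error.
    """
    key = label.strip().lower().replace(" ", "_")
    try:
        return KnowledgeLevel(key)
    except ValueError:
        return KnowledgeLevel.UNSPECIFIED
```

With this sketch, `parse_knowledge_level("Statistical Association")` resolves to the matching member, while a label outside the list, such as "Established Fact", resolves to `KnowledgeLevel.UNSPECIFIED`.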

Finally, two additional categories were initially proposed, but have been left out of the initial draft:

  1. Established Fact: Statements asserting truths that are well-established based on an abundance of evidence, broad community consensus, and/or standing the test of time.
  2. Hypothesis: A statement expressing a possible fact for which there is insufficient evidence to make a Prediction or Assertion, but which may warrant further scientific interrogation.

We felt that in practice these may be hard to distinguish from 'Knowledge Assertion' and 'Prediction', respectively, and will leave them out until such time as they are deemed useful or necessary, and we can define clear rules on how to distinguish and apply them.


Note that these categories are complemented by the 'Agent Type' categories in #12. We had previously considered precomposing terms from the cross-product of (relevant) Knowledge Level and Agent Type categories (e.g. 'Manual Knowledge Assertion', 'Computational Model Prediction') - but for now are splitting these into separate properties/annotations.
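
To make the trade-off concrete, here is a minimal sketch of how precomposed terms could still be derived on demand from the two separate annotations. The agent type values 'Manual' and 'Computational Model' are inferred from the two example terms above; the actual 'Agent Type' categories are defined in #12, and the function names here are hypothetical.

```python
from itertools import product
from typing import List


def precomposed_label(agent_type: str, knowledge_level: str) -> str:
    """Join two separate annotations into one precomposed term (assumed format)."""
    return f"{agent_type} {knowledge_level}"


def cross_product_terms(agent_types: List[str], knowledge_levels: List[str]) -> List[str]:
    """Enumerate every precomposed term for the given category lists."""
    return [precomposed_label(a, k) for a, k in product(agent_types, knowledge_levels)]


# Illustrative subsets only; not the full category lists.
terms = cross_product_terms(
    ["Manual", "Computational Model"],
    ["Knowledge Assertion", "Prediction"],
)
```

Keeping the properties separate means two short value lists to maintain, whereas a precomposed vocabulary grows with the product of the two lists and would also need rules for excluding combinations that are not relevant.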

Examples of their application to real Translator scenarios/use cases can be found in the document here.

mbrush commented 1 year ago

The modeling team is making recommendations on how KPs ingest and represent data, and these recommendations rely on the complementary information provided by AAG tags such as this one to ensure users can quickly understand the meaning and utility of a given Edge or Result. For example, in deciding how to extract and represent knowledge from model organism databases based on drug treatment studies in animal models of disease, we may propose that drugs shown to be efficacious in alleviating disease phenotypes in these models be subjects of 'treats' statements whose SPOQ form will look identical to that of statements describing clinical-trial-based assertions that a chemical treats a disease.

What will distinguish these statements is the AAG / EPC information indicating the former to be a model-organism-based prediction and the latter to be a human-study-based assertion. This means that this AAG metadata needs to be prominently displayed, and considered by users any time they assess an Edge or Result provided by Translator, so that they understand its nature and utility.

Given this, we would propose an approach where a Knowledge Level tag is REQUIRED on all Edges provided by KPs / ARAs. Note that there is an 'Unspecified' category that could be used in cases where the provider is not sure about the level.
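
A minimal sketch of what such a requirement might look like as a check is given below. The `Edge` class, the `knowledge_level` field name, the snake_case values, and the placeholder entities are all assumptions made for illustration; none of them is a defined Translator schema or API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Category names come from this issue; the snake_case spellings are assumed.
KNOWLEDGE_LEVELS = {
    "knowledge_assertion", "logical_entailment", "prediction",
    "statistical_association", "observation", "unspecified",
}


@dataclass(frozen=True)
class Edge:
    subject: str
    predicate: str
    object: str
    knowledge_level: Optional[str] = None  # hypothetical field name


def validate_knowledge_level(edge: Edge) -> List[str]:
    """Return a list of problems; an empty list means the tag is acceptable."""
    if edge.knowledge_level is None:
        return ["knowledge_level is required; use 'unspecified' if unknown"]
    if edge.knowledge_level not in KNOWLEDGE_LEVELS:
        return [f"unknown knowledge_level: {edge.knowledge_level!r}"]
    return []


def spo(edge: Edge) -> Tuple[str, str, str]:
    """The subject-predicate-object core of an edge, ignoring metadata."""
    return (edge.subject, edge.predicate, edge.object)


# Placeholder entities: identical statements that differ only in the tag.
animal_model = Edge("ExampleDrug", "treats", "ExampleDisease", "prediction")
clinical_trial = Edge("ExampleDrug", "treats", "ExampleDisease", "knowledge_assertion")
untagged = Edge("ExampleDrug", "treats", "ExampleDisease")
```

Here `animal_model` and `clinical_trial` share the same subject, predicate, and object and are told apart only by the tag, while `untagged` would fail the check until a provider supplies a level or explicitly marks it 'unspecified'.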