Open RichardBruskiewich opened 2 years ago
- Add a 'frequency term' alongside 'frequency value' and designate it typeof: uriorcurie.
- Should we convert 'aggregate statistic' and its child models to 'association slot' or just generic 'slot' model instances? In their slot names, is the 'has' prefix superfluous, or might we rename the 'has' prefix to 'frequency', then rename the 'frequency quantifier' to 'frequency quantifier mixin' for consistency alongside the 'frequency qualifier'?
- Move 'frequency qualifier mixin' in 'entity to feature or disease qualifiers mixin' out of the 'is_a' hierarchy and into a 'mixins' list, then add the 'frequency quantifier mixin' alongside it (@sierra-moxon seems to think that this is permitted):

entity to feature or disease qualifiers mixin:
  description: >-
    Qualifiers for entity to disease or phenotype associations.
  mixin: true
  mixins:
    - frequency qualifier mixin
    - frequency quantifier mixin
  slots:
    - severity qualifier
    - onset qualifier
This will automatically integrate 'frequency quantifier mixin' into all of the same association definitions as 'frequency qualifier mixin'.
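To make the intended propagation concrete, here is a minimal Python sketch of how slots could flow through 'is_a' and 'mixins' links once both frequency mixins are listed. It is a toy resolver, not the LinkML implementation. The 'has count', 'has total', 'has quotient' and 'has percentage' slots attached to 'frequency quantifier mixin' are an assumption for illustration, as is the function name `induced_slots`.

```python
# Toy schema fragment. Class names follow the proposal above; the slots
# listed under 'frequency quantifier mixin' are assumed for illustration.
SCHEMA = {
    "frequency qualifier mixin": {"slots": ["frequency qualifier"]},
    "frequency quantifier mixin": {
        "slots": ["has count", "has total", "has quotient", "has percentage"],
    },
    "entity to feature or disease qualifiers mixin": {
        "mixins": ["frequency qualifier mixin", "frequency quantifier mixin"],
        "slots": ["severity qualifier", "onset qualifier"],
    },
    "entity to disease association mixin": {
        "is_a": "entity to feature or disease qualifiers mixin",
    },
}


def induced_slots(name, schema=SCHEMA):
    """Collect slots from the is_a parent, then mixins, then the class itself."""
    definition = schema[name]
    parents = []
    if "is_a" in definition:
        parents.append(definition["is_a"])
    parents.extend(definition.get("mixins", []))

    collected = []
    for parent in parents:
        for slot in induced_slots(parent, schema):
            if slot not in collected:  # keep first occurrence, preserve order
                collected.append(slot)
    for slot in definition.get("slots", []):
        if slot not in collected:
            collected.append(slot)
    return collected
```

Calling `induced_slots("entity to disease association mixin")` on this toy schema returns the qualifier slot, the assumed quantifier slots, and the severity and onset qualifiers, which is the behaviour the proposal is aiming for.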
I'm not sure if this is a complete set of ideas, but it's a start. Open for comments!
There are a number of existing slots/mixins/classes relating to - broadly speaking - phenotypic frequency. Recent discussions suggest room for improvement in their definition, structure and relationships.
This issue is originally inspired by some work in Monarch (thus relating to the SRI Reference KG) but may have broader impact on other Biolink Model users (e.g. Translator). A number of ideas are now on the table - discussed below.
@cmungall @kevinschaper @sierra-moxon
Core Issue
The concept of "phenotypic frequency" (in the broad sense, perhaps including genetic and non-genetic, disease-related and non-disease-related expressed features of biological systems) is currently represented in a somewhat heterogeneous, and one might hazard to say inconsistent, incomplete or inefficient, manner within the Biolink Model.
This is likely a reflection of the diversity of semantics of the concept within biology and, more specifically, within projects currently using the Biolink Model (e.g. Monarch, Translator).
Data with phenotypic frequency comes into knowledge graphs from various sources (e.g. HPOA, model organism data) and may be quantitative (e.g. actual percentages or ratios) or qualified/categorical (e.g. annotated with an HPO term).
The frequency itself may be stated with reference to an entire population or a subsample (a general cohort with controls, or a specific study of patients). The frequency may simply be an observation of the incidence of a phenotype, disease or other feature, or may be made with reference to some knowledge of the underlying genotype (e.g. genetic penetrance and expressivity).
The purpose of this issue is to review the overall representation of the concept within the Biolink Model with a view towards a more concise, complete and efficient representation. A complementary concern is how best to present frequency annotation computationally, e.g. in various project-relevant knowledge graph representations including KGX and Python code (e.g. pydantic models).
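As one possible illustration of the computational side, the sketch below holds either a categorical term or a count/total pair in a single Python object and derives the quotient and percentage on demand. It uses a standard-library dataclass rather than pydantic to stay self-contained. The class name `FrequencyAnnotation` and its field names are hypothetical and are not part of the Biolink Model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FrequencyAnnotation:
    """Hypothetical container for one phenotypic frequency statement."""

    term: Optional[str] = None   # categorical frequency, given as a CURIE
    count: Optional[int] = None  # number of cases showing the feature
    total: Optional[int] = None  # size of the population or cohort

    def __post_init__(self):
        has_counts = self.count is not None and self.total is not None
        if self.term is None and not has_counts:
            raise ValueError("need a term, or both count and total")
        if has_counts and not (self.total > 0 and 0 <= self.count <= self.total):
            raise ValueError("require total > 0 and 0 <= count <= total")

    @property
    def quotient(self) -> Optional[float]:
        """count / total, or None when the counts are not both present."""
        if self.count is None or self.total is None:
            return None
        return self.count / self.total

    @property
    def percentage(self) -> Optional[float]:
        """Quotient expressed as a percentage, or None."""
        q = self.quotient
        return None if q is None else 100.0 * q
```

For example, `FrequencyAnnotation(count=3, total=12)` yields a quotient of 0.25, while `FrequencyAnnotation(term="HP:0040283")` carries only the ontology term and reports no quotient. Storing the raw counts and deriving the rest avoids keeping four values that can drift out of agreement.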
Relevant Current Biolink Models
Types
Slots
(Note: we ignore several 'relative frequency' association slots here as not specific to phenotypes)
Mixins (abridged definitions)
where the above slots come from:
The 'frequency quantifier' is itself a mixin, used directly only in the 'variant to population association' below.
In contrast, the 'frequency qualifier' slot is captured in another mixin:
Associations (abridged)
Associations linked to the 'entity to phenotypic feature association mixin':
Associations linked to the 'entity to disease association mixin':
General Questions, Observations and Concerns
- 'frequency value' is a string, without any other constraints. How does one differentiate between a numeric frequency and an encoded (e.g. ontology term specified) frequency?
- The 'quotient' type is a float. Should it rather be a 2-tuple of numerator and denominator?
- 'frequency qualifier' is an 'association slot' embedded in a 'frequency qualifier mixin'.
- 'frequency quantifier' is itself a mixin, a child of the 'relationship quantifier' mixin, that aggregates several 'node property' slots that seem to be related to the frequency type declarations noted above (but are not referencing them?).
- 'frequency qualifier mixin' is directly cited in the 'variant to population association' but is also subclassed into 'entity to feature or disease qualifiers mixin', itself subclassed into 'entity to phenotypic feature association mixin' and 'entity to disease association mixin', to insert its semantics into a variety of 'biolink:Association' classes.
- The 'frequency quantifier' mixin is only cited in the 'variant to population association'. Should it also somehow be cited by the 'entity to feature or disease qualifiers mixin' to enable its use in association classes? However, the mixin is currently specified (see above) in such a fashion as to constrain its use with nodes, not edges... despite its only citation relating to an association?
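The first two questions can be made concrete with a small Python sketch that inspects an unconstrained 'frequency value' string and reports which kind of frequency it appears to encode. The accepted string shapes ("n/m", "x%", a CURIE, a bare number), the regular expressions and the function name `classify_frequency_value` are all assumptions for illustration; nothing here is defined by the Biolink Model.

```python
import re

# Assumed string shapes; real data sources may use other conventions.
CURIE = re.compile(r"^[A-Za-z][\w.]*:\S+$")        # e.g. an HPO term CURIE
RATIO = re.compile(r"^(\d+)\s*/\s*(\d+)$")         # e.g. "3/12"
PERCENT = re.compile(r"^(\d+(?:\.\d+)?)\s*%$")     # e.g. "25%"


def classify_frequency_value(value):
    """Return (kind, parsed) for a raw 'frequency value' string."""
    text = value.strip()
    match = RATIO.match(text)
    if match:
        numerator, denominator = int(match.group(1)), int(match.group(2))
        if denominator == 0:
            raise ValueError("zero denominator")
        # Keep the 2-tuple so the cohort size is not lost.
        return "ratio", (numerator, denominator)
    match = PERCENT.match(text)
    if match:
        return "percentage", float(match.group(1))
    if CURIE.match(text):
        return "term", text
    try:
        return "quotient", float(text)
    except ValueError:
        return "unknown", text
```

The fact that such guessing is needed at all is an argument for separate, typed slots. The "ratio" branch also shows why a numerator/denominator pair may be preferable to a float: 0.25 alone cannot distinguish 1 in 4 from 250 in 1000.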