biolink / biolink-model

Schema and generated objects for biolink data model and upper ontology
https://biolink.github.io/biolink-model/

Confounding semantics of the 'attribute' tag - entity #502

Open RichardBruskiewich opened 3 years ago

RichardBruskiewich commented 3 years ago

Note: See PR https://github.com/biolink/biolink-model/pull/539 for a significant resolution of this issue...

The current definition of the 'attribute' class (and its parent) in the Biolink Model is the following:

  abstract entity:
    description: >-
      Any thing that is not a process or a physical mass-bearing entity

  attribute:
    subclass_of: PATO:0000001
    mappings:
      - SIO:000614
    is_a: abstract entity
    mixins:
      - ontology class
    description: >-
      A property or characteristic of an entity.
      For example, an apple may have properties such as color, shape, age, crispiness.
      An environmental sample may have attributes such as depth, lat, long, material.
    slots:
      - has attribute type
      - has quantitative value
      - has qualitative value
    in_subset:
      - samples

There are a number of concerns with this definition. This issue reviews them to guide possible Biolink Model revisions of the handling of the attribute class.

The semantic anchoring of the class with subclass_of PATO:0000001.

This is discussed in Biolink issue https://github.com/biolink/biolink-model/issues/501, so we won't elaborate on those concerns any further here.

Inheritance of 'attribute' semantics from ontology class

First, 'ontology class' is cited as a mixin. We won't discuss the current general confusion with mixins under review in Biolink issue https://github.com/biolink/biolink-model/issues/333 and elsewhere, but just note that it is one route which injects ontology class semantics into the attribute class.

The "has attribute type" slot also has a range of 'ontology class', so it seems to duplicate the semantic intent of the mixin.

Attribute identification and labelling

When an attribute is to be used in a practical context (e.g. in a knowledge graph KGX file, Neo4j or otherwise), one needs to ask how it is to be distinctly identified and given a (human readable) name.

We note that the parent class abstract entity has just a description and no slots. The intent of the abstract entity parent is to avoid injecting the full semantics of named thing into attribute. However, this leaves attribute rather bare of any identification or label whatsoever.

Of course, ontology class (noted above) may potentially inject an id, name and category into the attribute by way of inheritance from named thing, although this is ironically at odds with the intent of inheriting from abstract entity.
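
To make the inheritance concern concrete, here is a minimal Python sketch that walks is_a and mixin links over a hand-written toy copy of the definitions quoted in this issue. The `MODEL` dictionary and the `effective_slots` helper are hypothetical illustrations, not part of LinkML or the Biolink tooling. The 'ontology class' to 'named thing' link, and the id/name/category slots on named thing, are taken from the description above and are not checked against the model file.

```python
# Toy, hand-written subset of the model as described in this issue.
# Assumption: 'ontology class' inherits from 'named thing', which
# carries id, name and category (as stated in the text above).
MODEL = {
    "named thing": {"slots": ["id", "name", "category"]},
    "ontology class": {"is_a": "named thing"},
    "abstract entity": {},
    "attribute": {
        "is_a": "abstract entity",
        "mixins": ["ontology class"],
        "slots": [
            "has attribute type",
            "has quantitative value",
            "has qualitative value",
        ],
    },
}


def effective_slots(class_name, model=MODEL):
    """Collect slots declared on a class, then on its mixins, then on its is_a parent."""
    seen = []

    def visit(name):
        cls = model[name]
        for slot in cls.get("slots", []):
            if slot not in seen:
                seen.append(slot)
        for mixin in cls.get("mixins", []):
            visit(mixin)
        if "is_a" in cls:
            visit(cls["is_a"])

    visit(class_name)
    return seen


print(effective_slots("attribute"))
print(effective_slots("abstract entity"))
```

Under these assumptions, 'attribute' ends up with id, name and category only through the mixin route, while its is_a parent contributes nothing.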

Other attribute slots

As noted above, the has attribute type slot has overlapping semantics with the ontology class mixin but is, by default, required: false. The mixin is, in some sense, mandatory, although its slots (see previous section) also seem to default to required: false.

The two other slots:

   - has quantitative value
   - has qualitative value

seem quite useful as (optional) bindings for qualitative or quantitative attribute values. One could also use the ontology term bound by has attribute type on its own, as a simple boolean assertion that the given term concept applies as a tag.

In comparison, though, the TRAPI specification has a broader data model for 'Attribute' (OpenAPI 3 syntax, not Biolink YAML):

Attribute:
      type: object
      description: Generic attribute for a node
      properties:
        name:
          type: string
          description: >-
            Human-readable name or label for the attribute. Should be the name
            of the semantic type term.
          example: PubMed Identifier
        value:
          example: 32529952
          description: >-
            Value of the attribute. May be any data type, including a list.
        type:
          type: string
          description: >-
            CURIE of the semantic type of the attribute, from the EDAM ontology
            if possible. If a suitable identifier does not exist, enter a
            descriptive phrase here and submit the new type for consideration
            by the appropriate authority.
          example: EDAM:data_1187
        url:
          type: string
          description: >-
            Human-consumable URL to link out and read about the attribute (not
            the node).
          example: https://pubmed.ncbi.nlm.nih.gov/32529952
        source:
          type: string
          description: Source of the attribute, as a CURIE prefix.
          example: UniProtKB
      required:
        - type
        - value
      additionalProperties: false

It may be worth formally comparing and possibly aligning TRAPI with the Biolink Model (or rather, vice versa?).
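
As a starting point for such a comparison, the constraints in the TRAPI schema quoted above (two required properties, four string-typed properties, no additional properties) can be written as a small check. This is only a sketch in plain Python: `check_trapi_attribute` is a hypothetical helper, not a TRAPI or Biolink API, and a real implementation would more likely use an OpenAPI/JSON Schema validator.

```python
# Constraints copied from the TRAPI 'Attribute' schema quoted above.
ALLOWED = {"name", "value", "type", "url", "source"}
REQUIRED = {"type", "value"}
STRING_FIELDS = {"name", "type", "url", "source"}  # 'value' may be any data type


def check_trapi_attribute(obj):
    """Return a list of problems; an empty list means the dict fits the quoted schema."""
    problems = []
    keys = set(obj)
    for key in sorted(REQUIRED - keys):
        problems.append(f"missing required property: {key}")
    for key in sorted(keys - ALLOWED):
        problems.append(f"unexpected property: {key}")
    for key in sorted(STRING_FIELDS & keys):
        if not isinstance(obj[key], str):
            problems.append(f"property must be a string: {key}")
    return problems


# Built from the example values given in the schema itself.
example = {
    "name": "PubMed Identifier",
    "value": 32529952,
    "type": "EDAM:data_1187",
    "url": "https://pubmed.ncbi.nlm.nih.gov/32529952",
    "source": "UniProtKB",
}
print(check_trapi_attribute(example))
print(check_trapi_attribute({"name": "PubMed Identifier"}))
```

Note the contrast with the Biolink attribute class, where none of the three slots is required by default.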

Use of the attribute class

It seems that all instances of the attribute class are linked upstream to the entity they describe by the slot:

  has attribute:
    description: >-
      connects any named thing to an attribute
    range: attribute
    multivalued: true

Semantically, this is fine, but the question remains of how an attribute (as a discrete element of annotation on an entity) is to be identified and labelled (see previous sections).

In addition, the slot has no domain restriction, so in principle it could annotate any class in the model; however, only a single class in the model actually lists has attribute explicitly as a slot, namely:

  material sample:
    is_a: named thing
...
    slots:
      - has attribute

Given that the default status of a defined slot is required: false, would there be any harm in formally listing has attribute as an additional slot of the named thing category definition? Then, it could be removed from material sample but implicitly available to all other categories, for ad hoc node properties.

Note that TRAPI allows (arrays of) Attribute instances for both nodes and edges. This would be comparable to also formally listing has attribute as a slot under the association class.

One idea to consider here is to specify a common abstract parent class for both named thing and association (not sure what to call it), within which to declare some common slots like has attribute (and possibly id and name) for inheritance, something like:

concept or relationship:
    slots:
       - id
       - name
       - has attribute
    slot_usage:
       id:
         required: true
named thing:
    is_a: concept or relationship
    # note that 'id' and 'name' could be removed from 'named thing' 
    # since they would now come in by parental inheritance
    slots:
       - category
    slot_usage:
       name:
         required: true
...

association:
    is_a: concept or relationship
    # name slot isn't mandatory here...
...

An alternative, perhaps better, approach would be to use a mixin to achieve the same effect of injecting shared semantics across the various model components discussed above.

  attribute mixin:
    mixin: true
    abstract: true
    slots:
      - has attribute

with

  named thing:
...
    mixins:
      - attribute mixin

and

  association:
...
    mixins:
      - attribute mixin
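
As a rough analogy only, the mixin proposal can be pictured with Python dataclasses. This is not the code that the Biolink or LinkML generators would emit; the class and field names below are hypothetical transliterations of the slots discussed in this issue, and the `EX:` identifiers are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class Attribute:
    # The three slots of the current 'attribute' class, all optional.
    has_attribute_type: Optional[str] = None
    has_quantitative_value: Any = None
    has_qualitative_value: Optional[str] = None


@dataclass
class AttributeMixin:
    # Analogue of the proposed 'attribute mixin': a multivalued 'has attribute' slot.
    has_attribute: List[Attribute] = field(default_factory=list)


@dataclass
class NamedThing(AttributeMixin):
    id: str = ""
    name: str = ""
    category: List[str] = field(default_factory=list)


@dataclass
class Association(AttributeMixin):
    # Other association slots are omitted in this sketch.
    id: str = ""


# Hypothetical identifiers, for illustration only.
node = NamedThing(id="EX:node1", name="example sample")
edge = Association(id="EX:edge1")
node.has_attribute.append(
    Attribute(has_attribute_type="EX:depth", has_quantitative_value=10)
)
print(len(node.has_attribute), len(edge.has_attribute))
```

Both nodes and edges pick up the same has attribute slot without sharing a common is_a parent, which is the point of the mixin variant.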

Whatever approach is used, sharing attributes across NamedThing and Association classes could also facilitate efforts to align the Biolink Model with TRAPI, plus provide a helpful mechanism for more extensive modelling of knowledge graphs with, say, evidence, provenance and confidence annotations.

nlharris commented 3 years ago

Is this also related to https://github.com/NCATSTranslator/ReasonerAPI/issues/192, 'Attribute' schema field definitions and names?

nlharris commented 3 years ago

@RichardBruskiewich is this still relevant? Or maybe done already?