biolink / biolink-model

Schema and generated objects for biolink data model and upper ontology
https://biolink.github.io/biolink-model/

is_a vs. subClass issues #182

Closed hsolbrig closed 3 years ago

hsolbrig commented 5 years ago

The Biolink model asserts:

drug is_a chemical substance
chemical substance is_a molecular entity
molecular entity is_a biological entity

This is represented as:

biolink:Drug a biolink:ChemicalSubstance . 
biolink:ChemicalSubstance a biolink:MolecularEntity .
biolink:MolecularEntity a biolink:BiologicalEntity .

in Biolink RDF.

rdf:type is not transitive, meaning that an association whose subject is a Drug is not a valid instance of a ChemicalSubstanceToThing association: nothing lets a Drug instance be inferred to be a ChemicalSubstance.

Have we conflated 'type' (is_a) with 'subClassOf'?
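A minimal Python sketch of the problem, assuming plain (subject, predicate, object) string tuples rather than a real triple store, and a hypothetical individual `ex:aspirin`. It applies only the RDFS subclass rules (rdfs9 and rdfs11) to the Drug hierarchy from this comment, once with the links written as `a` (rdf:type) and once as rdfs:subClassOf.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def inferred_types(individual, triples):
    """Classes an individual belongs to under RDFS rules rdfs9 and rdfs11.

    rdfs9:  x rdf:type C  and  C rdfs:subClassOf D  =>  x rdf:type D
    rdfs11: rdfs:subClassOf is transitive (handled by the worklist below).
    rdf:type links between classes are deliberately NOT followed, because
    rdf:type is not transitive.
    """
    parents = {}
    for s, p, o in triples:
        if p == SUBCLASS:
            parents.setdefault(s, set()).add(o)

    found = {o for s, p, o in triples if s == individual and p == RDF_TYPE}
    worklist = list(found)
    while worklist:
        cls = worklist.pop()
        for sup in parents.get(cls, ()):
            if sup not in found:
                found.add(sup)
                worklist.append(sup)
    return found


def hierarchy(link):
    """The Drug -> BiologicalEntity chain from the issue, using `link` as the predicate."""
    return [
        ("biolink:Drug", link, "biolink:ChemicalSubstance"),
        ("biolink:ChemicalSubstance", link, "biolink:MolecularEntity"),
        ("biolink:MolecularEntity", link, "biolink:BiologicalEntity"),
    ]


# ex:aspirin is a hypothetical individual, not a Biolink identifier.
member = ("ex:aspirin", RDF_TYPE, "biolink:Drug")

as_type = inferred_types("ex:aspirin", hierarchy(RDF_TYPE) + [member])
as_subclass = inferred_types("ex:aspirin", hierarchy(SUBCLASS) + [member])

print(sorted(as_type))
# ['biolink:Drug']
print(sorted(as_subclass))
# ['biolink:BiologicalEntity', 'biolink:ChemicalSubstance', 'biolink:Drug', 'biolink:MolecularEntity']
```

Only the rdfs:subClassOf version lets a Drug instance be recognized as a ChemicalSubstance, which is what a ChemicalSubstanceToThing association needs.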

cmungall commented 5 years ago

oh, that's just a bug - is_a should be mapped to rdfs:subClassOf

hsolbrig commented 5 years ago

Probably my doing. is_a is used both in class-class and slot-slot (and, at the moment, class-slot is also allowed) associations. I'm inclined to not try to get too clever and, instead, define is_a for rdfs:subClassOf and is_a_p (?) for rdfs:subPropertyOf. What do you think?
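A small sketch of why slot-slot is_a wants rdfs:subPropertyOf rather than rdfs:subClassOf: under RDFS rule rdfs7, an edge asserted with a sub-property also holds for its super-property. The CURIEs `biolink:interacts_with` and `biolink:related_to` and the gene identifiers are illustrative assumptions, and the triples are plain Python tuples.

```python
SUBPROP = "rdfs:subPropertyOf"


def entail_super_properties(triples):
    """Apply RDFS rule rdfs7 until nothing new is added:
    p rdfs:subPropertyOf q  and  x p y  =>  x q y.
    Chains of sub-properties are covered because the loop repeats to a fixpoint."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        sub_super = [(p, q) for p, rel, q in closed if rel == SUBPROP]
        for p, q in sub_super:
            for s, rel, o in list(closed):
                if rel == p and (s, q, o) not in closed:
                    closed.add((s, q, o))
                    changed = True
    return closed


# Slot-slot is_a from the schema, emitted as rdfs:subPropertyOf (CURIEs are assumed).
slot_is_a = {"biolink:interacts_with": "biolink:related_to"}
triples = {(child, SUBPROP, parent) for child, parent in slot_is_a.items()}
# A hypothetical edge between two made-up genes.
triples.add(("ex:geneA", "biolink:interacts_with", "ex:geneB"))

closed = entail_super_properties(triples)
print(("ex:geneA", "biolink:related_to", "ex:geneB") in closed)  # True
```

Had the same slot link been emitted as rdfs:subClassOf, no RDFS rule would carry the edge up to related_to, which is why class-class and slot-slot is_a need different targets.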

hsolbrig commented 5 years ago

Actually, it still leaves us with a problem. We have three uses of is_a:

  1. class-class: "BioSample": { "is_a": "BiologicalEntity"}
  2. slot-slot: "interacts with": { "is_a": "related_to"}
  3. instance-class: "_:X": {"is_a": "ChemicalToGeneAssociation"}

The first entry maps to rdfs:subClassOf, the second to rdfs:subPropertyOf and the third to rdf:type (???)

I would propose is_a for class-class, subPropertyOf for slot-slot and "a" for instance-class (assuming that we ARE talking instance-class in the third case).
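One way to see the ambiguity is to write the mapping as a lookup keyed on what sits on each side of is_a. This is a hypothetical sketch (the `Kind` enum and `is_a_triple` helper are invented for illustration): with a single is_a keyword, a generator can only pick the right RDF predicate if it already knows the kinds involved, which is exactly what separate keywords would make explicit.

```python
from enum import Enum


class Kind(Enum):
    """What sits on one side of an is_a link (hypothetical classification)."""
    CLASS = "class"
    SLOT = "slot"
    INSTANCE = "instance"


# (child kind, parent kind) -> RDF predicate, following the three uses listed above.
IS_A_PREDICATE = {
    (Kind.CLASS, Kind.CLASS): "rdfs:subClassOf",
    (Kind.SLOT, Kind.SLOT): "rdfs:subPropertyOf",
    (Kind.INSTANCE, Kind.CLASS): "rdf:type",
}


def is_a_triple(child, child_kind, parent, parent_kind):
    """Translate one is_a assertion into a triple, or fail if the pairing has no mapping."""
    try:
        predicate = IS_A_PREDICATE[(child_kind, parent_kind)]
    except KeyError:
        raise ValueError(
            f"no RDF mapping for {child_kind.value}-{parent_kind.value} is_a"
        ) from None
    return (child, predicate, parent)


print(is_a_triple("BioSample", Kind.CLASS, "BiologicalEntity", Kind.CLASS))
# ('BioSample', 'rdfs:subClassOf', 'BiologicalEntity')
print(is_a_triple("interacts with", Kind.SLOT, "related_to", Kind.SLOT))
# ('interacts with', 'rdfs:subPropertyOf', 'related_to')
print(is_a_triple("_:X", Kind.INSTANCE, "ChemicalToGeneAssociation", Kind.CLASS))
# ('_:X', 'rdf:type', 'ChemicalToGeneAssociation')
```

A class-slot pairing, which the thread notes is currently allowed, raises ValueError in this sketch, since none of the three RDF predicates fits it.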

cmungall commented 5 years ago

On 17 Dec 2018, at 9:11, Harold Solbrig wrote:

> Actually, it still leaves us with a problem. We have three uses of is_a:
>
>   1. class-class: "BioSample": { "is_a": "BiologicalEntity"}
>   2. slot-slot: "interacts with": { "is_a": "related_to"}
>   3. instance-class: "_:X": {"is_a": "ChemicalToGeneAssociation"}
>
> The first entry maps to rdfs:subClassOf, the second to rdfs:subPropertyOf

yep. We can use a more explicit slot in the metamodel if it helps

> and the third to rdf:type (???)

hmm, do we have any instances in the model?

> I would propose is_a for class-class, subPropertyOf for slot-slot and "a" for instance-class (assuming that we ARE talking instance-class in the third case).


hsolbrig commented 5 years ago

Not in the model itself, but when the model is used. How do I represent the assertion:

drugbank:DB11827 bioentity:affects_transport_of drugbank:BE0000530, as asserted in pubmed:29042751? Currently we're proposing:

"foo": {"is_a": "ChemicalToGeneAssociation",
        "subject": "http://identifiers.org/drugbank/DB11827",
        "relation": "affects_transport_of",
        "object": "http://identifiers.org/drugbank/BE0000530",
        "publications": ["http://identifiers.org/pubmed/29042751"]
}

Assuming that our intent is to say the "foo" is an instance of the class ChemicalToGeneAssociation (which I believe to be the case), perhaps we should add "a" to the metamodel?

cmungall commented 5 years ago

foo would be an instance of ChemicalToGeneAssociation, but otherwise the same.

Note that from an OBO modeling POV we may model specific drugs as classes, but we induce the IRIs to be punned here

cmungall commented 5 years ago

So if we add a new relation into biolink-model called type (though "a" is cute, I think "type" is more explicit) that is equivalent to rdf:type, does that solve things?
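A sketch of how the "foo" association could be serialized if the metamodel gained a `type` slot equivalent to rdf:type, as suggested here. The flattening rule, the `biolink:` prefix on slot names, and the use of `type` in place of `is_a` in the record are assumptions for illustration, not the Biolink serializer; the publication is taken as pubmed:29042751, per the earlier comment.

```python
import json

# The "foo" record from the thread, with the proposed `type` key in place of is_a.
RECORD = json.loads("""
{"foo": {"type": "ChemicalToGeneAssociation",
         "subject": "http://identifiers.org/drugbank/DB11827",
         "relation": "affects_transport_of",
         "object": "http://identifiers.org/drugbank/BE0000530",
         "publications": ["http://identifiers.org/pubmed/29042751"]}}
""")


def association_triples(record, prefix="biolink:"):
    """Flatten {node: {slot: value}} into (subject, predicate, object) triples.

    Assumption: the `type` slot maps to rdf:type and its value is a Biolink class;
    every other slot name becomes a `prefix`-qualified predicate. List values
    produce one triple per element.
    """
    triples = []
    for node, slots in record.items():
        for slot, value in slots.items():
            values = value if isinstance(value, list) else [value]
            if slot == "type":
                triples.extend((node, "rdf:type", prefix + v) for v in values)
            else:
                triples.extend((node, prefix + slot, v) for v in values)
    return triples


for triple in association_triples(RECORD):
    print(triple)
```

The association node ends up as an instance of ChemicalToGeneAssociation via rdf:type; no rdfs:subClassOf triple is produced for it, which keeps instance-class separate from class-class.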

hsolbrig commented 5 years ago

Actually, I realized that we have a wee bit of a problem crossing the "meta-beams". The way to declare instances is through the slot domain and range attributes. Starting with the metamodel, which is self-defining, we get:

metamodel:
    # Model type *instances*
    types:
        uri:
    # Model slot *instances*
    slots:
        typeof:
            domain: type definition
            range: type definition
        uri:
            domain: type definition
            range: uri
        domain:
            domain: slot definition
            range: class definition
        range:
            domain: slot definition
            range: definition                # type definition or class definition

        # Model *definition* of types
        types:
            domain: schema definition
            range: type definition
            inlined: true
            multivalued: true

        # Model *definition* of slots
        slots:
            domain: schema definition
            range: slot definition
            inlined: true
            alias: slots

        # Model *definition* of classes
        classes:
            domain: schema definition
            range: class definition
            multivalued: true
            inlined: true

    # Model class *instances*
    classes:
        slot definition:
        class definition:
        schema definition:

In the meta world this is a bit confusing, as we are simultaneously defining "slots" and "classes" and using those definitions to define the metaschema itself. The key, however, is that once we've explicitly stated that the above model is an instance of schema definition, we know the type of everything else.
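To illustrate how the type of everything else follows, here is a hypothetical sketch that only handles the three inlined slots of the metamodel above (`types`, `slots`, `classes`) and their ranges. It assumes the document has already been declared an instance of schema definition and looks at one level of nesting only; the function name is invented for illustration.

```python
# Range of each inlined, class-valued slot of `schema definition`,
# transcribed from the metamodel sketch above.
METAMODEL_RANGES = {
    "types": "type definition",
    "slots": "slot definition",
    "classes": "class definition",
}


def infer_element_types(schema_doc):
    """Assign a metamodel class to each element nested one level below a
    document that is already known to be a `schema definition`."""
    inferred = {}
    for slot, elements in schema_doc.items():
        range_class = METAMODEL_RANGES.get(slot)
        if range_class is None:
            continue  # slot not covered by this sketch
        for name in elements or {}:
            inferred[(slot, name)] = range_class
    return inferred


# A cut-down copy of the metamodel document itself (attribute values omitted).
metamodel = {
    "types": {"uri": None},
    "slots": {"typeof": {}, "uri": {}, "domain": {}, "range": {}},
    "classes": {"slot definition": None, "class definition": None, "schema definition": None},
}

element_types = infer_element_types(metamodel)
print(element_types[("types", "uri")])  # type definition
print(element_types[("slots", "uri")])  # slot definition
```

The same name `uri` becomes a type definition or a slot definition purely from the slot it is listed under, which is the sense in which the types "fall out" of the domain and range declarations.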

Now on to the model world. For the sake of argument, we simplify:

model:

    slots:
        root:
            domain: type constraint
            range: uri
        union of:
            domain: type constraint
            range: type constraint
            multivalued: true
        subject:
            domain: association
            range: type constraint
            required: true
        relation:
            domain: association
            range: type constraint
            required: true
        object:
            domain: association
            range: type constraint
            required: true
        reified stuff:
            domain: association
            range: string
            multivalued: true
            description: all the other stuff we tack onto a reified association
        is_a:
            domain: association
            range: association
            description: an association/association refinement

        constraints:
            domain: model instance
            range: type constraint
            multivalued: true
            inlined: true

        associations:
            domain: model instance
            range: association
            multivalued: true
            inlined: true

    classes:
        type constraint:
        association:
        model instance:
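The required/range declarations in this simplified model are enough to check a single association mechanically. A hypothetical sketch, assuming an association is a plain dict and that a string value in `subject`, `relation`, or `object` must name a type constraint declared in the same model instance; the helper and sample data are invented for illustration.

```python
# Association slot constraints transcribed from the simplified model above.
ASSOCIATION_SLOTS = {
    "subject": {"range": "type constraint", "required": True},
    "relation": {"range": "type constraint", "required": True},
    "object": {"range": "type constraint", "required": True},
}


def check_association(name, assoc, declared_constraints):
    """List the problems with one association of a model instance.

    Assumption: a string value of a `type constraint`-ranged slot is a reference
    that must match a constraint declared in the same model instance.
    """
    problems = []
    for slot, spec in ASSOCIATION_SLOTS.items():
        value = assoc.get(slot)
        if value is None:
            if spec["required"]:
                problems.append(f"{name}: missing required slot '{slot}'")
        elif value not in declared_constraints:
            problems.append(f"{name}: '{value}' in '{slot}' is not a declared {spec['range']}")
    return problems


declared = {"gene", "gene product", "gene or gene product", "homologous to"}
good = {"subject": "gene or gene product", "relation": "homologous to", "object": "gene or gene product"}
bad = {"subject": "gene", "object": "protein"}  # hypothetical: no relation, undeclared object

print(check_association("good", good, declared))  # []
print(check_association("bad", bad, declared))
```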

As with the metamodel above, once we assert that any knowledge graph is an instance of model instance, all the rest of the information just falls out. We cross the meta-boundaries when we assert the following:

   genotype to genotype part association:
      is_a: association
      ...

My take is that this is not correct. Instead, we need to assert:

core kg:
    type constraints:
        gene:
            root: SO:0000704
        gene product:
            root: WD:Q424689
        gene or gene product:
            union of:
                - gene
                - gene product
        homologous to:
            root: RO:HOM0000001

    associations:
        # This is where you establish the basic model assumptions. Some examples:
        semiformal association:
            subject: RDF.Resource
            predicate: RDF.Property
            object: RDF.Resource

        formal association:
            subject: OWL.Class
            predicate: OWL.ObjectProperty
            object: OWL.Class

        gene to gene association:
            is_a: formal association                # Note that this is NOT the metamodel class/class is_a!
            subject: gene or gene product
            object: gene or gene product
        gene to gene homology association:
            is_a: gene to gene association
            relation: homologous to

NOTE: the above is simply an approximation of what we are trying to do. If done correctly, I believe that it will do minimal damage to the model as we have it today.

The key to the above approach is that association and type constraint are models and gene to gene product is an instance.
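A hypothetical Python sketch of how the core kg example could be resolved, with the data copied from the YAML (including its mix of `predicate` and `relation` keys). Here is_a is treated as the data-level association refinement the comment describes, so a child simply inherits and overrides its parent's slots, and `union of` is expanded to the set of root identifiers. Neither function is part of Biolink or the metamodel.

```python
TYPE_CONSTRAINTS = {
    "gene": {"root": "SO:0000704"},
    "gene product": {"root": "WD:Q424689"},
    "gene or gene product": {"union of": ["gene", "gene product"]},
    "homologous to": {"root": "RO:HOM0000001"},
}

ASSOCIATIONS = {
    "formal association": {
        "subject": "OWL.Class", "predicate": "OWL.ObjectProperty", "object": "OWL.Class",
    },
    "gene to gene association": {
        "is_a": "formal association",
        "subject": "gene or gene product", "object": "gene or gene product",
    },
    "gene to gene homology association": {
        "is_a": "gene to gene association", "relation": "homologous to",
    },
}


def roots(name):
    """Expand a type constraint to the set of root identifiers it admits."""
    tc = TYPE_CONSTRAINTS[name]
    if "root" in tc:
        return {tc["root"]}
    return set().union(*(roots(member) for member in tc["union of"]))


def effective_association(name):
    """Resolve an association by walking its data-level is_a chain;
    slots set on the child override those inherited from the parent."""
    entry = ASSOCIATIONS[name]
    resolved = dict(effective_association(entry["is_a"])) if "is_a" in entry else {}
    resolved.update({k: v for k, v in entry.items() if k != "is_a"})
    return resolved


homology = effective_association("gene to gene homology association")
print(homology["subject"], "|", homology["relation"])  # gene or gene product | homologous to
print(sorted(roots(homology["subject"])))  # ['SO:0000704', 'WD:Q424689']
```

Because this is_a is just inheritance between association descriptions, nothing here needs rdfs:subClassOf, which is the separation the comment is arguing for.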

(I am about to submit a new biolink-metamodel project to the repository to clearly separate the metamodel from the biolink-model level and to clean/crisp some things up.)

cmungall commented 5 years ago

I'm not totally following - is there a way to frame this problem without getting into issues of reification or is it tied together?

fractaler commented 5 years ago

Sorry, I have long wanted to know why everyone uses "class"/"instance" instead of "set"/"element"?

hsolbrig commented 5 years ago

@fractaler - I think it has to do with keeping the model separate from the things being modeled. A component of a class definition might, for example, include an element named "age" whose value is an integer in the set 0 to 130. While, strictly speaking, it is sets and elements all the way up, it gets confusing to use "set definition", "set", "element", and "member" throughout, so we tend to use "class" to define composite sets and "instance" to identify a collection of elements that satisfy the class definition. If you write it out in Z, it all reduces to sets, relations, and elements.
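The distinction can be sketched in Python, assuming a hypothetical `Person` record: the class definition is a membership test (the "age in 0 to 130" constraint from the comment), the extension is the set of elements that pass it, instance-of is set membership, and a narrower definition yields a subset.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen makes instances hashable, so they can live in sets
class Person:  # hypothetical record type
    name: str
    age: int


AGE_RANGE = range(0, 131)  # the "integer in the set 0 to 130" from the comment


def is_person(p: Person) -> bool:
    """Class definition: membership is a test over the element's components."""
    return isinstance(p.age, int) and p.age in AGE_RANGE


def is_one_year_old(p: Person) -> bool:
    """A narrower class: its definition adds a constraint."""
    return is_person(p) and p.age == 1


population = [Person("a", 1), Person("b", 40), Person("c", 200)]

# Extensions: the sets of elements that satisfy each definition.
people = {p for p in population if is_person(p)}
one_year_olds = {p for p in population if is_one_year_old(p)}

print(Person("a", 1) in one_year_olds)  # True: "instance of" is membership
print(one_year_olds <= people)          # True: "subclass of" shows up as subset
print(Person("c", 200) in people)       # False: fails the age constraint
```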

cmungall commented 5 years ago

> Sorry, I have long wanted to know why everyone uses "class"/"instance" instead of "set" (https://www.wikidata.org/wiki/Q36161)/"element" (https://www.wikidata.org/wiki/Q379825)?

"class" has been used in OO languages since the earliest days. As far as OWL/ontologies go, formally classes are extensions of sets, not sets, but in practical terms it helps to think in terms of sets/elements.

fractaler commented 5 years ago

@hsolbrig, @cmungall, Thank you for responses! As I understand it, everyone uses the terms they are used to. For example, some use the term "class":

Class "0-year-olds", class "1-year-olds", ... class "130-year-olds".

And others are used to the term set: Set "0-year-olds", set "1-year-olds", ... set "130-year-olds".

kshefchek commented 3 years ago

was this resolved or was the ticket moved/replaced?

cmungall commented 3 years ago

Was there ever an issue to be resolved? tbh I am not fully following the threads in this ticket. I don't even know if this is a biolink question. Maybe open a new ticket if there is something you need addressed?

kshefchek commented 3 years ago

it looks like there's still both is_a and subclass_of in linkml. Should subclass_of be removed to avoid confusion?

Also, I'm guessing that instance-class is out of scope but it's not clear from this ticket what the final decision is.

cmungall commented 3 years ago

ah, I forgot about that. linkml:subclass_of is no longer used, I made a ticket to deprecate https://github.com/linkml/linkml/issues/243

is_a is used as the metamodeling construct

so I think the question is moot for biolink, and we can close this?