geneontology / obographs

Basic and Advanced OBO Graphs: specification and reference implementation

Class declarations w/out further assertions not in serialization #90

Closed: joeflack4 closed this issue 1 year ago

joeflack4 commented 1 year ago

Overview

Context: I opened an issue in robot, and @matentzn thought I should mirror it here (CC @julesjacobsen).


My team is doing some conversions from OWL -> Obographs JSON -> FHIR JSON, and we noticed that some concepts were missing from the output.

I took a look, and the missing concepts are all root nodes: each one is rdfs:subClassOf owl:Thing.

Edges that reference these nodes exist, but the declarations of the nodes themselves do not. It's not just a FHIR JSON issue: I looked in my Obographs JSON (downloadable here), and the declarations are missing there as well.
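
One way to spot this kind of gap in converted output is to compare edge endpoints against the node list. The following is a hypothetical, self-contained sketch (Java 16+). It assumes the nodes and edges have already been parsed into simple `Node` and `Edge` records. It does not use the obographs library's own model classes, and JSON parsing is left out.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

final class DanglingRefCheck {
    // Hypothetical, minimal stand-ins for Obographs JSON nodes and edges
    // (not the obographs library's own model classes).
    record Node(String id, String type) {}
    record Edge(String sub, String pred, String obj) {}

    // Returns every id used as an edge subject or object that has no entry in "nodes".
    static Set<String> danglingIds(List<Node> nodes, List<Edge> edges) {
        Set<String> declared = nodes.stream().map(Node::id).collect(Collectors.toSet());
        Set<String> missing = new LinkedHashSet<>(); // keeps first-seen order for readable output
        for (Edge e : edges) {
            if (!declared.contains(e.sub())) missing.add(e.sub());
            if (!declared.contains(e.obj())) missing.add(e.obj());
        }
        return missing;
    }

    public static void main(String[] args) {
        // "ex:child" is a made-up id; LP33117-0 is one of the missing roots from this issue.
        List<Node> nodes = List.of(new Node("ex:child", "CLASS"));
        List<Edge> edges = List.of(new Edge("ex:child", "is_a", "https://loinc.org/LP33117-0"));
        System.out.println(danglingIds(nodes, edges)); // prints [https://loinc.org/LP33117-0]
    }
}
```

An empty result means that every edge endpoint has a matching declaration in "nodes".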

I imagine this also affects the other ontologies I'm working with, but for this particular ontology, comploinc.owl, these are the declarations of the root nodes that are missing from the Obographs JSON:

    <owl:Class rdf:about="https://loinc.org/LP70625-6A">
        <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    </owl:Class>
    <owl:Class rdf:about="https://loinc.org/lc0000001">
        <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    </owl:Class>
    <owl:Class rdf:about="https://loinc.org/LP33117-0">
        <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    </owl:Class>
    <owl:Class rdf:about="https://loinc.org/LP33103-0">
        <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    </owl:Class>

Expected vs Actual

Expected: I would expect to see something like this in my Obographs JSON:

{
  "id": "https://loinc.org/LP33117-0",
  "type": "CLASS"
}

Or maybe something like this, with one or more nulls for missing properties:

{
  "id": "https://loinc.org/LP33117-0",
  "type": "CLASS",
  "lbl": null,
  "meta": {
    "definition": null,
    ...
  }
}

Actual: No declarations appear.

Reproducibility

  1. Download merged_reasoned_loinc.owl
  2. Download robot 1.9.1 if not already present
  3. Run: `java -jar robot.jar convert -i path/to/merged_reasoned_loinc.owl -o path/to/outfile.json --format json`

What I tried

I examined the CLI to see if there was an option dealing with root nodes, but I didn't see anything in `java -jar bin/robot.jar convert -h` that seemed like it could help with this.

Additional information

In this example, only root nodes (subClassOf owl:Thing) were missing, but in some other ontologies I looked at, other kinds of terms may have been missing too. I will update this issue if I have time to find concrete examples.

Related: https://github.com/ontodev/robot/issues/1082

dosumis commented 1 year ago

This is potentially an issue for my group too. We have been encouraging the use of OBOgraphs JSON over OBO as part of external collaborations, so we need to be aware of any cases where terms will be lost.

Quick question: Is this affecting only bare declarations? Does adding at least one annotation property axiom fix it?

matentzn commented 1 year ago

A second, small test ontology:

Prefix(:=<http://www.semanticweb.org/matentzn/ontologies/2021/11/untitled-ontology-544#>)
Prefix(owl:=<http://www.w3.org/2002/07/owl#>)
Prefix(rdf:=<http://www.w3.org/1999/02/22-rdf-syntax-ns#>)
Prefix(xml:=<http://www.w3.org/XML/1998/namespace>)
Prefix(xsd:=<http://www.w3.org/2001/XMLSchema#>)
Prefix(rdfs:=<http://www.w3.org/2000/01/rdf-schema#>)

Ontology(<http://www.semanticweb.org/matentzn/ontologies/2021/11/untitled-ontology-544>

Declaration(Class(<http://purl.obolibrary.org/obo/upheno/workshop2021/TMP_0000060>))
)

Pipeline:

robot convert -i single_declaration.owl -f json -o out.json

Results in:

{
  "graphs" : [ {
    "id" : "http://www.semanticweb.org/matentzn/ontologies/2021/11/untitled-ontology-544"
  } ]
}

@cmungall, can you confirm that the output should be:

{
  "graphs" : [ {
    "id" : "http://www.semanticweb.org/matentzn/ontologies/2021/11/untitled-ontology-544",
    "nodes" : [ {
      "id" : "http://purl.obolibrary.org/obo/upheno/workshop2021/TMP_0000060",
      "type" : "CLASS"
    } ]
  } ]
}

matentzn commented 1 year ago

I think the issue is roughly in this area.

Here we check something about Declaration axioms:

https://github.com/geneontology/obographs/blob/master/obographs-owlapi/src/main/java/org/geneontology/obographs/owlapi/FromOwl.java#L100

Maybe it's sufficient, when you encounter declaration axioms, to do the following?

// register the declared entity's id as a node, even if it has no other axioms
String subj = getIndividualId(e);
nodeIds.add(subj);
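
The suggested change amounts to treating every declaration axiom as a source of node ids. The sketch below illustrates that idea with hypothetical, simplified axiom records (`Declaration`, `Label`, `SubClassOf`) instead of the OWLAPI classes that FromOwl.java actually uses, so it is not a patch to obographs itself. It assumes Java 16+ for records and pattern-matching `instanceof`.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DeclarationsToNodes {
    // Hypothetical, heavily simplified axiom model; FromOwl.java itself works on OWLAPI types.
    interface Axiom {}
    record Declaration(String iri, String type) implements Axiom {}
    record Label(String iri, String value) implements Axiom {}
    record SubClassOf(String sub, String sup) implements Axiom {}

    record Node(String id, String type, String lbl) {}

    static List<Node> toNodes(List<Axiom> axioms) {
        Map<String, String> types = new LinkedHashMap<>();  // iri -> node type, in first-seen order
        Map<String, String> labels = new LinkedHashMap<>(); // iri -> label value
        for (Axiom ax : axioms) {
            if (ax instanceof Declaration d) {
                // Key step: a bare declaration alone is enough to register a node.
                types.putIfAbsent(d.iri(), d.type());
            } else if (ax instanceof Label l) {
                labels.put(l.iri(), l.value());
            }
            // SubClassOf axioms would become edges; they are not handled in this sketch.
        }
        List<Node> nodes = new ArrayList<>();
        // lbl stays null when there is no label axiom, i.e. an unlabeled node.
        types.forEach((iri, type) -> nodes.add(new Node(iri, type, labels.get(iri))));
        return nodes;
    }

    public static void main(String[] args) {
        // The single declaration from the test ontology above.
        List<Axiom> axioms = List.of(
                new Declaration("http://purl.obolibrary.org/obo/upheno/workshop2021/TMP_0000060", "CLASS"));
        System.out.println(toNodes(axioms));
    }
}
```

Run against the single-declaration ontology, this yields one CLASS node with no label, which is the shape of the output proposed above.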

cmungall commented 1 year ago

Yes, I confirm that a node without an lbl should be created. The representation should be as isomorphic as possible to standard OWL serializations, to minimize surprises. If groups want to filter out dangling nodes post hoc, they can do so.
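
On the consumer side, such a post-hoc filter could be very small. The following is a hypothetical sketch (Java 16+). It assumes that "dangling" means a node with no `lbl`. A real pipeline might choose a different criterion, for example keeping unlabeled nodes that carry `meta`.

```java
import java.util.List;

final class DropUnlabeled {
    // Hypothetical minimal node record; only the fields needed here.
    record Node(String id, String type, String lbl) {}

    // Keeps only nodes that carry a label; treating "no lbl" as dangling is an assumption.
    static List<Node> labeledOnly(List<Node> nodes) {
        return nodes.stream().filter(n -> n.lbl() != null).toList();
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("http://purl.obolibrary.org/obo/UBERON_0002101", "CLASS", "limb"),
                new Node("http://www.w3.org/2000/01/rdf-schema#label", "PROPERTY", null));
        System.out.println(labeledOnly(nodes)); // only the "limb" node remains
    }
}
```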

julesjacobsen commented 1 year ago

Working on this now. @matentzn's case is sorted, but the fix breaks many (all?) of the example cases, because new unlabeled nodes are now included and the nodes are ordered more consistently by type, which I suppose is a good thing. For example:

BEFORE:

---
graphs:
- id: "http://purl.obolibrary.org/obo/test.owl"
  meta:
    basicPropertyValues:
    - pred: "http://www.w3.org/2000/01/rdf-schema#comment"
      val: "test manus ontology"
  nodes:
  - id: "http://purl.obolibrary.org/obo/BFO_0000050"
    type: "PROPERTY"
    meta:
      xrefs:
      - val: "BFO:0000050"
      basicPropertyValues:
      - pred: "http://www.geneontology.org/formats/oboInOwl#shorthand"
        val: "part_of"
  - id: "http://purl.obolibrary.org/obo/IAO_0000115"
    lbl: "definition"
    type: "PROPERTY"
  - id: "http://purl.obolibrary.org/obo/UBERON_0002101"
    lbl: "limb"
    type: "CLASS"
  - id: "http://purl.obolibrary.org/obo/UBERON_0002102"
    lbl: "forelimb"
    type: "CLASS"
  - id: "http://purl.obolibrary.org/obo/UBERON_0002398"
    lbl: "manus"
    type: "CLASS"
    meta:
      definition:
        val: "."
  - id: "http://purl.obolibrary.org/obo/UBERON_0002470"
    lbl: "autopod region"
    type: "CLASS"
  - id: "http://www.geneontology.org/formats/oboInOwl#hasDbXref"
    lbl: "database_cross_reference"
    type: "PROPERTY"
  - id: "http://www.geneontology.org/formats/oboInOwl#shorthand"
    lbl: "shorthand"
    type: "PROPERTY"
  edges:
  - sub: "http://purl.obolibrary.org/obo/UBERON_0002102"
    pred: "is_a"
    obj: "http://purl.obolibrary.org/obo/UBERON_0002101"
  - sub: "http://purl.obolibrary.org/obo/UBERON_0002398"
    pred: "is_a"
    obj: "http://purl.obolibrary.org/obo/UBERON_0002470"
  - sub: "http://purl.obolibrary.org/obo/UBERON_0002398"
    pred: "http://purl.obolibrary.org/obo/BFO_0000050"
    obj: "http://purl.obolibrary.org/obo/UBERON_0002102"
  - sub: "http://purl.obolibrary.org/obo/UBERON_0002470"
    pred: "http://purl.obolibrary.org/obo/BFO_0000050"
    obj: "http://purl.obolibrary.org/obo/UBERON_0002101"

AFTER:

---
graphs:
- id: "http://purl.obolibrary.org/obo/test.owl"
  meta:
    basicPropertyValues:
    - pred: "http://www.w3.org/2000/01/rdf-schema#comment"
      val: "test manus ontology"
  nodes:
  - id: "http://purl.obolibrary.org/obo/UBERON_0002101"
    lbl: "limb"
    type: "CLASS"
  - id: "http://purl.obolibrary.org/obo/UBERON_0002102"
    lbl: "forelimb"
    type: "CLASS"
  - id: "http://purl.obolibrary.org/obo/UBERON_0002398"
    lbl: "manus"
    type: "CLASS"
    meta:
      definition:
        val: "."
  - id: "http://purl.obolibrary.org/obo/UBERON_0002470"
    lbl: "autopod region"
    type: "CLASS"
  - id: "http://purl.obolibrary.org/obo/BFO_0000050"
    type: "PROPERTY"
    meta:
      xrefs:
      - val: "BFO:0000050"
      basicPropertyValues:
      - pred: "http://www.geneontology.org/formats/oboInOwl#shorthand"
        val: "part_of"
  - id: "http://purl.obolibrary.org/obo/IAO_0000115"
    lbl: "definition"
    type: "PROPERTY"
  - id: "http://www.geneontology.org/formats/oboInOwl#hasDbXref"
    lbl: "database_cross_reference"
    type: "PROPERTY"
  - id: "http://www.geneontology.org/formats/oboInOwl#id"
    type: "PROPERTY"
  - id: "http://www.geneontology.org/formats/oboInOwl#shorthand"
    lbl: "shorthand"
    type: "PROPERTY"
  - id: "http://www.w3.org/2000/01/rdf-schema#label"
    type: "PROPERTY"
  edges:
  - sub: "http://purl.obolibrary.org/obo/UBERON_0002102"
    pred: "is_a"
    obj: "http://purl.obolibrary.org/obo/UBERON_0002101"
  - sub: "http://purl.obolibrary.org/obo/UBERON_0002398"
    pred: "is_a"
    obj: "http://purl.obolibrary.org/obo/UBERON_0002470"
  - sub: "http://purl.obolibrary.org/obo/UBERON_0002398"
    pred: "http://purl.obolibrary.org/obo/BFO_0000050"
    obj: "http://purl.obolibrary.org/obo/UBERON_0002102"
  - sub: "http://purl.obolibrary.org/obo/UBERON_0002470"
    pred: "http://purl.obolibrary.org/obo/BFO_0000050"
    obj: "http://purl.obolibrary.org/obo/UBERON_0002101"

Note that all the OWLDeclarationAxioms are grouped and listed as nodes by type: CLASS, PROPERTY, INDIVIDUAL. If you'd prefer a different order, say so now.
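
The ordering visible in the AFTER listing can be expressed as a two-key comparator: all CLASS nodes first, then PROPERTY nodes, with each group sorted by id. This is a hypothetical sketch inferred from the example output, not the code in the actual change. It assumes Java 16+ and a `NodeType` enum whose constant order matches the grouping.

```java
import java.util.Comparator;
import java.util.List;

final class NodeOrdering {
    // Constant order defines the grouping order: CLASS, then PROPERTY, then INDIVIDUAL.
    enum NodeType { CLASS, PROPERTY, INDIVIDUAL }

    record Node(String id, NodeType type) {}

    // Enums compare by declaration order; ties are broken by plain String order of the id.
    static final Comparator<Node> BY_TYPE_THEN_ID =
            Comparator.comparing(Node::type).thenComparing(Node::id);

    static List<Node> sorted(List<Node> nodes) {
        return nodes.stream().sorted(BY_TYPE_THEN_ID).toList();
    }

    public static void main(String[] args) {
        List<Node> nodes = List.of(
                new Node("http://www.geneontology.org/formats/oboInOwl#shorthand", NodeType.PROPERTY),
                new Node("http://purl.obolibrary.org/obo/UBERON_0002102", NodeType.CLASS),
                new Node("http://purl.obolibrary.org/obo/BFO_0000050", NodeType.PROPERTY),
                new Node("http://purl.obolibrary.org/obo/UBERON_0002101", NodeType.CLASS));
        sorted(nodes).forEach(n -> System.out.println(n.type() + " " + n.id()));
    }
}
```

Changing the grouping order would only require reordering the enum constants. The BEFORE listing, by contrast, looks like a plain sort by id alone.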

AFTER contains the following additional unlabeled PROPERTY nodes:

  - id: "http://www.geneontology.org/formats/oboInOwl#id"
    type: "PROPERTY"
...
  - id: "http://www.w3.org/2000/01/rdf-schema#label"
    type: "PROPERTY"

julesjacobsen commented 1 year ago

@joeflack4 do these files match your expectations? merged-reasoned-loinc-obographs-0.3.1.zip

joeflack4 commented 1 year ago

@julesjacobsen This looks great to me! @matentzn Whenever this is done, I just need to get this update into robot as well. Let me know if you think I should open an issue there. @cthoyt And when that happens, if bioontologies can also be updated, I can switch my package(s) to use that instead of calling robot directly. FYI, this concerns an Obographs fix where some term references appeared as a subject or object in "edges", but the class declaration for the term did not appear in "nodes".

matentzn commented 1 year ago

This does look great! I will take a closer look when I see the PR with the example cases fixed.

cthoyt commented 1 year ago

@joeflack4 I don't think this will make a difference on the bioontologies implementation. If you upgrade your ROBOT, then it should work!

joeflack4 commented 1 year ago

@cthoyt Oh, of course. I forgot, Charlie, that bioontologies picks up robot from the PATH.

julesjacobsen commented 1 year ago

@joeflack4 This is part of the new 0.3.1 release.

joeflack4 commented 1 year ago

Thanks so much!