cancerDHC / ccdhmodel

CRDC-H model in LinkML, developed by the Center for Cancer Data Harmonization (CCDH)
https://cancerdhc.github.io/ccdhmodel/
BSD 3-Clause "New" or "Revised" License

problem with generated OWL file #64

Open balhoff opened 3 years ago

balhoff commented 3 years ago

The OWL API parser complains when trying to read ccdhmodel.owl.ttl:

jim (main)$ robot convert -i owl/ccdhmodel.owl.ttl -o ccdhmodel.ofn
2021-07-09 22:17:39,644 ERROR org.semanticweb.owlapi.rdf.rdfxml.parser.OWLRDFConsumer - Entity not properly recognized, missing triples in input? http://org.semanticweb.owlapi/error#Error1 for type Class
2021-07-09 22:17:39,652 ERROR org.semanticweb.owlapi.rdf.rdfxml.parser.OWLRDFConsumer - Entity not properly recognized, missing triples in input? http://org.semanticweb.owlapi/error#Error2 for type Class
2021-07-09 22:17:39,652 ERROR org.semanticweb.owlapi.rdf.rdfxml.parser.OWLRDFConsumer - Entity not properly recognized, missing triples in input? http://org.semanticweb.owlapi/error#Error3 for type Class
2021-07-09 22:17:39,652 ERROR org.semanticweb.owlapi.rdf.rdfxml.parser.OWLRDFConsumer - Entity not properly recognized, missing triples in input? http://org.semanticweb.owlapi/error#Error4 for type Class
2021-07-09 22:17:39,654 ERROR org.semanticweb.owlapi.rdf.rdfxml.parser.OWLRDFConsumer - Entity not properly recognized, missing triples in input? http://org.semanticweb.owlapi/error#Error5 for type Class
2021-07-09 22:17:39,654 ERROR org.semanticweb.owlapi.rdf.rdfxml.parser.OWLRDFConsumer - Entity not properly recognized, missing triples in input? http://org.semanticweb.owlapi/error#Error6 for type Class
# many more lines...

I thought this might be a linkml issue, but I tried the biolink-model OWL file and didn't see this problem.

balhoff commented 3 years ago

I think the trouble is that there are many undeclared properties used as triple objects for owl:onProperty. Example:

<https://example.org/ccdh/CodeableConcept> a owl:Class,
        linkml:ClassDefinition ;
    rdfs:label "CodeableConcept" ;
    rdfs:subClassOf [ a owl:Restriction ;
            owl:maxQualifiedCardinality 1 ;
            owl:onClass <https://example.org/ccdh/CcdhString> ;
            owl:onProperty linkml:text ],
        [ a owl:Restriction ;
            owl:allValuesFrom <https://example.org/ccdh/Coding> ;
            owl:onProperty linkml:coding ],
        <https://example.org/ccdh/Entity> ;
    skos:definition "A representation of a concept that may be defined by or mapped to one or more codes in code systems (terminologies, ontologies, dictionaries, code sets, etc) - but may also be defined by the provision of text." ;
    skos:editorialNote "Derived from [CodeableConcept in sheet CodeableConcept](https://docs.google.com/spreadsheets/d/1oWS7cao-fgz2MKWtyr8h2dEL9unX__0bJrWKv6mQmM4/edit#gid=1820375300)" ;
    skos:note "More than one code may be used in CodeableConcept. The concept may be coded multiple times in different code systems (or even multiple times in the same code systems, where multiple forms are possible). Each Coding is a representation of the concept as described above and may have slightly different granularity due to the differences in the definitions of the underlying codes. There is no meaning associated with the ordering of Coding within a CodeableConcept. A typical use of CodeableConcept is to send the local code that the concept was coded with, and also one or more translations to publicly defined code systems such as LOINC or SNOMED CT. " .

Neither linkml:text nor linkml:coding have declared types.
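
For anyone who wants to list the affected properties before loading the file with the OWL API, here is a minimal sketch of that check. It is a rough text scan written for this thread, not a Turtle parser: the function name `undeclared_on_property_targets` is made up, it only sees prefixed names (not full `<...>` IRIs), and it only recognizes declarations written as `term a type` at the start of a statement. A real check should load the file with a proper RDF library instead.

```python
import re

# rdf:type values that count as a property declaration in this sketch.
PROPERTY_TYPES = {
    "owl:ObjectProperty",
    "owl:DatatypeProperty",
    "owl:AnnotationProperty",
}

# A prefixed name such as linkml:text or fhir:CodeableConcept.text
# (dots are allowed inside the local name, but not as its last character).
PNAME = r"[A-Za-z][\w-]*:[\w.-]*\w"


def undeclared_on_property_targets(turtle_text):
    """Return sorted prefixed names used with owl:onProperty but never declared.

    Assumption: declarations look like `term a type1, type2 ;` (or end with
    ` .`) and start in the first column of a line.
    """
    used = set(re.findall(r"owl:onProperty\s+(" + PNAME + r")", turtle_text))

    declared = set()
    pattern = r"^(" + PNAME + r")\s+a\s+(.*?)\s*(?:;|\.\s*$)"
    for match in re.finditer(pattern, turtle_text, re.MULTILINE | re.DOTALL):
        term, type_list = match.group(1), match.group(2)
        types = set(re.split(r"[\s,]+", type_list.strip()))
        if types & PROPERTY_TYPES:
            declared.add(term)

    return sorted(used - declared)


# Trimmed copy of the CodeableConcept excerpt quoted above.
SAMPLE = """
<https://example.org/ccdh/CodeableConcept> a owl:Class ;
    rdfs:subClassOf [ a owl:Restriction ;
            owl:maxQualifiedCardinality 1 ;
            owl:onClass <https://example.org/ccdh/CcdhString> ;
            owl:onProperty linkml:text ],
        [ a owl:Restriction ;
            owl:allValuesFrom <https://example.org/ccdh/Coding> ;
            owl:onProperty linkml:coding ] .
"""

if __name__ == "__main__":
    for term in undeclared_on_property_targets(SAMPLE):
        print("no property declaration found for", term)
```

On the trimmed excerpt this reports both linkml:text and linkml:coding; appending a line such as `linkml:text a owl:ObjectProperty .` to the input removes linkml:text from the report.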

gaurav commented 3 years ago

Interesting!

> Neither linkml:text nor linkml:coding have declared types.

That's because both of those properties are CRDCH properties, not LinkML properties! So they should be ccdh:text and ccdh:coding.

I bet this is a consequence of us using slots defined on properties, so e.g. the coding field uses a CURIE of ccdh:codeableConcept__coding, which ought to be defined in the OWL file somewhere.

We should probably come up with a minimal test case for this and then report it to LinkML.

hsolbrig commented 3 years ago

FWIW, the (latest) FHIR definitions of those classes can be found in https://build.fhir.org/fhir.ttl :

fhir:CodeableConcept  a  owl:Class ;
        rdfs:comment     "A concept that may be defined by a formal reference to a terminology or ontology or may be provided by text." ;
        rdfs:label       "CodeableConcept" ;
        rdfs:subClassOf  fhir:DataType ;
        rdfs:subClassOf  [ a                   owl:Restriction ;
                           owl:allValuesFrom   fhir:string ;
                           owl:maxCardinality  1 ;
                           owl:onProperty      fhir:CodeableConcept.text
                         ] ;
        rdfs:subClassOf  [ a                  owl:Restriction ;
                           owl:allValuesFrom  fhir:Coding ;
                           owl:onProperty     fhir:CodeableConcept.coding
                         ] ;
        dc:title         "Concept - reference to a terminology or just  text" .

fhir:Coding  a           owl:Class ;
        rdfs:comment     "A reference to a code defined by a terminology system." ;
        rdfs:label       "Coding" ;
        rdfs:subClassOf  fhir:DataType ;
        rdfs:subClassOf  [ a                   owl:Restriction ;
                           owl:allValuesFrom   fhir:uri ;
                           owl:maxCardinality  1 ;
                           owl:onProperty      fhir:Coding.system
                         ] ;
        rdfs:subClassOf  [ a                   owl:Restriction ;
                           owl:allValuesFrom   fhir:boolean ;
                           owl:maxCardinality  1 ;
                           owl:onProperty      fhir:Coding.userSelected
                         ] ;
        rdfs:subClassOf  [ a                   owl:Restriction ;
                           owl:allValuesFrom   fhir:string ;
                           owl:maxCardinality  1 ;
                           owl:onProperty      fhir:Coding.version
                         ] ;
        rdfs:subClassOf  [ a                   owl:Restriction ;
                           owl:allValuesFrom   fhir:code ;
                           owl:maxCardinality  1 ;
                           owl:onProperty      fhir:Coding.code
                         ] ;
        rdfs:subClassOf  [ a                   owl:Restriction ;
                           owl:allValuesFrom   fhir:string ;
                           owl:maxCardinality  1 ;
                           owl:onProperty      fhir:Coding.display
                         ] ;
        dc:title         "A reference to a code defined by a terminology system" .

and

fhir:CodeableConcept.text
        a             owl:ObjectProperty ;
        rdfs:comment  "A human language representation of the concept as seen/selected/uttered by the user who entered the data and/or which represents the intended meaning of the user." ;
        rdfs:domain   fhir:CodeableConcept ;
        rdfs:label    "CodeableConcept.text" ;
        rdfs:range    fhir:string ;
        dc:title      "Plain text representation of the concept" .

I'm not certain that I understand why we are trying to replicate this -- shouldn't we consider importing fhir.ttl instead? Notes:

  1. While CodeableConcept and its friends are reasonably stable, they may still change on occasion.
  2. The FHIR RDF specification itself may change in the upcoming year. In particular, "CodeableConcept.text", "fhir:boolean", etc. may be simplified.
  3. There is a JSON-LD context available for these items, meaning that we don't have to match them verbatim if we need them in our own model. At the moment it can be found at:

https://fhircat.org/fhir-r4/original/contexts/codeableconcept.context.jsonld

gaurav commented 3 years ago

It looks like there are two problems here:

  1. Every onProperty statement uses linkml: as a prefix, when it should use ccdh:.
  2. The text is defining ccdh_string as an object, when it should be defined (and used) as a datatype: https://cancerdhc.github.io/ccdhmodel/v1.0.1/types/CcdhString/ -- maybe if we explicitly map this to xsd:string that will go away?

balhoff commented 3 years ago

Just a note - I think we determined that an instance of CcdhString would be an object with a value data property pointing to an xsd:string.
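
To make that shape concrete, here is a small sketch that renders one such individual as Turtle. Everything named here is an assumption for illustration: the helper `ccdh_string_turtle`, the property name `ccdh:value`, and the `ccdh:` prefix standing for `https://example.org/ccdh/` (the namespace seen in the excerpt above). It is not what the LinkML generator emits.

```python
def ccdh_string_turtle(node_id, text):
    """Render a CcdhString individual whose text sits on a data property.

    Hypothetical names: `ccdh:value` is a placeholder for the "value"
    data property, and `node_id` is assumed to be a valid local name.
    """
    # Escape the characters that would break a short Turtle string literal.
    escaped = (
        text.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )
    return (
        f"ccdh:{node_id} a ccdh:CcdhString ;\n"
        f'    ccdh:value "{escaped}"^^xsd:string .\n'
    )


if __name__ == "__main__":
    print(ccdh_string_turtle("text1", "Adenocarcinoma of the lung"))
```

Under this reading, a restriction such as `owl:onClass <https://example.org/ccdh/CcdhString>` stays an object-property restriction, and only the hypothetical value property would need to be a datatype property.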

gaurav commented 2 years ago

I don't think there's anything here that would be helpful for the CCDH Pilot, so I'm inclined to push it to a later milestone. Any objections?

I suspect we might be able to fix some of these issues by setting a default_prefix, which is what the LinkML example file does.

However, there is at least one OWL error in the current example test file, so there are definitely OWL generation issues in LinkML that need to be fixed. @balhoff Would you like to take a stab at this? It isn't very urgent, but I think you have the most OWL experience of anyone on the Tools team at present.

balhoff commented 2 years ago

A fix has been implemented in linkml; we should test with the next linkml release.

gaurav commented 2 years ago

@fragosog mentioned that he would like to visualize the CRDC-H model using Protege, and fixing this issue would allow him to do that.