RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Graph serialize - issues with N3 and TTL #2102

Open davidshumway opened 2 years ago

davidshumway commented 2 years ago

When serializing as either N3 or TTL, empty owl:equivalentClass [ ] and rdfs:subClassOf [ ] statements cause Protégé to fail to load the resulting files. Is this an issue with the rdflib serializer, with Protégé, or with the graph being exported (e.g. syntax issues or the way the triples were added to the graph)?

ns5:IDO_0000407 a owl:Class ;
    rdfs:label "primary infectious disposition"@en ;
    ns5:IAO_0000115 "An infectious disposition to become part of a disorder in organisms that have intact defenses."@en ;
    ns5:IAO_0000117 "Albert Goldfain",
        "Alexander Diehl",
        "Lindsay Cowell" ;
    rdfs:comment "A pathogen with a primary infectious disposition can cause disease or death in both immunocompromised and immunocompetent hosts.",
        "A quote from page 3 of Mandell's \"Principles and Practice of Infectious Diseases\" (Sixth edition): \"It is useful to distinguish \"principal\" pathogens, which regularly cause disease in some proportion of susceptible individuals with apparently intact defense systems, from other potentially pathogenic microorganisms.  ... even for most organisms classified as principal pathogens, for example, Staphylococcus aureus and the pneumococcus, some impairment or local breakdown in normal host defense mechanisms must occur for these bacteria to cause disease.  ... Thus, it seems clear that the capacity of certain microorganisms to cause disease in seemingly uncompromised human hosts on a regular basis reflects some fundamental difference in their virulence capabilities from those of opportunists and the more numerous commensal species that rarely, if ever, cause disease.\"" ;
    rdfs:subClassOf ns5:IDO_0000452,
        ns5:IDO_0000596 ;
    owl:equivalentClass [ ] .
ns5:IDO_0000414 a owl:Class ;
    rdfs:label "infectious agent host role"@en ;
    ns5:IAO_0000115 "A pathogen host role borne by an organism in virtue of the fact that its extended organism contains an infectious agent."@en ;
    ns5:IAO_0000117 "Albert Goldfain",
        "Alexander Diehl",
        "Lindsay Cowell" ;
    rdfs:comment "By this definition, vectors and other organisms that may not be infected are bearers of the infectious agent host role." ;
    rdfs:subClassOf [ ],
        ns5:IDO_0000415,
        ns5:IDO_0000531 .
Here are the triples containing equivalentClass as a property:
index s p o
0 http://purl.obolibrary.org/obo/IDO_0000623 http://www.w3.org/2002/07/owl#equivalentClass N17bc72fc7cab4a82ba473a9e2b4c10a1
0 http://purl.obolibrary.org/obo/IDO_0000457 http://www.w3.org/2002/07/owl#equivalentClass N721069eeed584ff59b0cb26542ea010c
0 http://www.ifomis.org/bfo/1.1/span#Occurrent http://www.w3.org/2002/07/owl#equivalentClass n2119fba42b4e47989aab414dc8b6b266b10
0 http://purl.obolibrary.org/obo/OBI_0100026 http://www.w3.org/2002/07/owl#equivalentClass N6db58f733c8a4256bc33a1b54cadccac
0 http://purl.obolibrary.org/obo/IDO_0000531 http://www.w3.org/2002/07/owl#equivalentClass N10f4d581c6ae43e4b41cd25b020705b0
0 http://purl.obolibrary.org/obo/IDO_0000596 http://www.w3.org/2002/07/owl#equivalentClass N523c76d60c4a4ccfb6f645baee40b2ba
0 http://purl.obolibrary.org/obo/IDO_0000528 http://www.w3.org/2002/07/owl#equivalentClass N9918923746894016a9316410f5a26955
0 http://purl.obolibrary.org/obo/IDO_0000585 http://www.w3.org/2002/07/owl#equivalentClass N146e9f40558c485a922fc20c2ae6e25b
0 http://purl.obolibrary.org/obo/IDO_0000625 http://www.w3.org/2002/07/owl#equivalentClass N54cc5a01f92f4519b67138c232d846dd
0 http://purl.obolibrary.org/obo/IDO_0000436 http://www.w3.org/2002/07/owl#equivalentClass Nc2369f01154a4b7e8c9f8c2be345d3d6
0 http://purl.obolibrary.org/obo/IDO_0000621 http://www.w3.org/2002/07/owl#equivalentClass Nde40bb47dd0941b18ef11f6a56d6036b
0 http://purl.obolibrary.org/obo/IDO_0000407 http://www.w3.org/2002/07/owl#equivalentClass N7a283ff6642c409086126d996a25e9ee

There are many subClassOf relationships, but I'm guessing these ones, whose objects are blank nodes, are the culprits:

index s p o
0 http://purl.obolibrary.org/obo/ExO_0000050 http://www.w3.org/2000/01/rdf-schema#subClassOf N017786b58e584b809c72337c9fbc0f27
0 http://purl.obolibrary.org/obo/IDO_0000420 http://www.w3.org/2000/01/rdf-schema#subClassOf N040e6867dff94397bf675f08bf00442b
0 http://purl.obolibrary.org/obo/ExO_0000065 http://www.w3.org/2000/01/rdf-schema#subClassOf N110b516948bd4e5b9dce041f82a228b2
0 http://purl.obolibrary.org/obo/ExO_0000001 http://www.w3.org/2000/01/rdf-schema#subClassOf N1dc5a5484a2c4efcbe3d1a9cd2161b41
0 http://purl.obolibrary.org/obo/IDO_0000655 http://www.w3.org/2000/01/rdf-schema#subClassOf N24091d3dd2e94dcda38efa3f081962c0
0 http://purl.obolibrary.org/obo/IDO_0000619 http://www.w3.org/2000/01/rdf-schema#subClassOf N35aaafe748804a3da4fc882d93995683
0 http://purl.obolibrary.org/obo/IDO_0000415 http://www.w3.org/2000/01/rdf-schema#subClassOf N3753ade15f9648e7bb1d9b8cb48a9fc3
0 http://purl.obolibrary.org/obo/ExO_0000050 http://www.w3.org/2000/01/rdf-schema#subClassOf N39b2f846b6234033a83e244d3c42aaaf
0 http://purl.obolibrary.org/obo/ExO_0000065 http://www.w3.org/2000/01/rdf-schema#subClassOf N5a64ae0f3b5d4590b55209a0dca5fec5
0 http://purl.obolibrary.org/obo/ExO_0000065 http://www.w3.org/2000/01/rdf-schema#subClassOf N6180793737284439a0d329296b61aa34
0 http://purl.obolibrary.org/obo/IDO_0000513 http://www.w3.org/2000/01/rdf-schema#subClassOf N695b886426bc46799805c34fe1f04cf1
0 http://purl.obolibrary.org/obo/ExO_0000004 http://www.w3.org/2000/01/rdf-schema#subClassOf N75cb4fccb36841589db10e967657b549
ajnelson-nist commented 2 years ago

I suspect the underlying issue here is tool-agnostic. Your input graphs do not seem to conform to OWL 2 DL, and you may be running into a tool that requires its input to be OWL 2 DL before it will process it.

One of the authors of the OWL 2 Mapping to RDF document kindly informed me that it strictly requires pattern-matching of the RDF patterns encountered. The tables throughout Section 3 must be followed exactly, and if anything is "left over" after processing by the patterns in those tables, then the document is not in OWL 2 DL but in OWL 2 Full. I've seen Protégé balk at OWL 2 Full documents in various ways. (For instance, typing owl:AnnotatedTarget instead of owl:annotatedTarget produces a particularly cryptic message at load time.) Since an OWL 2 Full document can be arbitrary RDF, it's understandable for a tool to simply give up on some load operations.

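In your example input, particularly the excerpted triple ns5:IDO_0000407 owl:equivalentClass [ ] ., you are presenting an OWL 2 Full graph (arbitrary RDF), not an OWL 2 DL graph. Explanation, referring to the linked mapping document: the empty blank node [ ] has no triples of its own, so it cannot be parsed as a class expression, and none of the patterns that consume an owl:equivalentClass triple can match an object with no description.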

So, once the OWL engine has done all it can by strict matching against Section 3, ns5:IDO_0000407 owl:equivalentClass [ ] . will be a leftover triple in the graph. The very last line of text before Section 4 reads, "At the end of this process, the graph G must be empty." Because the process does not consume the whole graph, this input is not an OWL 2 DL conformant graph.

If you delete the triples with those empty blank nodes, you'll be a step closer to OWL 2 DL conformance, and Protégé may behave differently at load time.