RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Is there a better way to do this? #2381

Open miguelsmuller opened 1 year ago

miguelsmuller commented 1 year ago

I have the following code snippet. With it I split the parsed Turtle (.ttl) data into separate graphs and later merge them back into one.

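import os

from rdflib import Graph
from rdflib.namespace import OWL, RDF

# PATH (the directory holding the .ttl files) is assumed to be defined elsewhere.
graph = Graph()

# Load every Turtle file found in PATH into a single graph.
for file in os.listdir(PATH):
    if file.endswith(".ttl"):
        graph.parse(os.path.join(PATH, file))

graph_class = Graph()
graph_properties = Graph()

# Copy only the typing triples for classes and for object properties.
graph_class += graph.triples((None, RDF.type, OWL.Class))
graph_properties += graph.triples((None, RDF.type, OWL.ObjectProperty))

graph_full = graph_class + graph_properties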

I keep looking at the code and I don't like the aesthetics very much. Is there any other way to do this, that is, build a graph with only the existing classes and properties in the original?

ajnelson-nist commented 1 year ago

Short answer (& my own opinion): What you did is basically fine, but there are a few more requirements to elicit from the caller at call time, & it needs some referential safety checks.

"Class" has a few definitions - rdfs:Class, owl:Class, "~ object of a triple with rdf:type as predicate, or a sub- or superclass thereof" (more specifics available in this SHACL spec section, roughly in a 2-box radius around the blue box for "SHACL Type").

From the OWL perspective, you'd also need to check whether you want to grab IRI-identified classes only, or if you also want the anonymous owl:Classes that are needed for things like equivalency classes and unions (owl:equivalentClass, owl:unionOf). Those are required to be blank nodes; the syntactic requirement is in this document. Also, did you want owl:AnnotationPropertys, because sometimes those could have object ranges? Do you want only the owl:AnnotationPropertys specified in the ontology to have classes as rdfs:ranges?

OWL is well enough specified that you could write a function similar to your script with a few type checks for URIRef vs. BNode. But it does take at least your script, plus a few Python/RDFLib type-safety checks, plus possibly some tuning knobs for whether the caller wants, say, "All classes and predicates that can link class instances," or more.

miguelsmuller commented 1 year ago

I appreciate your insights.

They have indeed proven quite helpful for my requirements. I will revisit this script in the upcoming days and come back with updates.

Nevertheless, it's reassuring to know that the script isn't incorrect.

Thank you very much, @ajnelson-nist, and I apologize for the delay in providing feedback.