RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

`rdfpipe` and binding XSD prefix #2802

Open ajnelson-nist opened 3 months ago

ajnelson-nist commented 3 months ago

This post is a question about the intended behavior of the rdfpipe --ns=... flag. It might also be a question about RDF/XML syntax.

I have several workflow streams that eventually flow pooled graph files into one graph file. Some of the steps use rdfpipe[^1] to concatenate some files together or just format-convert. From time to time, I experience some churn with the XSD prefix being xs: in some files and xsd: in others, and this causes the prefix to change somewhat randomly in the file at the end of the workflow when I incorporate new graph files.

Suppose I have an input file that defines a prefix for the XML Schema Datatypes IRI, using xsd, though nothing in the graph actually uses that prefix. The input file is as follows, and is listed here in case I am misunderstanding the role XML namespace declarations are supposed to have with prefixes.

```xml
<?xml version="1.0" encoding="UTF-8"?>

<rdf:RDF
    xmlns:ex="http://example.org/ontology/example/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#">

    <owl:Ontology rdf:about="http://example.org/ontology/example">
        <owl:versionIRI rdf:resource="http://example.org/ontology/example/0.0.1"/>
    </owl:Ontology>

    <owl:Class rdf:about="http://example.org/ontology/example/Object">
        <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    </owl:Class>

</rdf:RDF>
```
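As a sanity check on what the input declares at the XML level, independent of RDFLib, the namespace declarations can be listed with Python's standard library. This is only a sketch: the helper name `declared_prefixes` is made up for illustration, and the input is inlined as bytes rather than read from input.xml. It shows what the XML parser sees, not what any RDF tool does with those declarations afterwards.

```python
import io
import xml.etree.ElementTree as ET

# The input file from this post, inlined as bytes for a self-contained example.
RDF_XML = b"""<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF
    xmlns:ex="http://example.org/ontology/example/"
    xmlns:owl="http://www.w3.org/2002/07/owl#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
    <owl:Ontology rdf:about="http://example.org/ontology/example">
        <owl:versionIRI rdf:resource="http://example.org/ontology/example/0.0.1"/>
    </owl:Ontology>
    <owl:Class rdf:about="http://example.org/ontology/example/Object">
        <rdfs:subClassOf rdf:resource="http://www.w3.org/2002/07/owl#Thing"/>
    </owl:Class>
</rdf:RDF>
"""


def declared_prefixes(xml_bytes):
    """Return {prefix: IRI} for every xmlns declaration in the document.

    Hypothetical helper: it collects the "start-ns" events that
    ElementTree reports whenever a namespace declaration comes into scope.
    """
    prefixes = {}
    for _event, (prefix, iri) in ET.iterparse(
        io.BytesIO(xml_bytes), events=("start-ns",)
    ):
        prefixes[prefix] = iri
    return prefixes


prefixes = declared_prefixes(RDF_XML)
for prefix, iri in sorted(prefixes.items()):
    print(f"{prefix}: {iri}")
```

The xsd declaration is reported here even though no element or attribute in the document uses it, so the declaration itself is present in the input regardless of whether a downstream serializer chooses to keep it.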

First, an RDF/XML-specific question: Are XML entity declarations required in order to consider xsd: a prefix name? Apologies, I'm having some trouble finding this in the RDF specs. I've at least found RDF 1.1 XML Syntax Section 5.2.

Next, the rdfpipe question: I'm not sure how to get rdfpipe to carry that xsd prefix(?) forward, or if I should expect to be able to, if the prefix isn't used in triples.

This command applies a new namespace prefix ex1 in the generated graph, whether or not I have the xmlns:ex declaration in the XML:

```
rdfpipe --output-format turtle --ns='ex1=http://example.org/ontology/example/' input.xml > output.ttl
```

Without --ns, the xsd prefix does not get added to rdfpipe's output, which I can understand, since the prefix isn't used in any of the axioms. But even when I add an --ns for the XML Schema Datatypes IRI, the prefix doesn't get emitted in the generated Turtle graph. I see some interpretations where this is the intended behavior, but I'm wondering if this was actually intended. Should I be able to use --ns to load in as many namespaces as I want, whether or not they're in the input graph?

[^1]: Disclaimer: Participation by NIST in the creation of the documentation of mentioned software is not intended to imply a recommendation or endorsement by the National Institute of Standards and Technology, nor is it intended to imply that any specific software is necessarily the best available for the purpose.