RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
BSD 3-Clause "New" or "Revised" License

Graph parse method overrides prefix bindings #1997

Open edmondchuc opened 2 years ago

edmondchuc commented 2 years ago

If I run the following code:

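from rdflib import Graph, Namespace

data = """
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX ns: <https://example.com/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX schema: <https://schema.org/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

<https://example.com/08429fce-4d70-4be4-9c64-ffc80f554ea7> a skos:Concept ;
    skos:definition "definition" ;
    skos:prefLabel "label" .
"""

EX = Namespace("https://example.com/")

graph = Graph()

# Bind the preferred prefix first, then parse data that declares a
# different prefix ("ns") for the same namespace.
graph.bind("ex", EX)
graph.parse(data=data, format="turtle")

print(graph.serialize(format="turtle"))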

It will print:

@prefix ns: <https://example.com/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

ns:08429fce-4d70-4be4-9c64-ffc80f554ea7 a skos:Concept ;
    skos:definition "definition" ;
    skos:prefLabel "label" .

I would have expected the bind to persist through the life of the graph object and print the following result:

@prefix ex: <https://example.com/> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .

ex:08429fce-4d70-4be4-9c64-ffc80f554ea7 a skos:Concept ;
    skos:definition "definition" ;
    skos:prefLabel "label" .

If I swap the two lines from:

graph.bind("ex", EX)
graph.parse(data=data, format="turtle")

to:

graph.parse(data=data, format="turtle")
graph.bind("ex", EX)

it then prints what I expect.

Is this the expected behaviour where calling the parse() method overwrites prefix bindings in the graph's namespace manager?

ghost commented 2 years ago

> Is this the expected behaviour where calling the parse() method overwrites prefix bindings in the graph's namespace manager?

It is, according to what I found while testing the override/replace interactions, and I included an observation on the matter:


which is intended to find its way into the documentation.

I have a sense that the domain modelling (of the source as a serialized RDF Graph) is slightly more faithfully represented in the traditional RDFLib invocation idiom:

g = Graph().parse(data=source, format=format)

There doesn't seem to be any elegant and straightforward way of handling prefix-namespace bindings in a format-independent manner. The turtle parser doesn't use either override or replace: https://github.com/RDFLib/rdflib/blob/05dced203f7db28470255ce847db6b38d05a2663/rdflib/plugins/parsers/notation3.py#L1946

Also consider:

from rdflib import Graph, Namespace


def test_parse_namespace():
    data = """
    PREFIX dcterms: <http://purl.org/dc/terms/>
    PREFIX ns: <https://example.com/>
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX schema: <https://schema.org/>
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

    <https://example.com/08429fce-4d70-4be4-9c64-ffc80f554ea7> a skos:Concept .
    """

    EX = Namespace("https://example.com/")

    graph = Graph()

    # Binding after parsing: the "ex" prefix wins over the parsed "ns".
    graph.parse(data=data, format="turtle")
    graph.bind("ex", EX)

    assert graph.serialize(format="turtle") == (
        "@prefix ex: <https://example.com/> .\n"
        "@prefix skos: <http://www.w3.org/2004/02/skos/core#> .\n"
        "ex:08429fce-4d70-4be4-9c64-ffc80f554ea7 a skos:Concept .\n"
    )

    graph2 = Graph()
    graph2 += graph  # Namespace bindings in graph not preserved

    assert graph2.serialize(format="turtle") == (
        "<https://example.com/08429fce-4d70-4be4-9c64-ffc80f554ea7> a "
        "<http://www.w3.org/2004/02/skos/core#Concept> .\n"
    )

    graph2 = Graph()
    graph2.bind("xe", EX)

    graph2 += graph  # Namespace bindings in graph2 preserved

    assert graph2.serialize(format="turtle") == (
        "@prefix xe: <https://example.com/> .\n"
        "xe:08429fce-4d70-4be4-9c64-ffc80f554ea7 a "
        "<http://www.w3.org/2004/02/skos/core#Concept> .\n"
    )

aucampia commented 1 year ago

I think your expectation is reasonable @edmondchuc, but changing this would be a breaking change, so it should be targeted for 7.x. See https://github.com/RDFLib/rdflib/pull/2108 for some options.