RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Rdflib 7.0.0: Inadequate Support for Importing Multiple Prefixes with the Same IRI and Base IRI #2768

Open hsekol-hub opened 5 months ago

hsekol-hub commented 5 months ago

This issue concerns loading .ttl files that define multiple prefixes with the same IRI. This practice doesn't violate any W3C standard and is commonly seen in real data. The following code is the simplest illustration of what I'm trying to achieve:

```python
from rdflib import Graph, Namespace

EX = Namespace("http://example.org/")
EX1 = Namespace("http://example.org/")
g = Graph(bind_namespaces="none")
g.bind("ex", EX)
g.bind("ex1", EX1)
print(list(g.namespaces()))
# Output: [('ex1', rdflib.term.URIRef('http://example.org/'))]
```

A second issue concerns `@base`: neither `Graph.parse()` nor `Graph()` picks up the base IRI declared in the file:

```python
# Not able to fetch @base
import rdflib

# filepath is a pathlib.Path pointing at the .ttl file
graph: rdflib.Graph = rdflib.Graph()
with filepath.open(encoding="utf-8") as file:
    graph.parse(file)

print(graph.base)  # Outputs `None`
```

Instead, I have to implement something like the following, which is sub-optimal and makes me question using rdflib in the first place:

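```python
import rdflib

base_uri = None  # stays None if the file declares no @base
with open(filepath, encoding="utf-8") as file:
    for line in file:
        stripped = line.strip()
        if stripped.startswith("@base"):
            # Extract the base IRI from a line such as `@base <http://example.org/> .`
            base_uri = rdflib.URIRef(stripped.split()[1].strip("<>"))
            break
    file.seek(0)  # rewind so the parser also sees the lines already read
    graph = rdflib.Graph(base=base_uri)
    graph.parse(file, format="turtle")
```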

These gaps in fundamental functionality lead to inconsistencies in ontology files read via the rdflib library. Can someone please suggest an alternative library or provide a fix for this issue at the earliest convenience?

WhiteGobo commented 5 months ago

As a workaround, you can write your second prefix directly into the default memory store.

```python
from rdflib import Graph, Namespace, URIRef

EX = Namespace("http://example.org/")
g = Graph(bind_namespaces="none")
g.bind("ex", EX)
# _Memory__namespace is the name-mangled, private prefix -> namespace dict
# of the default Memory store; it is not public API.
g.store._Memory__namespace["ex1"] = URIRef("http://example.org/")
print(list(g.namespaces()))
# [('ex', rdflib.term.URIRef('http://example.org/')), ('ex1', rdflib.term.URIRef('http://example.org/'))]
```

I haven't tested this any further, so I don't know whether it causes other problems or whether other stores behave the same way.

nicholascar commented 4 months ago

The base issue could be a real one and I will look into it when I'm back from holidays in a few weeks.

But the multiple-prefixes one is not! Sure, it's not a violation to want multiple prefixes for the same namespace, and you do see it in data, but prefixes are just presentation conveniences, and I think catering too much for every possible use of them overemphasises their role. They are not real data, and a single prefix per namespace will always work, even when multiple prefixes are supplied in the original data or defined as in your code above.

So I'm not motivated to solve this one, and there is indeed a workaround.

If it's really important to you, @hsekol-hub, please feel free to create a Pull Request to address it yourself.