RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License
2.18k stars 558 forks source link

rdflib 6.2.0 SPARQL parsing regression: multiple prefixes for same IRI not supported? #2077

Open jclerman opened 2 years ago

jclerman commented 2 years ago

I have a standard block of SPARQL prefixes that I use in some of my code, and I started getting test failures when I tried to update from rdflib 6.1.1 to rdflib 6.2.0. I noticed that the problem seems to be that somehow, support for multiple prefixes that resolve to the same URI seems to have been dropped.

Example block at the beginning of a SPARQL query:

PREFIX foo: <http://example.com#>
PREFIX bar: <http://example.com#>

The above should, as far as I know, be supported - but in rdflib 6.2.0, the foo: prefix will not be recognized when used in the query - and in fact, during parsing of the query, the foo: prefix is deleted.

The issue seems to be somewhere in rdfllib.namespace.__init__.py, in NamespaceManager.bind() - but I can't quite pin down what change is causing this new issue.

I do see this comment in the code, which was present in 6.1.1 but maybe it wasn't working as described?

        # Check if the bound_namespace contains a URI
        # and if so convert it into a URIRef for comparison
        # This is to prevent duplicate namespaces with the
        # same URI

however, if I correctly understand the intent behind that comment, it seems problematic - there is nothing wrong (as far as I know?) with multiple prefixes for the same URI in SPARQL.

aucampia commented 2 years ago

however, if I correctly understand the intent behind that comment, it seems problematic - there is nothing wrong (as far as I know?) with multiple prefixes for the same URI in SPARQL.

I agree with you that I think it is problematic, I don't really see anything wrong with multiple prefixes either, maybe one concern is that we could end up with ambiguity in what the output of serializers that support namespaces will be, but I don't really think that is reason enough to not support multiple prefixes mapping to the same URI.

aucampia commented 2 years ago

Sorry for the accidental closure.

jclerman commented 1 year ago

I've looked into this further, motivated by the issues I described in #2348.

The root of the problem is at least reflected in the fact that the Store class (in rdflib/store.py) defines the prefix method with the expectation that it'll return an Optional[str]. I think instead it should be returning a Set[str] (which might be empty).

All of the implementations of Store would need to be updated with that in mind, including updates of their implementations of bind(). For example, in the Memory store, we have

https://github.com/RDFLib/rdflib/blob/cbd61510ec581aee262bc2bb8ad95d94f7784842/rdflib/plugins/stores/memory.py#L531-L537

which might change to something like:

        if override:
            if bound_prefix is not None:
                # del self.__namespace[bound_prefix]
                pass
            if bound_namespace is not None:
                del self.__prefix[bound_namespace]
            self.__prefix[namespace].add(prefix)
            self.__namespace[prefix] = namespace

There are tests now that enforce the current behavior - bind("xyz", URI, override=True) is tested to verify that it results in removal of any previous prefix that mapped to the URI in question - that seems wrong. If the URI already has a mapping, override should allow the new mapping to be added without removing the old one - or alternatively (I might be confused about the intent of override), that's what should happen if override is False... or maybe we can just get rid of override, since we have replace for when we want to change the mapping for a prefix?

jclerman commented 1 year ago

If the above seems reasonable, I'd be willing to open a PR to make the changes. It seems like a potentially-significant change though, so some discussion of the change seems indicated.

aucampia commented 1 year ago

This error came in with one of these PRs:

severin-lemaignan commented 1 year ago

I believe this is also the source of an issue with default prefixes:

This works:

PREFIX foo: <http://example.com#>
PREFIX : <http://example.com#>

SELECT ?a ?b
WHERE {
   ?a :bar ?b .
}

This causes: Exception: Unknown namespace prefix : foo

PREFIX foo: <http://example.com#>
PREFIX : <http://example.com#>

SELECT ?a ?b
WHERE {
   ?a foo:bar ?b .
}
hsekol-hub commented 7 months ago

Following for this issue.

Has this been resolved yet? I too am not able to keep multiple IRI for different prefixes and need this feature.

white-gecko commented 5 months ago

The pr #2800 introduces the tests for this issue.

My use case as:

So this touches both, the issue of duplicate prefixes and of parsing/serialization side effects.