Open ajnelson-nist opened 2 years ago
@ajnelson-nist that is in primarily because it will return URIRef objects and not Namespace objects:
From monkeytype:
20220729T042949 iwana@teekai.zoic.eu.org:~/sw/d/github.com/iafork/rdflib.monkeytype
$ task run -- monkeytype stub rdflib.namespace | tee rdflib/namespace/__init__.pyi
task: [run]
VIRTUAL_ENV="/home/iwana/sw/d/github.com/iafork/rdflib.monkeytype/.venv" PATH="/home/iwana/sw/d/github.com/iafork/rdflib.monkeytype/.venv/bin${PATH:+:${PATH}}" monkeytype stub rdflib.namespace
55 traces failed to decode; use -v for details
from rdflib.term import URIRef
from typing import (
Any,
Dict,
Iterable,
List,
Optional,
Tuple,
Type,
Union,
)
# ...elided...
class NamespaceManager:
def __init__(self, graph: Graph, bind_namespaces: _NamespaceSetString = ...) -> None: ...
def _store_bind(self, prefix: str, namespace: URIRef, override: bool) -> None: ...
def absolutize(self, uri: str, defrag: int = ...) -> URIRef: ...
def bind(self, prefix: Optional[str], namespace: Any, override: bool = ..., replace: bool = ...) -> None: ...
def compute_qname(self, uri: str, generate: bool = ...) -> Tuple[str, URIRef, str]: ...
def compute_qname_strict(self, uri: str, generate: bool = ...) -> Tuple[str, str, str]: ...
def expand_curie(self, curie: str) -> Optional[URIRef]: ...
def namespaces(self) -> Iterable[Tuple[str, URIRef]]: ...
def normalizeUri(self, rdfTerm: str) -> str: ...
def qname(self, uri: str) -> str: ...
def qname_strict(self, uri: str) -> str: ...
def reset(self) -> None: ...
@property
def store(self) -> Store: ...
# ...elided...
And from pytest-annotate
{
"path": "rdflib/namespace/__init__.py",
"line": 707,
"func_name": "NamespaceManager.namespaces",
"type_comments": [
"() -> Iterator",
"() -> Iterator[Tuple[str, rdflib.term.URIRef]]"
],
"samples": 2051
},
@aucampia That is an informative view of the inertia of that type signature. However, I believe there are good reasons to upset it.
In particular, I'm concerned with the interface between the .namespaces
method and Graph.bind()
. .bind
hasn't received type review yet. Store.bind()
types namespace
as a URIRef
. As I noted on another thread this morning, I'm not so familiar with the Store
code tree. However, I do make frequent use of Graph.bind
.
I personally expect .namespaces
to give me objects that feed into Graph.bind
, and I further expect the objects from .namespaces
to be Namespace
s because I would like to be able to use the []
or .
operators to reference terms within a namespace.
I've seen significant confusion arise from these concepts being muddied, some back since the pre-RDF days of XML:
Each of these is still a distinct concept. Some current technologies make confusions of them all come data-serialization time. FOAF ends its ontology IRI with a slash, which OWL permits and which allows a handy and inoffensive confusion of ontology IRI, namespace, and string prefix. But I'm particularly thinking of JSON-LD's context dictionary that uses string prefixes, which are not necessarily namespaces. I encountered this when dealing with the differences in forward-slash parsing between SPARQL, Turtle, and JSON-LD to handle IANA Media types and their slashes. (IANA Media Types don't have an IANA-provide ontology to my knowledge, but Dublin Core Terms provides http://purl.org/dc/terms/IMT, and a "namespace for MIME media types vocabulary" from the Publishing Metadata documnt: <http://purl.org/NET/mediatypes/>
.)
So, I wish to argue for precision in the type designation for .namespaces()
, and let the type effects bubble out from there.
If you agree with this way forward, I will put some effort into the type designation with a PR.
In particular, I'm concerned with the interface between the
.namespaces
method andGraph.bind()
..bind
hasn't received type review yet.Store.bind()
typesnamespace
as aURIRef
. As I noted on another thread this morning, I'm not so familiar with theStore
code tree. However, I do make frequent use ofGraph.bind
.I personally expect
.namespaces
to give me objects that feed intoGraph.bind
, and I further expect the objects from.namespaces
to beNamespace
s because I would like to be able to use the[]
or.
operators to reference terms within a namespace.
Store, NamespaceManager and Graph will have more or less complete type signatures with https://github.com/RDFLib/rdflib/pull/2080 - the type signature for Graph.bind()
is:
In here, the namespace has type Any
, but for a very good reason which is more or less related to your concern here.
Currently we have namespaces occurring with the following types:
str
URIRef
Namespace
ClosedNamespace
DefinedNamespace
The class diagram for these types is:
If not for DefinedNamespace
, all classes used to represent namespaces would derive from str, but because DefinedNamespace
does not there is no common superclass, other than object which is the base class for all new style classes.
However, all these classes do have __str__
, and all values of namespace
passed to Graph.bind()
gets converted to URIRef by first calling str()
on them (which calls __str__
) and then by using that to construct a URIRef
:
So really, anything can be passed as the namespace
argument to Graph.bind()
and it will work, maybe there are some corner cases where we will raise an exception for invalid URIs, but I think currently we just warn for invalid URIs.
But this is also why the the following three methods return URIRef
, and not Namespace
:
Graph.namespaces()
: https://github.com/RDFLib/rdflib/blob/97f88d3200bedcc22ec100bffd5d17b1d57fd122/rdflib/graph.py#L1219NamespaceManager.namespaces()
: https://github.com/RDFLib/rdflib/blob/97f88d3200bedcc22ec100bffd5d17b1d57fd122/rdflib/namespace/__init__.py#L707Store.namespaces()
: https://github.com/RDFLib/rdflib/blob/97f88d3200bedcc22ec100bffd5d17b1d57fd122/rdflib/store.py#L449If you pass a Namespace
object to Graph.bind()
, it won't be stored as a Namespace
, but as a URIRef
, and if we change the typing of Graph.namespaces()
to be Namespace
instead of URIRef
, then the outcome will be that code will pass mypy validation but cause exceptions at runtime, there is no way under normal operation that you will get a it to return a Namespace
object. You could maybe get something if you add it to Store
implementations directly, but that will only work for some store implementations, and won't work for example if you use berkelydb:
It would be better if Graph.namespaces()
did return a Namespace
, but this would require changes in more than type hints, it would require that we change
to something like
namespace = Namespace(str(namespace))
(and some additional changes)
We should look at doing this in a future version of RDFLib, however we need quite a big overhaul of our public interfaces in general, as shared in some other places, I think the right approach here is that we should design a new Graph interface addressing most of the problems that we know of in rdflib._experimental
, and then once we are happy with the interface redesign we can proceed to release version 7. The alternative will be that every release becomes a major version release, as there is many things in the interface that needs fixing and doing it all in one PR will make for one very hard to review PR, and it is we will likely have some fluctuations in each version as we try and stick everything together.
We should also really have architectural decision records for interface breakages, and ideally there should be a deprecation notice or warning in at least one version with some option for adopting the new behaviour.
I had a sneaking feeling there would be some reason due to technology breadth.
Thanks to your thorough explanation---and thanks for it!---I agree with this needing to be in a 7.0.0 release.
Is there a development branch where such effects are being pooled, or is there some new module _experimental.py
where these will be pooled in the master
branch?
Is there a development branch where such effects are being pooled, or is there some new module
_experimental.py
where these will be pooled in themaster
branch?
I will try to find time to make an ADR this weekend to define _experimental
and the approach to interface breaking changes in general, once that is in place I will make an ADR for new interfaces. In general we should try and approach the problem similar to collections.abc with fairly minimal base base classes that provide very specific functionality so we can easily mix and match them.
But even with this we should be as conservative as possible with interface breakages, one other option here is to change Namespace and DefinedNamespace to inherit from URIRef, and then we can actually change the code just fine without breaking interfaces.
Ideally there should be issues for all problems with the existing interfaces, and most of them should be labelled as backwards incompatible
and enhancement
, I did not label this with backwards incompatible
though as there may be ways to fix this without breaking backwards compatibility.
Is there a reason
rdflib.namespace.NamespaceManager.namespaces
has the return signatureIterable[Tuple[str, URIRef]]
- specifically,URIRef
instead ofNamespace
?