Open NeilGraham opened 2 years ago
Indeed, it has been a long-standing issue, however, there is ongoing work which (happily) addresses all the issues you raise.
Although the PR is marked “closed”, that doesn't accurately reflect the reality. I had to delete and recreate my GH fork which also closed the PR and, as rebasing is preferred over merging, I've left the situation as is.
The changes are quite profoundly API-breaking from a usage perspective but the underlying semantics remain the same: Dataset.contexts()
is changed to return Graph identifiers and Dataset.graphs()
is introduced to return Graph objects. ConjunctiveGraph disappears, being completely replaced by Dataset - the core ConjunctiveGraph behaviour being preserved with Dataset(default_union=True).
For this reason, the Dataset re-work has to be part of a major version release (7.0, currently scheduled for October). When the core library is seeing fewer large-scale changes, I'll re-release the work as a draft PR.
I do think at least some of these things can be fixed without breaking interfaces, and as discussed before, I think we should try and publish an ADR before we commit to new interface designs.
My current priorities are mainly related to improving RDFLib quality to get it to a level where I feel comfortable with using it in a production setting, and this includes: Adding type hints, fixing bugs, expanding test suite compliance. There are many issues, and it is always easier to some extent to start with a clean slate, but I feel we do have some obligation to users to manage our API with judiciously, because if we don't then we will essentially be abandoning semantic versioning. These interfaces came into RDFLib, and now we own them, and we have to make the best of it, and figure out a way to make things better while not shifting the burden to RDFLib users. I think what this really should indicate is how important it is to that our public API should be designed and evolved with great care.
As I indicated elsewhere, what I would like to see happen for interface re-architecthing:
rdflib._experimental
) so people can test it out before we commit to it.I failed to find a position where ConjunctiveGraph and Dataset could co-exist without significant architectural disruption - but that's not really saying anything. The LSP violations are profound: ConjunctiveGraph subclasses Graph and Dataset subclasses ConjunctiveGraph - however, it was never intended to be treated as an "is_a" inheritance relationship but as a mixin.
Just to recap ... as regards the now-closed PR #1814, it was a follow-on from my initial exploratory rebase (#1646) of gromgull's "identifier-as-context" PR #958. The rebase revealed that more work was required in order to achieve a clean implementation, which I explored and managed to get (what I considered to be) the basis of a satisfactorily cleaner approach. However, that work has been largely obviated by the subsequent identification of significant Liskov Substitution Principle violations in the Graph/ConjunctiveGraph/Dataset/Store architecture which, if allowed to persist, would significantly compromise the advantages of static type-checking which is clearly already adding to the robustness and reliability of the library. However, these violations are also fundamental features of the published API and are not trivially addressable.
At the moment there are a few key issues with the current
rdflib.Dataset
, particularly in its handling of parsing Default Graph triples. I am very interested in working on a Pull Request to solve these issues since it will improve Named Graph support for all projects making use ofrdflib
.Key issues with
rdflib.Dataset
:rdflib.Dataset
toapplication/n-quads
orapplication/trig
will always place default graph triples under a Blank Node named graph.rdflib.Dataset
.Dataset().parse(data=...)
istext/turtle
when it really should beapplication/trig
.By adding a new term
rdflib.term.DefaultGraph
, we can handle the serialization of these triples independent from Named Graph triples. Additionally, we should be able to access the default graph by specifying the following: