Closed nbaksalyar closed 4 years ago
@nicholascar Is this issue still open? I'm thinking to solve this issue by adding assert statements in the the add function.
Adding assert
statements in production code is no good idea. They do not work if optimization is enabled. https://docs.python.org/3/reference/simple_stmts.html
Maybe this is also a cool feature to use the RDF lib to work with generalised RDF (sorry can't find the reference now on my mobile, but see also https://www.w3.org/2001/sw/wiki/Literals_as_Subjects). However, the serializers should only create output valid for the respective format.
There isn't a good way to catch issues like this at runtime without taking a performance hit. If this is implemented in rdflib core, then maybe it could be as an add_safe
function or similar, that could check for these kinds of things. There are even more insidious variants where some python type will sneak in to a literal and only cause issues when trying to sort entities somewhere else in the code base. Here is an example of I deal with this https://github.com/SciCrunch/sparc-curation/blob/f02a0cd1cbc27699a6525966b4d94b1111e14295/sparcur/export/triples.py#L90-L101 (this particular version doesn't catch the fact that literals are not valid subjects, but could be added).
One solution to these kinds of issues is a static type system. I'm not sure if python's type annotations would help here, but the might be a minimally obtrusive way to add this functionality for those who want it.
My feeling is maybe this should be caught at serialisation time, like we do with non rdf/xml serialisable URIs etc.
@prince17080 obviously still open! Sorry, we didn't get to handle this for 5.0.0.
@gromgull not sure this should only be handles "on the way out" at serlialization time. I think this really should be handled "on the way in" at creation, my general feeling being, if the spec (RDF) doesn't allow something, we shouldn't allow it in the Python Graph
object.
We don't tolerate this:
g.add((URIRef('A:'), URIRef('B:'), 25))
AssertionError: Object 25 must be an rdflib term
So we shouldn't tolerate:
g.add((Literal('A'), Literal('B'), Literal('C')))
Unless there's some very good code speed reason etc. that prevents us from not tolerating it.
Adding
assert
statements in production code is no good idea.
Clearly we have a bunch of them in there already...
A generalized RDF triple is a triple having a subject, a predicate, and object, where each can be an IRI, a blank node or a literal. A generalized RDF graph is a set of generalized RDF triples. A generalized RDF dataset comprises a distinguished generalized RDF graph, and zero or more pairs each associating an IRI, a blank node or a literal to a generalized RDF graph. (https://www.w3.org/TR/rdf11-concepts/#dfn-generalized-rdf-dataset)
But there is also a note:
Any users of generalized RDF triples, graphs or datasets need to be aware that these notions are non-standard extensions of RDF and their use may cause interoperability problems. There is no requirement on the part of any RDF tool to accept, process, or produce anything beyond standard RDF triples, graphs, and datasets.
In SPARQL 1.1, Literals are allowed in the subject position.
https://www.w3.org/TR/sparql11-query/#construct https://www.w3.org/TR/sparql11-query/#sparqlTriplePatterns
JSON-LD says:
However, JSON-LD also extends the RDF data model to optionally allow JSON-LD to serialize generalized RDF Datasets. (https://www.w3.org/TR/json-ld/#relationship-to-rdf)
They also put up a note:
The use of blank node identifiers to label properties is obsolete, and may be removed in a future version of JSON-LD, as is the support for generalized RDF Datasets.
As a conclusion we don't need to support it, but we can. As it is currently possible, whhat is a good reason to remove it from the Graph, if we can also ensure it for the serializers.
Adding assert statements in production code is no good idea.
Clearly we have a bunch of them in there already...
Maybe we should see if we can remove them in the future.
@nicholascar @white-gecko @tgbugs So, I guess the conclusion from the above discussion can be adding a new function add_safe
is the best solution with assert statements.
@prince17080 indeed there is the suggestion to introduce add_safe
and a supporter of that so you could perhaps create a PR for such a function.
Not sure I would use it much though - like @tgbugs I check my entries to a Graph()
in my own application code rather than throwing inserts at add
and then having it validate the values - but happy to have add_safe
proposed if others have use cases motivating such a function.
There's check_statement in util. A wrapper would be nothing more than a call to check_statement.
Thanks @oohlaf, really useful tip!
I close this issue now. Maybe someone finds a place for the hint of @oohlaf in the documentation or someone should ask and answer it on https://stackoverflow.com/questions/tagged/rdflib .
I was able to successfully create the following RDF graph:
It is serialised into Turtle as follows:
And according to the RDFLib code it is a valid triple. However, according to the RDF specification it is not:
And indeed when the output Turtle is validated with the W3C validator, it produces this error:
Hence, this bit in the README file is also incorrect:
I think this issue deserves a further discussion, and please let me know if I can fix this and raise a PR.