RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Request for method to detect silent triple drops #2017

Open ajnelson-nist opened 2 years ago

ajnelson-nist commented 2 years ago

Hello,

I just encountered an issue where an rdflib-downstream tool didn't behave as I expected when I tried to introduce a wrong-identifier-form error. The following example is functionally equivalent to the JSON-LD snippet I used:

import rdflib

graph_data = """\
{
    "@context": {
        "ex": "http://example.org/",
        "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
        "rdfs": "http://www.w3.org/2000/01/rdf-schema#"
    },
    "@graph": [
        {
            "@id": "ex:thing-1",
            "@type": "ex:SomeThing",
            "ex:someStringProperty": {
                "@id": "nonlinking identifier string form"
            }
        }
    ]
}
"""

graph = rdflib.Graph()
graph.parse(data=graph_data, format="json-ld")
assert 2 == len(graph)

That example raises an AssertionError (2 != 1). However, I got no notice from the JSON-LD parser that the input was erroneous or that it had silently dropped a triple. (The impact was a false-negative SHACL validation test.)

Is there some mechanism available within rdflib's JSON-LD code that reviews node references and raises errors when a malformed one is found?
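Until such a mechanism exists, one possible stopgap is to lint the JSON-LD text before handing it to the parser. The sketch below is not part of rdflib and does not reproduce the JSON-LD parser's own rules. It uses only the standard library, the helper names `find_suspect_ids` and `check_jsonld_ids` are hypothetical, and the set of "suspect" characters is a heuristic chosen for illustration.

```python
import json

# Heuristic set of characters that should not appear in an IRI reference.
# This set is an assumption of this sketch, not taken from the parser.
SUSPECT_CHARS = set('<>" {}|\\^`')


def find_suspect_ids(node, path="$"):
    """Yield (path, value) for each "@id" whose value contains a suspect character."""
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = f"{path}.{key}"
            if key == "@id" and isinstance(value, str):
                if any(ch in SUSPECT_CHARS or ch.isspace() for ch in value):
                    yield child_path, value
            elif key != "@context":
                # Context definitions are skipped; only node data is inspected.
                yield from find_suspect_ids(value, child_path)
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_suspect_ids(item, f"{path}[{index}]")


def check_jsonld_ids(text):
    """Raise ValueError if any "@id" in the JSON-LD text looks malformed."""
    problems = list(find_suspect_ids(json.loads(text)))
    if problems:
        details = "; ".join(f"{p}: {v!r}" for p, v in problems)
        raise ValueError(f"suspect @id values: {details}")
```

Calling `check_jsonld_ids(graph_data)` before `graph.parse(...)` would raise on the `"nonlinking identifier string form"` value in the example above, because it contains spaces. It would not catch every identifier a parser might reject.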

(EDIT: Fixed GitHub markdown.)

aucampia commented 1 year ago

Actually, I think a triple drop should be an error, so I'm marking this as a bug.

ajnelson-nist commented 1 year ago

Thank you for the consideration. I recall there's a Warning thrown somewhere in RDFLib about malformed URLs, though...hm, unhelpfully, I can't remember how I've triggered that before. If you know what I'm talking about: Is that a bit of code behavior you could borrow and repeat here?

aucampia commented 1 year ago

Possibly you are thinking of this: https://github.com/RDFLib/rdflib/blob/0d07f9bc014562d77121768414ec20fd9382ed0a/rdflib/term.py#L277-L281

But there is also:

https://github.com/RDFLib/rdflib/blob/0d07f9bc014562d77121768414ec20fd9382ed0a/rdflib/term.py#L2085-L2094

Neither case should result in dropped triples. We had some valid complaints about the second one, namely that it is not entirely clear that it is just a warning, but the triple should still end up in the graph. Cases where a triple can't be constructed should result in an exception to the function call that tried to add it.

There may be cases where this does not happen, but if we can identify them, we should fix them so that they raise exceptions instead of silently (from the point of view of the caller) dropping triples.
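For callers who want such warnings to be impossible to miss, one option is to collect them during a call and raise afterwards. The sketch below uses only Python's standard `logging` module. It assumes the warnings are emitted through a logger named `rdflib.term`, which this thread does not confirm, and the names `fail_on_warnings` and `_Collector` are hypothetical.

```python
import logging
from contextlib import contextmanager


class _Collector(logging.Handler):
    """Logging handler that stores every record it receives."""

    def __init__(self, level):
        super().__init__(level=level)
        self.records = []

    def emit(self, record):
        self.records.append(record)


@contextmanager
def fail_on_warnings(logger_name="rdflib.term", level=logging.WARNING):
    """Raise RuntimeError on exit if the named logger emitted records at or above `level`.

    The default logger name is an assumption of this sketch.
    """
    logger = logging.getLogger(logger_name)
    collector = _Collector(level)
    logger.addHandler(collector)
    try:
        yield collector.records
    finally:
        # Always detach the handler, even if the wrapped code raised.
        logger.removeHandler(collector)
    if collector.records:
        messages = "; ".join(r.getMessage() for r in collector.records)
        raise RuntimeError(f"warnings logged by {logger_name}: {messages}")
```

Wrapping a call such as `graph.parse(...)` in `with fail_on_warnings():` would then turn any collected warning into an exception once the block finishes. This only surfaces warnings that are actually logged, so it would not detect a triple that is dropped without any message.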

ajnelson-nist commented 1 year ago

Line 279, that's the error message I was thinking of! Thanks.

> ... cases where a triple can't be constructed should result in an exception to the function call that tried to add it.

I agree.