RDFLib / rdflib

RDFLib is a Python library for working with RDF, a simple yet powerful language for representing information.
https://rdflib.readthedocs.org
BSD 3-Clause "New" or "Revised" License

Invalid URI crashes without BadSyntax error #821

Closed jameshowison closed 2 years ago

jameshowison commented 6 years ago

I am cursed with URIs that contain % escape codes. If one comes through with an invalid escape code (e.g., %F rather than %2F), rdflib crashes with the message AttributeError: 'SinkParser' object has no attribute 'line'.

I'm not able to install the GitHub code to check whether this is still a problem (I'm using rdflib 4.2.2 via python3), but a scan of notation3.py suggests that this is a condition that ought to generate a BadSyntax error.

E.g., doi:10.1257%2Fjep.27.1.223 accidentally written as doi:10.1257%2jep.27.1.223
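Until the parser reports this cleanly, one possible workaround is to scan identifiers for malformed percent-escapes before handing them to rdflib. The following is only a minimal sketch under that assumption: the helper name `find_bad_percent_escapes` is hypothetical and not part of rdflib, and it checks nothing beyond the "% followed by two hex digits" rule.

```python
import re
from typing import List

# A '%' that is NOT followed by two hex digits is a malformed escape.
_BAD_ESCAPE = re.compile(r"%(?![0-9A-Fa-f]{2})")


def find_bad_percent_escapes(uri: str) -> List[int]:
    """Return the offset of every malformed percent-escape in `uri`.

    Hypothetical helper for pre-validation; an empty list means every
    '%' in the string starts a well-formed two-digit hex escape.
    """
    return [match.start() for match in _BAD_ESCAPE.finditer(uri)]
```

Calling it on the two identifiers above returns an empty list for the correct one and the offset of the offending `%` for the mistyped one, which can then be logged or repaired before parsing.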

ghost commented 2 years ago

Good catch. Looks like a typo. SinkParser has a self.lines attribute, but the call to BadSyntax here references self.line, which causes the AttributeError.
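The failure mode can be reproduced in isolation. The sketch below uses made-up stand-ins (`ToyParser`, `ParseError`, `error_name`), not rdflib's actual classes. It only illustrates why the intended syntax error never surfaces: the arguments to the exception are evaluated first, so the misspelled attribute raises AttributeError before the exception is constructed.

```python
class ParseError(Exception):
    """Stand-in for a parser's syntax error (hypothetical, not rdflib's BadSyntax)."""


class ToyParser:
    def __init__(self):
        self.lines = 0  # the attribute that actually exists

    def fail_with_typo(self):
        # Evaluating `self.line` raises AttributeError before
        # ParseError is ever constructed, masking the real error.
        raise ParseError(f"line {self.line}: illegal hex escape")

    def fail_fixed(self):
        # With the correct attribute name, the intended error is raised.
        raise ParseError(f"line {self.lines}: illegal hex escape")


def error_name(method):
    """Call `method` and return the class name of the exception it raises."""
    try:
        method()
    except Exception as exc:
        return type(exc).__name__
    return None
```

Here `error_name(ToyParser().fail_with_typo)` reports the AttributeError, while `error_name(ToyParser().fail_fixed)` reports the intended ParseError.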

A one-line fix does the trick:

index ea26ca9d..f3fb905d 100755
--- a/rdflib/plugins/parsers/notation3.py
+++ b/rdflib/plugins/parsers/notation3.py
@@ -1374,7 +1374,7 @@ class SinkParser:
                         ):
                             raise BadSyntax(
                                 self._thisDoc,
-                                self.line,
+                                self.lines,
                                 argstr,
                                 i,
                                 "illegal hex escape " + c,

With that change, the parser properly raises a BadSyntax exception:

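from rdflib import Graph, URIRef

def test_foo():
    # "%2j" is an invalid percent-escape ("%2F" was intended).
    bad_uri = URIRef("doi:10.1257%2jep.27.1.223")
    data = f"""@prefix doi: <https://doi.org/> .\n{bad_uri} <urn:likes> <urn:cheese> ."""
    try:
        Graph().parse(data=data, format="ttl")
    except Exception as e:
        assert "BadSyntax" in repr(e)
    else:
        # Without this branch the test passes silently when nothing is raised.
        raise AssertionError("expected a BadSyntax error")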

Addressed in https://github.com/RDFLib/rdflib/pull/1529