Closed tkuhn closed 3 years ago
It seems this is triggered by strings in the bind statement containing escaped Unicode sequences of the form "\u0001". I rewrote the regex to avoid such sequences, and the problem seems to be gone.
So, the bug seems only to be related to the handling of such "\u...." patterns.
Hi @tkuhn! so if I understand correctly, the bug is when handling queries which contain unicode sequences? Do you think you could help us build a minimal example query which triggers the bug so we can go bug-hunting?
Sure! Minimal example is here: http://grlc.io/api-git/tkuhn/grlc-unicode-test
The fails
query contains the unicode escape sequence \u0001
in a string, which leads to the placeholder ?__class_iri
not being picked up as such.
The succeeds
query is identical except for the unicode escape sequence. Here the placeholder is picked up without problems.
Otherwise, both queries work just fine.
Hi @tkuhn. Bad news: after some investigation, it looks like the issue comes from how rdflib is parsing your query. An exception is raised when running Query.parseString
:
from rdflib.plugins.sparql.parser import Query
rq = '#+ endpoint: https://dbpedia.org/sparql\n\nSELECT ?b ?c WHERE {\n ?b a ?__class_iri .\n bind("\\u0001" as ?c)\n} LIMIT 1\n'
Query.parseString(rq, parseAll=True)
exception raised is:
ParseException: Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery} (at char 86), (line:5, col:3)
I think the best we can do is make sure the exception gets propagated properly and that the user know the query could not be parsed.
This seems strange, as the query does work. Only the placeholders aren't picked up. When I run the example above, everything works, except that I am not given the possibility to fill in the placeholder. So it seems SPARQL parsing doesn't fail, at least not completeley. Or am I missing something?
Yes, it is a bit complicated. The query gets parsed to extract variables. This is the parse that is failing, which is why the ?__class_iri
placeholder does not become a variable. But the parsing exception is not being properly propagated, so the Swagger UI still has a fails
method.
PR #353 makes the Swagger UI display a warning to the user, so that the user knows there is an issue.
I see. I didn't realize the parsing is only for the placeholder extraction. Showing a warning will certainly help. I found a workaround for my particular query that doesn't involve any "\u...." sequences.
I am running into this problem that I want to add a bind statement like this one: https://github.com/peta-pico/fpsi-queries/blob/main/get-classdef-reviews.rq#L85
This seemed to work fine, until I realized that the input placeholders like
?__author_iri
(here) are no longer picked up. After commenting out the bind-statement above, they work again.I tried this with the grlc.io service as well as a local instance with the latest Docker version, with the same result.
This seems to be a bug, or am I doing something wrong?