CLARIAH / grlc

grlc builds Web APIs using shared SPARQL queries
http://grlc.io
MIT License
137 stars 32 forks source link

input placeholders not picked up when bind statement is present #352

Closed tkuhn closed 3 years ago

tkuhn commented 3 years ago

I am running into this problem that I want to add a bind statement like this one: https://github.com/peta-pico/fpsi-queries/blob/main/get-classdef-reviews.rq#L85

This seemed to work fine, until I realized that the input placeholders like ?__author_iri (here) are no longer picked up. After commenting out the bind-statement above, they work again.

I tried this with the grlc.io service as well as a local instance with the latest Docker version, with the same result.

This seems to be a bug, or am I doing something wrong?

tkuhn commented 3 years ago

It seems this is triggered by strings in the bind statement containing escaped Unicode sequences of the form "\u0001". I rewrote the regex to avoid such sequences, and the problem seems to be gone.

So, the bug seems only to be related to the handling of such "\u...." patterns.

c-martinez commented 3 years ago

Hi @tkuhn! so if I understand correctly, the bug is when handling queries which contain unicode sequences? Do you think you could help us build a minimal example query which triggers the bug so we can go bug-hunting?

tkuhn commented 3 years ago

Sure! Minimal example is here: http://grlc.io/api-git/tkuhn/grlc-unicode-test

The fails query contains the unicode escape sequence \u0001 in a string, which leads to the placeholder ?__class_iri not being picked up as such.

The succeeds query is identical except for the unicode escape sequence. Here the placeholder is picked up without problems.

Otherwise, both queries work just fine.

c-martinez commented 3 years ago

Hi @tkuhn. Bad news: after some investigation, it looks like the issue comes from how rdflib is parsing your query. An exception is raised when running Query.parseString:

from rdflib.plugins.sparql.parser import Query

rq = '#+ endpoint: https://dbpedia.org/sparql\n\nSELECT ?b ?c WHERE {\n  ?b a ?__class_iri .\n  bind("\\u0001" as ?c)\n} LIMIT 1\n'

Query.parseString(rq, parseAll=True)

exception raised is:

ParseException: Expected {SelectQuery | ConstructQuery | DescribeQuery | AskQuery} (at char 86), (line:5, col:3)

I think the best we can do is make sure the exception gets propagated properly and that the user know the query could not be parsed.

tkuhn commented 3 years ago

This seems strange, as the query does work. Only the placeholders aren't picked up. When I run the example above, everything works, except that I am not given the possibility to fill in the placeholder. So it seems SPARQL parsing doesn't fail, at least not completeley. Or am I missing something?

c-martinez commented 3 years ago

Yes, it is a bit complicated. The query gets parsed to extract variables. This is the parse that is failing, which is why the ?__class_iri placeholder does not become a variable. But the parsing exception is not being properly propagated, so the Swagger UI still has a fails method.

PR #353 makes the Swagger UI display a warning to the user, so that the user knows there is an issue.

tkuhn commented 3 years ago

I see. I didn't realize the parsing is only for the placeholder extraction. Showing a warning will certainly help. I found a workaround for my particular query that doesn't involve any "\u...." sequences.