CLARIAH / grlc

grlc builds Web APIs using shared SPARQL queries
http://grlc.io
MIT License

Bad syntax (expected item in list or ')') #61

Closed barrynl closed 4 years ago

barrynl commented 7 years ago

Hi,

I have created a GitHub repository that breaks GRLC because its commit history contains files whose names include invalid characters. Try: http://grlc.io/api/barrynl/uncertainty-sparql

The error in the log can be found at the end of this issue.

I have reduced the repository to a minimal example to debug the error, and it is caused by this commit:

barrynl/uncertainty-sparql@5419d3e

The commit contains a filename with parentheses (give-me-all-uncertainty-values-(and-causes)-per-sentences.nq), and these parentheses are copied unchanged into the temp.prov.ttl file. This breaks the Turtle parser because URIs cannot contain parentheses.

So, does anyone know a workaround (maybe removing this particular commit)? For now, I've created a new repository with the same files but without the commit history, and this indeed works correctly.

I think GRLC could be improved by checking the files in each commit for names that contain characters that cannot be used in URIs.
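A minimal sketch of such a check in Python, assuming the filenames have already been collected from the repository history (for example with `git log --name-only`). The helper names `turtle_unsafe_chars` and `flag_filenames` are hypothetical and not part of grlc or Git2PROV. The character set is taken from the Turtle 1.1 `PN_LOCAL_ESC` production: apart from `_`, `-` and `.`, those characters may only appear in a prefixed name such as `result:file-...` when backslash-escaped, and whitespace cannot appear there at all. Position rules (for example a trailing `.`) are not checked.

```python
# Hypothetical check: characters that Turtle 1.1 only allows in a prefixed
# name's local part when escaped with a backslash (PN_LOCAL_ESC minus
# '_', '-' and '.', which are ordinary name characters in most positions).
TURTLE_ESCAPE_REQUIRED = set("~!$&'()*+,;=/?#@%")


def turtle_unsafe_chars(filename):
    """Return the sorted characters in `filename` that would break an
    unescaped Turtle prefixed name (whitespace included)."""
    unsafe = {c for c in filename if c in TURTLE_ESCAPE_REQUIRED or c.isspace()}
    return sorted(unsafe)


def flag_filenames(filenames):
    """Map each problematic filename to its offending characters."""
    report = {}
    for name in filenames:
        bad = turtle_unsafe_chars(name)
        if bad:
            report[name] = bad
    return report


if __name__ == "__main__":
    names = [
        "give-me-all-uncertainty-values-(and-causes)-per-sentences.nq",
        "README.md",
    ]
    for name, chars in flag_filenames(names).items():
        print(f"{name}: {' '.join(chars)}")
```

Run on the filename from the offending commit, it reports `(` and `)` and leaves `README.md` alone.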

Regards,

Barry

2017-06-16 17:47:49,900 [ERROR] (app.log_exception) Exception on /api/barrynl/uncertainty-sparql/spec [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1817, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1477, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1381, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1475, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python2.7/site-packages/flask/app.py", line 1461, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/home/grlc/grlc/src/server.py", line 188, in swagger_spec
    prov_g = grlcPROV(user, repo)
  File "/home/grlc/grlc/src/prov.py", line 31, in __init__
    self.init_prov_graph()
  File "/home/grlc/grlc/src/prov.py", line 45, in init_prov_graph
    self.prov_g.parse('temp.prov.ttl', format='turtle')
  File "/usr/local/lib/python2.7/site-packages/rdflib/graph.py", line 1037, in parse
    parser.parse(source, self, **args)
  File "/usr/local/lib/python2.7/site-packages/rdflib/plugins/parsers/notation3.py", line 1870, in parse
    p.loadStream(source.getByteStream())
  File "/usr/local/lib/python2.7/site-packages/rdflib/plugins/parsers/notation3.py", line 434, in loadStream
    return self.loadBuf(stream.read())    # Not ideal
  File "/usr/local/lib/python2.7/site-packages/rdflib/plugins/parsers/notation3.py", line 440, in loadBuf
    self.feed(buf)
  File "/usr/local/lib/python2.7/site-packages/rdflib/plugins/parsers/notation3.py", line 466, in feed
    i = self.directiveOrStatement(s, j)
  File "/usr/local/lib/python2.7/site-packages/rdflib/plugins/parsers/notation3.py", line 487, in directiveOrStatement
    j = self.statement(argstr, i)
  File "/usr/local/lib/python2.7/site-packages/rdflib/plugins/parsers/notation3.py", line 725, in statement
    j = self.property_list(argstr, i, r[0])
  File "/usr/local/lib/python2.7/site-packages/rdflib/plugins/parsers/notation3.py", line 1081, in property_list
    j = self.verb(argstr, i, v)
  File "/usr/local/lib/python2.7/site-packages/rdflib/plugins/parsers/notation3.py", line 814, in verb
    j = self.prop(argstr, i, r)
  File "/usr/local/lib/python2.7/site-packages/rdflib/plugins/parsers/notation3.py", line 826, in prop
    return self.item(argstr, i, res)
  File "/usr/local/lib/python2.7/site-packages/rdflib/plugins/parsers/notation3.py", line 829, in item
    return self.path(argstr, i, res)
  File "/usr/local/lib/python2.7/site-packages/rdflib/plugins/parsers/notation3.py", line 837, in path
    j = self.nodeOrLiteral(argstr, i, res)
  File "/usr/local/lib/python2.7/site-packages/rdflib/plugins/parsers/notation3.py", line 1431, in nodeOrLiteral
    j = self.node(argstr, i, res)
  File "/usr/local/lib/python2.7/site-packages/rdflib/plugins/parsers/notation3.py", line 1027, in node
    "expected item in list or ')'")
  File "/usr/local/lib/python2.7/site-packages/rdflib/plugins/parsers/notation3.py", line 1615, in BadSyntax
    raise BadSyntax(self._thisDoc, self.lines, argstr, i, msg)
BadSyntax: at line 188 of <>:
Bad syntax (expected item in list or ')') at ^ in:
"...b4b6a5c1813b8 .
result:file-give-me-all-uncertainty-values-(^and-causes)-per-sentences-nq   a       prov:Entity ;
            rdfs:label      "g..."
rlzijdeman commented 7 years ago

Hi @barrynl ,

In an attempt to resolve this issue, I tried to run your query without grlc in Yasgui, a SPARQL editor. It appears that the endpoint is not available?

http://yasgui.org/short/H1V5NSz7-

Best,

Richard

albertmeronyo commented 7 years ago

Hi @barrynl

Good catch. Those URIs, though, are not generated by grlc but by Git2PROV (https://github.com/IDLabResearch/Git2PROV), which we use to generate PROV from the repo's commit history. On closer inspection, the bug might be in rdflib's parser instead, because it seems legal to have parentheses in URIs. However, this contradicts the Node.js Turtle validator, which also complains about those parentheses.
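One detail of the Turtle 1.1 grammar may reconcile the two observations: a parenthesis is legal inside a full IRI written in angle brackets (`IRIREF`), but not as a bare character in a prefixed name like `result:file-...`. Below is a minimal sketch that serializes the entity as a full IRI instead. The base `http://example.org/result/` is a placeholder, since the namespace Git2PROV actually binds to `result:` does not appear in the log, and `full_iri_term` is a hypothetical helper.

```python
# Characters Turtle 1.1 does not allow raw inside <...> (IRIREF), in
# addition to control characters and space (U+0000 to U+0020).
IRIREF_FORBIDDEN = set('<>"{}|^`\\')


def full_iri_term(base, local):
    """Write base + local as a Turtle IRIREF, e.g. <http://example.org/x>.

    Parentheses are kept unchanged because IRIREF permits them; any
    forbidden character is percent-encoded from its UTF-8 bytes instead.
    """
    out = []
    for ch in base + local:
        if ch in IRIREF_FORBIDDEN or ord(ch) <= 0x20:
            out.append("".join(f"%{b:02X}" for b in ch.encode("utf-8")))
        else:
            out.append(ch)
    return "<" + "".join(out) + ">"


if __name__ == "__main__":
    # Placeholder base IRI (assumption, not Git2PROV's real namespace).
    base = "http://example.org/result/"
    local = "file-give-me-all-uncertainty-values-(and-causes)-per-sentences-nq"
    print(full_iri_term(base, local), "a prov:Entity .")
```

The printed statement keeps the parentheses as-is inside the angle brackets, which is the form a Turtle parser is required to accept.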

For the time being we're just skipping the parsing of Git2PROV's output if it fails.
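A minimal sketch of that fallback, assuming the parser is passed in as a callable so the snippet runs without rdflib installed. `load_prov_or_none` is a hypothetical name, not grlc's actual code in `prov.py`.

```python
import logging

logger = logging.getLogger(__name__)


def load_prov_or_none(parse, path):
    """Parse the Git2PROV Turtle file at `path` with the given callable.

    Any parse error is logged and swallowed, returning None, so the caller
    can still build the API spec without provenance (hypothetical helper).
    """
    try:
        return parse(path)
    except Exception as exc:  # e.g. rdflib's BadSyntax on an unescaped "("
        logger.warning("Skipping PROV from %s: %s", path, exc)
        return None
```

With rdflib, `parse` could wrap `Graph.parse(path, format="turtle")`, as in the traceback above. Catching a narrower exception class than `Exception` would be preferable once the concrete error type for the installed rdflib version is confirmed.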

Thanks! Albert

barrynl commented 7 years ago

Hi, thanks for your replies.

@rlzijdeman Sorry, yes, you are correct. I run GRLC in a docker-compose setup, so the endpoints are not reachable outside that context. This means that my "Try: http://grlc.io/api/barrynl/uncertainty-sparql" link above will also not work once the Git2PROV issue @albertmeronyo describes is resolved.

@albertmeronyo Thanks for the explanation. According to this post, URIs may contain parentheses: https://stackoverflow.com/a/1547940. I did not double-check this myself, though. But maybe IRIs have a different set of valid characters.
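The claim in the linked answer can be checked against RFC 3986 itself: `(` and `)` belong to the `sub-delims` set, so they may appear unencoded in a path segment. Below is a minimal, ASCII-only sketch of that check; `is_rfc3986_segment` is a hypothetical helper, and it ignores the additional non-ASCII characters that IRIs (RFC 3987) permit.

```python
import string

# RFC 3986, section 3.3: pchar = unreserved / pct-encoded / sub-delims / ":" / "@"
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR = UNRESERVED | SUB_DELIMS | {":", "@"}
HEX_DIGITS = set(string.hexdigits)


def is_rfc3986_segment(segment):
    """Return True if `segment` is a valid RFC 3986 path segment (ASCII only)."""
    i = 0
    while i < len(segment):
        ch = segment[i]
        if ch == "%":
            # A percent sign must introduce a pct-encoded triplet such as %28.
            pair = segment[i + 1:i + 3]
            if len(pair) != 2 or not set(pair) <= HEX_DIGITS:
                return False
            i += 3
        elif ch in PCHAR:
            i += 1
        else:
            return False
    return True
```

For example, `is_rfc3986_segment("give-me-all-uncertainty-values-(and-causes)-per-sentences.nq")` returns `True`, while a segment containing a space returns `False`.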

Regards, Barry

barrynl commented 7 years ago

Do you want me to close this issue? You wrote "for the time being", as if you are planning to find a better solution in the future :)

albertmeronyo commented 7 years ago

Hi @barrynl , thanks for your comment.

I opened an issue on the RDFLib tracker when we investigated this: https://github.com/RDFLib/rdflib/issues/752. So far it hasn't attracted attention. I just updated that issue with a link to the part of the Turtle spec that deals with escaping special characters, which might be related.
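The relevant part of the Turtle 1.1 grammar is `PN_LOCAL_ESC`: in a prefixed name, characters such as `(` and `)` are allowed only when preceded by a backslash. Below is a minimal sketch of escaping a local name that way, assuming the input contains no backslashes. `escape_local_name` is hypothetical, and whether a particular rdflib version accepts the escaped form is not something this sketch verifies.

```python
# Turtle 1.1 PN_LOCAL_ESC: characters a prefixed name's local part may
# contain only when preceded by a backslash.
PN_LOCAL_ESC = set("_~.-!$&'()*+,;=/?#@%")
# '_' and '-' are ordinary name characters and '.' is fine in the middle,
# so only the remaining ones are always escaped here.
MUST_ESCAPE = PN_LOCAL_ESC - set("_-.")


def escape_local_name(local):
    """Backslash-escape characters a Turtle prefixed name cannot hold bare
    (hypothetical helper; assumes `local` contains no backslashes)."""
    if any(ch.isspace() for ch in local):
        # Whitespace cannot be escaped; a full <IRI> with %20 is needed.
        raise ValueError("whitespace cannot appear in a prefixed name")
    escaped = "".join("\\" + ch if ch in MUST_ESCAPE else ch for ch in local)
    # A local name may not start with '-' or '.', so escape a leading one.
    if escaped[:1] in ("-", "."):
        escaped = "\\" + escaped
    # A trailing bare '.' would be read as the end of the statement.
    if escaped.endswith(".") and not escaped.endswith("\\."):
        escaped = escaped[:-1] + "\\."
    return escaped


if __name__ == "__main__":
    name = "file-give-me-all-uncertainty-values-(and-causes)-per-sentences-nq"
    print("result:" + escape_local_name(name))
```

For the name from the log, this yields `result:file-give-me-all-uncertainty-values-\(and-causes\)-per-sentences-nq`.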

Shall we wait a bit longer for a response before closing? In the worst-case scenario, is editing the history of the Git repo an option? And how bad is the current behavior for you (i.e. skipping PROV generation for the commit that contains the filenames with parentheses)?

Best, Albert

c-martinez commented 4 years ago

It looks like the rdflib issue was never fixed, and neither was this one.

I will close this issue; please reopen it if it is still relevant.