CLARIAH / grlc

grlc builds Web APIs using shared SPARQL queries
http://grlc.io
MIT License

Internal Server Error #445

Closed tkuhn closed 4 months ago

tkuhn commented 7 months ago

I am getting a "500 Internal Server Error" for this query (get_list_nonqualifed_fsr_new): https://grlc.io/api-git/peta-pico/dsw-nanopub-api#/default/get_list_nonqualifed_fsr_new

It's a relatively complex query with federated parts that use the SERVICE keyword, but the federation doesn't seem to be the problem, as this simpler federated query works (get_list_nonqualifed_fsr_new_x): https://grlc.io/api-git/peta-pico/dsw-nanopub-api#/default/get_list_nonqualifed_fsr_new_x

When I run the query against a grlc instance hosted on our servers, I see this error in the logs:

ERROR:Exception on /api/peta-pico/dsw-nanopub-api/list_nonqualifed_fsr_new [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.8/site-packages/flask_cors/extension.py", line 165, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.8/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.8/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.8/site-packages/grlc/server.py", line 194, in query_git
    return query(user, repo, query_name, subdir=subdir, sha=sha, content=content, git_type=static.TYPE_GITHUB)
  File "/usr/local/lib/python3.8/site-packages/grlc/server.py", line 59, in query
    query_response, status, headers = utils.dispatch_query(user, repo, query_name, subdir, spec_url,
  File "/usr/local/lib/python3.8/site-packages/grlc/utils.py", line 109, in dispatch_query
    resp, status, headers = dispatchSPARQLQuery(query, loader, requestArgs, acceptHeader, content, formData,
  File "/usr/local/lib/python3.8/site-packages/grlc/utils.py", line 220, in dispatchSPARQLQuery
    glogger.debug('Response header from endpoint: ' + response.headers['Content-Type'])
  File "/usr/local/lib/python3.8/site-packages/requests/structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-type'

I don't know how 'content-type' comes in here to cause this problem.

Any ideas what's causing this?
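
A note on where 'content-type' comes from: the last frame of the traceback shows `requests` lowercasing the header name before the lookup (`key.lower()`), so the `KeyError` suggests that the endpoint's response carried no `Content-Type` header at all. Below is a minimal sketch of a debug line that tolerates a missing header. The helper name `describe_content_type` is hypothetical and not part of grlc, and a plain dict stands in for `response.headers`.

```python
from typing import Mapping


def describe_content_type(headers: Mapping[str, str]) -> str:
    """Build the debug message without raising if the header is absent.

    Hypothetical helper, not part of grlc. `headers` can be any mapping
    with a `.get` method, such as `response.headers` from requests.
    """
    content_type = headers.get('Content-Type', '<missing>')
    return 'Response header from endpoint: ' + content_type


# A plain dict stands in for response.headers in this sketch.
print(describe_content_type({'Content-Type': 'application/sparql-results+json'}))
print(describe_content_type({}))
```

This only avoids the crash in the logging call; it does not explain why the endpoint answered without a `Content-Type` header.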

tkuhn commented 7 months ago

ps: I forgot to add that both queries work fine when directly submitted to the SPARQL endpoint.

tkuhn commented 6 months ago

Investigating a bit further, this problem could be related to URL length exceeding its limit.

This query reproduces the problem: https://github.com/knowledgepixels/grlc-test/blob/main/minimal-comments-2.rq

The query itself is minimal, but the whole query file is larger due to long comments.

Trying it on grlc.io with the standard endpoint (http://dbpedia.org/sparql), I get a 414 Request-URI Too Large error: https://grlc.knowledgepixels.com/api-git/knowledgepixels/grlc-test/#/default/post_minimal_comments_2

If I run it against my own endpoint (https://query.np.trustyuri.net/repo/meta), which I have configured to allow for long URLs, I get the error from above: 500 Internal Server Error (but the query works when issued directly to the SPARQL endpoint).

I have set method to POST, but I suppose that's only how the client is accessing the grlc server, not how grlc is accessing the SPARQL endpoint, right? If so, is there a way to configure the SPARQL endpoint requests to be POST too? I suppose that might be the problem.
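
To illustrate why the request method matters here, this is a rough sketch (not grlc's code) of how a SPARQL protocol query request is shaped for GET versus URL-encoded POST. The function name `build_sparql_request`, the `example.org` endpoint, and the padded query are assumptions made up for the example.

```python
from urllib.parse import urlencode


def build_sparql_request(endpoint: str, query: str, method: str = 'GET'):
    """Return (url, body, headers) for a SPARQL protocol query request.

    Illustrative sketch only, not grlc's implementation.
    """
    encoded = urlencode({'query': query})
    if method.upper() == 'GET':
        # The whole query text, comments included, ends up in the URL.
        return endpoint + '?' + encoded, None, {}
    # With a URL-encoded POST, the query travels in the request body.
    headers = {'Content-Type': 'application/x-www-form-urlencoded'}
    return endpoint, encoded.encode('utf-8'), headers


# Hypothetical query file: a tiny query padded with long comments.
query = '# long comment line\n' * 500 + 'SELECT * WHERE { ?s ?p ?o } LIMIT 1'
endpoint = 'https://example.org/sparql'
get_url, _, _ = build_sparql_request(endpoint, query, 'GET')
post_url, post_body, _ = build_sparql_request(endpoint, query, 'POST')
print(len(get_url), len(post_url), len(post_body))
```

With GET, the URL grows with every comment line in the query file, which is where a server-side URL length limit would kick in. With POST, the URL stays the bare endpoint address.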

tkuhn commented 6 months ago

After a bit more investigation, I suspect that adding

client.setMethod(SPARQLWrapper.Wrapper.POST)

around line 42 in sparql.py should solve the problem.

But I haven't figured out yet how to build it and test it on my own. When I build it through Docker it doesn't seem to pick up the updated code...

tkuhn commented 6 months ago

OK, found it! :)

sparql.py seems to be dead code, or at least not the commonly executed part.

This solved it: https://github.com/CLARIAH/grlc/commit/b09824ab3673cb34b2400b78d275f512d0d1a43c

It works on my own Virtuoso and RDF4J SPARQL endpoints. But for some reason, it doesn't work on the default https://dbpedia.org/sparql.

That's why I haven't made a pull request yet. Maybe this should instead be controlled by an argument like endpoint-method, so that it can be set manually to GET or POST.
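
As a sketch only of how such an option could look: the function below reads a hypothetical `endpoint-method` decorator from the `#+` comment lines of a query file and falls back to a default. The decorator name follows the suggestion above. The function name, the default value, and the simple line-based parsing are assumptions for illustration, not grlc's actual implementation.

```python
ALLOWED_METHODS = {'GET', 'POST'}


def resolve_endpoint_method(query_text: str, default: str = 'POST') -> str:
    """Pick the HTTP method for endpoint requests from query decorators.

    Hypothetical sketch: looks for a line such as `#+ endpoint-method: GET`.
    """
    for line in query_text.splitlines():
        line = line.strip()
        if not line.startswith('#+'):
            continue
        # Split "key: value" at the first colon only, so URLs stay intact.
        key, _, value = line[2:].partition(':')
        if key.strip() == 'endpoint-method':
            method = value.strip().upper()
            if method not in ALLOWED_METHODS:
                raise ValueError('Unsupported endpoint-method: ' + value.strip())
            return method
    return default


query_text = '''#+ summary: Example query
#+ endpoint: https://dbpedia.org/sparql
#+ endpoint-method: GET

SELECT * WHERE { ?s ?p ?o } LIMIT 1
'''
print(resolve_endpoint_method(query_text))
```

A per-query setting like this would let queries aimed at endpoints that reject POST opt back into GET, whatever the default ends up being.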

c-martinez commented 5 months ago

Hi @tkuhn, thanks! That looks like it was a hard issue to track down, so thanks for your hard work! I am not sure whether it is standard for endpoints to receive requests via GET, via POST, or whether both are possible. If both are, maybe we should do as you say and make it optional with a parameter (perhaps with POST as the default?)

tkuhn commented 5 months ago

If I understand the SPARQL protocol (https://www.w3.org/TR/sparql11-protocol/) correctly, then endpoints need to support both GET and POST.

POST as default sounds reasonable to me, except possibly for backwards compatibility, if not all endpoints implement the protocol fully (as with DBpedia, it seems)?

c-martinez commented 4 months ago

This is fixed in release v1.3.9