CLARIAH / grlc

grlc builds Web APIs using shared SPARQL queries
http://grlc.io
MIT License
137 stars 32 forks source link

Server throws HTTP 500 error when there is no accept header in the request #402

Open jaw111 opened 1 year ago

jaw111 commented 1 year ago

To reproduce the error, use this curl command:

curl "http://grlc.io/api-git/CLARIAH/grlc-queries/defaults?genre=http%3A%2F%2Fdbpedia.org%2Fresource%2FRock_music&endpoint=https%3A%2F%2Fdbpedia.org%2Fsparql" -H  "accept:"

The verbose output showing request and response headers:

*   Trying 213.187.244.59:80...
* Connected to grlc.io (213.187.244.59) port 80 (#0)
> GET /api-git/CLARIAH/grlc-queries/defaults?genre=http%3A%2F%2Fdbpedia.org%2Fresource%2FRock_music&endpoint=https%3A%2F%2Fdbpedia.org%2Fsparql HTTP/1.1
> Host: grlc.io
> User-Agent: curl/7.87.0
>
* Mark bundle as not supporting multiuse
< HTTP/1.1 500 INTERNAL SERVER ERROR
< Date: Mon, 19 Jun 2023 12:19:18 GMT
< Content-Type: text/html; charset=utf-8
< Content-Length: 290
< Connection: keep-alive
< Server: gunicorn/19.6.0
< Access-Control-Allow-Origin: *
<
{ [290 bytes data]
* Connection #0 to host grlc.io left intact
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>500 Internal Server Error</title>
<h1>Internal Server Error</h1>
<p>The server encountered an internal error and was unable to complete your request. Either the server is overloaded or there is an error in the application.</p>

Running a local instance of grlc, I see these kind of errors in the application logs:

/usr/local/lib/python3.6/site-packages/grlc/fileLoaders.py:272: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  self.spec = yaml.load(resp.text)
/usr/local/lib/python3.6/site-packages/grlc/gquery.py:277: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  query_metadata = yaml.load(yaml_string)
ERROR:Exception on /api-url/company-detail [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask_cors/extension.py", line 161, in wrapped_function
    return cors_after_request(app.make_response(f(*args, **kwargs)))
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.6/site-packages/grlc/server.py", line 125, in query_param
    return query(user=None, repo=None, query_name=query_name, spec_url=spec_url, content=content)
  File "/usr/local/lib/python3.6/site-packages/grlc/server.py", line 47, in query
    glogger.debug("Request accept header: " + request.headers["Accept"])
  File "/usr/local/lib/python3.6/site-packages/werkzeug/datastructures.py", line 1463, in __getitem__
    return _unicodify_header_value(self.environ["HTTP_" + key])
KeyError: 'HTTP_ACCEPT'

If the accept header is expected/required, it'd be more appropriate to return a 4XX response code.

Otherwise, if no accept header is present, then serve JSON by default.

c-martinez commented 1 year ago

Hi @jaw111, thanks for reporting this.

I think in your example the accept header is present but blank. You are probably right and probably 400 Bad Request would be the most suitable response code?

jaw111 commented 1 year ago

Hi @c-martinez,

If you look at the verbose output, you see there is no accept header in the request.

Setting the header with an empty value will make curl remove the header, not just send a header with a blank value.

See: https://stackoverflow.com/a/7640101/21503085

jaw111 commented 1 year ago

@c-martinez the 406 Not Acceptable response status code would be a more meaningful response than 400 Bad Request.

Though https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/406 suggests:

In practice, this error is very rarely used. Instead of responding using this error code, which would be cryptic for the end user and difficult to fix, servers ignore the relevant header and serve an actual page to the user. It is assumed that even if the user won't be completely happy, they will prefer this to an error code.