dmwm / DBSClient

Apache License 2.0
3 stars 8 forks source link

Improve handling of HTTP response when it is not in JSON form #49

Closed vkuznet closed 3 years ago

vkuznet commented 3 years ago

Currently, DBSClient incorrectly handle JSON decoding errors when the data input does not represent JSON. For example, the following code:

from dbs.apis.dbsClient import *

url="https://cmsweb-testbed.cern.ch/dbs/bla/DBSReader"
api = DbsApi(url=url, debug=1)
res = api.help()
print(res)

will produce the following error:

HTTP=GET URL=https://cmsweb-testbed.cern.ch:8443/dbs/bla/DBSReader method=help params={} data={} headers={'Content-Type': 'application/json', 'Accept': 'application/json', 'UserID': 'vk@vkair', 'User-Agent': 'DBSClient/Unknown/'}
Traceback (most recent call last):
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/./t.py", line 6, in <module>
    res = api.help()
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 496, in help
    return self.__callServer("help", params=kwargs)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 403, in __callServer
    self.__parseForException(http_error)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 429, in __parseForException
    raise http_error
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 401, in __callServer
    self.http_response = method_func(self.url, method, params, data, request_headers)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RestApi.py", line 34, in get
    return http_request(self._curl)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RequestHandling/HTTPRequest.py", line 62, in __call__
    raise HTTPError(effective_url, http_code, http_response.msg, http_response.raw_header, http_response.body)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: Bad Request

Depending on return data it may be difficult to understand what exactly data had since the data is not returned by Python exception.

The proposed PR improve error handling by checking the actual HTTP code and if it is not 200, it can return the actual data including all details of HTTP request. For instance, running the same code we will get the following output:

HTTP=GET URL=https://cmsweb-testbed.cern.ch:8443/dbs/bla/DBSReader method=help params={} data={} headers={'Content-Type': 'application/json', 'Accept': 'application/json', 'UserID': 'vk@vkair', 'User-Agent': 'DBSClient/Unknown/'}
### HTTPError ###
URL: https://cmsweb-testbed.cern.ch:8443/dbs/bla/DBSReader/help
HTTP code: 400
HTTP Message: Bad Request
HTTP Header: HTTP/1.1 400 Bad Request
Date: Thu, 11 Nov 2021 14:21:01 GMT
Server: Apache
Content-Length: 226
Connection: close
Content-Type: text/html; charset=iso-8859-1

HTTP Body
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
</body></html>

Traceback (most recent call last):
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/./t.py", line 6, in <module>
    res = api.help()
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 506, in help
    return self.__callServer("help", params=kwargs)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 413, in __callServer
    raise http_error
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 402, in __callServer
    self.http_response = method_func(self.url, method, params, data, request_headers)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RestApi.py", line 34, in get
    return http_request(self._curl)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RequestHandling/HTTPRequest.py", line 62, in __call__
    raise HTTPError(effective_url, http_code, http_response.msg, http_response.raw_header, http_response.body)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: Bad Request

Now, the output provide all details of HTTP response and preserve the same Python exception.

These modifications will be extremely useful in order to understand the actual error. For instance, if timeout happens on cmsweb frontends currently we have the following error:

Traceback (most recent call last):
  File "/data/users/vk/dbs/DBS/DBSClient/src/python/dbs/apis/dbsClient.py", line 440, in __parseForException
    data = json.loads(data)
  File "/data/users/vk/anaconda3/lib/python3.8/json/__init__.py", line 357, in loads
    return _default_decoder.decode(s)
  File "/data/users/vk/anaconda3/lib/python3.8/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/data/users/vk/anaconda3/lib/python3.8/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/data/users/vk/dbs/DBS/DBSClient/tests/dbsclient_t/validation/DBSValidation_t.py", line 648, in test16
    result2 = self.cmswebtestbed_api.listFileParentsByLumi(block_name=block["block_name"])
  File "/data/users/vk/dbs/DBS/DBSClient/src/python/dbs/apis/dbsClient.py", line 737, in listFileParentsByLumi
    return self.__callServer("fileparentsbylumi", data=kwargs, callmethod='POST', aggFunc=aggFileParentsByLumi)
  File "/data/users/vk/dbs/DBS/DBSClient/src/python/dbs/apis/dbsClient.py", line 404, in __callServer
    self.__parseForException(http_error)
  File "/data/users/vk/dbs/DBS/DBSClient/src/python/dbs/apis/dbsClient.py", line 442, in __parseForException
    raise http_error
  File "/data/users/vk/dbs/DBS/DBSClient/src/python/dbs/apis/dbsClient.py", line 402, in __callServer
    self.http_response = method_func(self.url, method, params, data, request_headers)
  File "/data/users/vk/dbs/DBS/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RestApi.py", line 40, in post
    return http_request(self._curl)
  File "/data/users/vk/dbs/DBS/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RequestHandling/HTTPRequest.py", line 62, in __call__
    raise HTTPError(effective_url, http_code, http_response.msg, http_response.raw_header, http_response.body)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 502: Proxy Error

This error contains two exceptions, one from HTTPError and another from JSON decoder parser. But it does not provide any detail of HTTP response. For average user this error is very cryptic and hard to understand.

vkuznet commented 3 years ago

@yuyiguo , @amaltaro , @belforte please review and provide your feedback.

yuyiguo commented 3 years ago

@vkuznet Can you see my comments in the code? I hope they showed up this time.

klannon commented 3 years ago

I can't see any comments myself. I don't know if @vkuznet can. @yuyiguo, I just added you as a requested reviewer. This should give you a green button at the top of the page to "Add your review" which will let you put comments on the code that then @vkuznet can address. At least, this is how I've always handled reviews. @vkuznet if you'd like other reviewers, maybe it would be good to use the reviewers interface on GH to request reviews from them as well?

yuyiguo commented 3 years ago

Thanks @klannon , I used "add your review" button and submitted my comments. I hope everyone can see it this time. :-).

vkuznet commented 3 years ago

@yuyiguo I don't see your comments. Here is how I review the code:

vkuznet commented 3 years ago

I added Alan and Stefano as reviewers too. So far I don't see Yuyi's comments though.

yuyiguo commented 3 years ago

@vkuznet Can you see now?

vkuznet commented 3 years ago

@yuyiguo , yes now I can see it and will address it.

yuyiguo commented 3 years ago

A friendly reminder: please squash the commits into one.

belforte commented 3 years ago

Can you put the meaningful information in the exception message ? That way I will find them in the current logs. I am not sure that print to stdout would be logged by current code, while exceptions will be logged with traceback pointing to where they happened. Other than that I am all for this and do not dare to comment on coding, of all people here I am the one with worst python skills !

vkuznet commented 3 years ago

@belforte , good point, instead of adding printous which may or may not be visible (depends on app use the logger) I can easily add required info to the exception itself. I'll provide an update.

vkuznet commented 3 years ago

Based on your feedback I adjust code to be more robust in parsing HTTP headers and through exception with all details. Now the output of exception looks like:

HTTP=GET URL=https://cmsweb-testbed.cern.ch:8443/dbs/bla/DBSReader method=help params={} data={} headers={'Content-Type': 'application/json', 'Accept': 'application/json', 'UserID': 'vk@vkair', 'User-Agent': 'DBSClient/Unknown/'}
Traceback (most recent call last):
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 412, in __callServer
    self.http_response = method_func(self.url, method, params, data, request_headers)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RestApi.py", line 34, in get
    return http_request(self._curl)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/dbs3-pycurl-3.17.4/src/python/RestClient/RequestHandling/HTTPRequest.py", line 62, in __call__
    raise HTTPError(effective_url, http_code, http_response.msg, http_response.raw_header, http_response.body)
RestClient.ErrorHandling.RestClientExceptions.HTTPError: HTTP Error 400: Bad Request

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/./t.py", line 6, in <module>
    res = api.help()
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 516, in help
    return self.__callServer("help", params=kwargs)
  File "/Users/vk/CMS/DMWM/GIT/DBSClient/src/python/dbs/apis/dbsClient.py", line 423, in __callServer
    raise Exception(msg)
Exception:
URL=https://cmsweb-testbed.cern.ch:8443/dbs/bla/DBSReader/help
Code=400
Message=Bad Request
Header=HTTP/1.1 400 Bad Request
Date: Thu, 11 Nov 2021 17:54:57 GMT
Server: Apache
Content-Length: 226
Connection: close
Content-Type: text/html; charset=iso-8859-1

Body=<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>400 Bad Request</title>
</head><body>
<h1>Bad Request</h1>
<p>Your browser sent a request that this server could not understand.<br />
</p>
</body></html>

So far I decided to not through HTTPError since this class is defined in dbs3-pycurl library and it still hides all relevant details about HTTP error, like it does not provide actual body of HTTP response. If you want a more clean design, then the HTTP error code in dbs3-pycurl should be modified to properly give all details of HTTP response (like body, headers, etc). Please let me know if this would better option and I can re-arrange code to dbs3-pycurl.

vkuznet commented 3 years ago

@yuyiguo I converted it to working in progress draft as the consensus on how exception should be thrown should be clearly defined. I will squash changes once final version will be confirmed by you and others.

vkuznet commented 3 years ago

ok, now it is squashed.

vkuznet commented 3 years ago

@yuyiguo I want to get Alan's confirmation too.

@amaltaro please review and give your ok

yuyiguo commented 3 years ago

@yuyiguo I want to get Alan's confirmation too.

Sure, @vkuznet

vkuznet commented 3 years ago

@amaltaro please review it before you're going to vacation such that I can merge and proceed with it.

vkuznet commented 3 years ago

@amaltaro , the headers in this case comes from pycurl library which provide them as they come on a wire (i.e. HTTP response). The headers never comes as dict, list types as those are language specific data-types.