Closed LachlanStuart closed 3 years ago
Yeah, it's really weird that is hardcoded like that. Timeouts with the token manager are a known thorn, and I believe that it's being worked on. I'm not sure if the best fix is to have a configurable timeout, or to have a number of retried with exponential backoff, like other transient 5xx errors. Thanks for raising the issue - I'll check with the team and someone will post an update here.
I'm also experiencing a issue that seems related to this with ibm-cos-sdk==2.6.3
Jun 3 21:36:24 staging-serv-upload-v2-56c69fc465-wk5s9 serv-upload-v2 WARNING WARNING:ibm_botocore.credentials:Refreshing temporary credentials failed during advisory refresh period.
Jun 3 21:36:24 staging-serv-upload-v2-56c69fc465-wk5s9 serv-upload-v2 Traceback (most recent call last):
File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 426, in _make_request
six.raise_from(e, None)
File "<string>", line 3, in raise_from
File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 421, in _make_request
httplib_response = conn.getresponse()
File "/usr/local/lib/python3.8/http/client.py", line 1332, in getresponse
response.begin()
File "/usr/local/lib/python3.8/http/client.py", line 303, in begin
version, status, reason = self._read_status()
File "/usr/local/lib/python3.8/http/client.py", line 264, in _read_status
line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1")
File "/usr/local/lib/python3.8/socket.py", line 669, in readinto
return self._sock.recv_into(b)
File "/usr/local/lib/python3.8/ssl.py", line 1241, in recv_into
return self.read(nbytes, buffer)
File "/usr/local/lib/python3.8/ssl.py", line 1099, in read
return self._sslobj.read(len, buffer)
Jun 3 21:36:24 staging-serv-upload-v2-56c69fc465-wk5s9 serv-upload-v2 socket.timeout: The read operation timed out
Jun 3 21:36:24 staging-serv-upload-v2-56c69fc465-wk5s9 serv-upload-v2 During handling of the above exception, another exception occurred:
Jun 3 21:36:24 staging-serv-upload-v2-56c69fc465-wk5s9 serv-upload-v2 Traceback (most recent call last):
File "/home/fd4b/.local/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
resp = conn.urlopen(
File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 726, in urlopen
retries = retries.increment(
File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/util/retry.py", line 410, in increment
raise six.reraise(type(error), error, _stacktrace)
File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/packages/six.py", line 735, in reraise
raise value
File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 670, in urlopen
httplib_response = self._make_request(
File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 428, in _make_request
self._raise_timeout(err=e, url=url, timeout_value=read_timeout)
File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 335, in _raise_timeout
raise ReadTimeoutError(
FYI, this issue is related to https://github.com/IBM/ibm-cos-sdk-python-core/issues/15.
We have implemented an override in 2.11.0, though the changelog mistakenly omits this. Can you verify and close this ticket?
Thanks for the response @IBMeric .
I can't really test it on demand as it's a sporadic issue, but I checked the code changes and unless I'm missing something... I mean, technically, yes, the new DefaultTokenManager.set_default_auth_function_timeout
function allows me to set a more reasonable timeout now, but the default is still an unreasonably short 5 seconds, and it looks like it will still spam my development PC with warning log messages from a background thread if I go offline...
I guess this can be closed as it technically resolves it from my side.... it's just... really not great API design when the library is unstable by default and the onus is on the developer to call an undocumented function just to reinstate sane defaults...
I see
ReadTimeoutError
s logged once or twice a week. Usually they seem to be just a warning emitted by the background refresh thread, but it has been raised at least once during a call toDefaultTokenManager.get_token
, which has caused my application to fail.Here's the stack trace of the error during a call to
DefaultTokenManager.get_token
. This code was running on a relatively low-traffic EC2 instance on AWS (Ireland region), so it's extremely unlikely to be caused by network issues on our end.And here is one from the background refresh thread running on my local development computer. This one was likely due to a network lag spike. I was not using any IBM APIs at the time, so getting my Python console overrun by a warning message about a non-vital background process was quite unwelcome.
I'm using version 2.9.0 which was bundled with
ibm-cos-sdk==2.9.0
DefaultTokenManager
seems to deliberately set the timeout to 5 seconds. This seems quite low if only one attempt is made before failure.I haven't seen any cases in the logs where this error has occurred multiple times in quick succession. Based on this, I think this could be resolved by wrapping this function with retry logic, possibly with increasing timeouts on each attempt, and preferably only logging an error if it fails after multiple retries.