IBM / ibm-cos-sdk-python-core

ibm-cos-sdk-python-core
Apache License 2.0
6 stars 14 forks source link

Refreshing temporary credentials failed during advisory refresh period. #15

Closed roryodonnell closed 2 years ago

roryodonnell commented 3 years ago

I'm not an experienced Python developer, so go easy on me ;-) Noticing lots of credential refreshing errors in our logs

Aug 26 13:29:46 staging-serv-upload-v2-5cf6486f58-7lnzc serv-upload-v2 Traceback (most recent call last):
  File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 382, in _make_request
    self._validate_conn(conn)
  File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 1010, in _validate_conn
    conn.connect()
  File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connection.py", line 411, in connect
    self.sock = ssl_wrap_socket(
  File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 449, in ssl_wrap_socket
    ssl_sock = _ssl_wrap_socket_impl(
  File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/util/ssl_.py", line 493, in _ssl_wrap_socket_impl
    return ssl_context.wrap_socket(sock, server_hostname=server_hostname)
  File "/usr/local/lib/python3.8/ssl.py", line 500, in wrap_socket
    return self.sslsocket_class._create(
  File "/usr/local/lib/python3.8/ssl.py", line 1040, in _create
    self.do_handshake()
  File "/usr/local/lib/python3.8/ssl.py", line 1309, in do_handshake
    self._sslobj.do_handshake()
Aug 26 13:29:46 staging-serv-upload-v2-5cf6486f58-7lnzc serv-upload-v2 socket.timeout: _ssl.c:1091: The handshake operation timed out
Aug 26 13:29:46 staging-serv-upload-v2-5cf6486f58-7lnzc serv-upload-v2 During handling of the above exception, another exception occurred:
Aug 26 13:29:46 staging-serv-upload-v2-5cf6486f58-7lnzc serv-upload-v2 Traceback (most recent call last):
  File "/home/fd4b/.local/lib/python3.8/site-packages/requests/adapters.py", line 439, in send
    resp = conn.urlopen(
  File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen
    retries = retries.increment(
  File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen
    httplib_response = self._make_request(
  File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 385, in _make_request
    self._raise_timeout(err=e, url=url, timeout_value=conn.timeout)
  File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout
    raise ReadTimeoutError(
Aug 26 13:29:46 staging-serv-upload-v2-5cf6486f58-7lnzc serv-upload-v2 urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='iam.cloud.ibm.com', port=443): Read timed out. (read timeout=5)
Aug 26 13:29:46 staging-serv-upload-v2-5cf6486f58-7lnzc serv-upload-v2 During handling of the above exception, another exception occurred:
Aug 26 13:29:46 staging-serv-upload-v2-5cf6486f58-7lnzc serv-upload-v2 Traceback (most recent call last):
  File "/home/fd4b/.local/lib/python3.8/site-packages/ibm_botocore/credentials.py", line 2627, in _protected_refresh
    metadata = self.auth_function()
  File "/home/fd4b/.local/lib/python3.8/site-packages/ibm_botocore/credentials.py", line 2529, in _default_auth_function
    response = requests.post(
  File "/home/fd4b/.local/lib/python3.8/site-packages/requests/api.py", line 119, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/home/fd4b/.local/lib/python3.8/site-packages/requests/api.py", line 61, in request
    return session.request(method=method, url=url, **kwargs)
  File "/home/fd4b/.local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request
    resp = self.send(prep, **send_kwargs)
  File "/home/fd4b/.local/lib/python3.8/site-packages/requests/sessions.py", line 655, in send
    r = adapter.send(request, **kwargs)
  File "/home/fd4b/.local/lib/python3.8/site-packages/requests/adapters.py", line 529, in send
    raise ReadTimeout(e, request=request)
Aug 26 13:29:46 staging-serv-upload-v2-5cf6486f58-7lnzc serv-upload-v2 requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='iam.cloud.ibm.com', port=443): Read timed out. (read timeout=5)
roryodonnell commented 3 years ago

Connection timeout too low perhaps?

IBMeric commented 3 years ago

Do you see this behavior only with 2.10, meaning you can switch back and forth between 2.6 and 2.10 and reproduce it only with 2.10? If the problem does not follow the version, is it intermittent, which would be characteristic of a network and service issue?

roryodonnell commented 3 years ago

Haven't dug too far into it Eric - it may not be related to the version bump, we only started noticing it recently. I can see it from over two weeks ago (logDna retention time) - it may have been present before that, and even before 2.6.3 - but cannot be sure

roryodonnell commented 3 years ago

(Updated OP to remove specifics about versioning)

roryodonnell commented 3 years ago

WARNING:ibm_botocore.credentials:Refreshing temporary credentials failed during advisory refresh period. Traceback (most recent call last): File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 445, in _make_request six.raise_from(e, None) File "", line 3, in raise_from File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 440, in _make_request httplib_response = conn.getresponse() File "/usr/local/lib/python3.8/http/client.py", line 1332, in getresponse response.begin() File "/usr/local/lib/python3.8/http/client.py", line 303, in begin version, status, reason = self._read_status() File "/usr/local/lib/python3.8/http/client.py", line 264, in _read_status line = str(self.fp.readline(_MAXLINE + 1), "iso-8859-1") File "/usr/local/lib/python3.8/socket.py", line 669, in readinto return self._sock.recv_into(b) File "/usr/local/lib/python3.8/ssl.py", line 1241, in recv_into return self.read(nbytes, buffer) File "/usr/local/lib/python3.8/ssl.py", line 1099, in read return self._sslobj.read(len, buffer) socket.timeout: The read operation timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/fd4b/.local/lib/python3.8/site-packages/requests/adapters.py", line 439, in send resp = conn.urlopen( File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 755, in urlopen retries = retries.increment( File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/util/retry.py", line 532, in increment raise six.reraise(type(error), error, _stacktrace) File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/packages/six.py", line 770, in reraise raise value File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 699, in urlopen httplib_response = self._make_request( File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 447, in _make_request self._raise_timeout(err=e, url=url, timeout_value=read_timeout) File "/home/fd4b/.local/lib/python3.8/site-packages/urllib3/connectionpool.py", line 336, in _raise_timeout raise ReadTimeoutError( urllib3.exceptions.ReadTimeoutError: HTTPSConnectionPool(host='iam.cloud.ibm.com', port=443): Read timed out. (read timeout=5)

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home/fd4b/.local/lib/python3.8/site-packages/ibm_botocore/credentials.py", line 2627, in _protected_refresh metadata = self.auth_function() File "/home/fd4b/.local/lib/python3.8/site-packages/ibm_botocore/credentials.py", line 2529, in _default_auth_function response = requests.post( File "/home/fd4b/.local/lib/python3.8/site-packages/requests/api.py", line 119, in post return request('post', url, data=data, json=json, kwargs) File "/home/fd4b/.local/lib/python3.8/site-packages/requests/api.py", line 61, in request return session.request(method=method, url=url, kwargs) File "/home/fd4b/.local/lib/python3.8/site-packages/requests/sessions.py", line 542, in request resp = self.send(prep, send_kwargs) File "/home/fd4b/.local/lib/python3.8/site-packages/requests/sessions.py", line 655, in send r = adapter.send(request, kwargs) File "/home/fd4b/.local/lib/python3.8/site-packages/requests/adapters.py", line 529, in send raise ReadTimeout(e, request=request) requests.exceptions.ReadTimeout: HTTPSConnectionPool(host='iam.cloud.ibm.com', port=443): Read timed out. (read timeout=5)

roryodonnell commented 3 years ago

Having local issues creating a PR - may I suggest something like the below as a solution to enable us to override the default of 5 seconds.

credentials.py

...
...
...

class DefaultTokenManager(TokenManager):
    """A default implementation of token manager.
    Retreives token on first use and holds it in memory
    for further use. Background thread tries to refresh token
    prior to its expiration, so that main thread is always
    non-blocking and performant.
    :ivar API_TOKEN_URL: Default URL for IAM authentication.
    :ivar _advisory_refresh_timeout: The time at which we'll attempt to refresh, but not
        block if someone else is refreshing.
    :ivar _mandatory_refresh_timeout: The time at which all threads will block waiting for
        refreshed credentials.
    """
    API_TOKEN_URL = 'https://iam.cloud.ibm.com/identity/token'
    _advisory_refresh_timeout = 0
    _mandatory_refresh_timeout = 0
    REFRESH_OVERRIDE_IN_SECS = 0 # force refresh in this number of secs
    DEFAULT_AUTH_FUNCTION_TIMEOUT = 5

    ...
    ...
    ...
    @staticmethod
    def get_default_auth_function_timeout():
        """
        Return the class variable DEFAULT_AUTH_FUNCTION_TIMEOUT
        This variable represents the timeout value 
        when requesting a new auth token
        """
        return DefaultTokenManager.DEFAULT_AUTH_FUNCTION_TIMEOUT

    @staticmethod
    def set_default_auth_function_timeout(timeout=5):
        """
        Assign the class variable DEFAULT_AUTH_FUNCTION_TIMEOUT
        Overrides the default timeout (5 seconds) when requesting
        a new auth token
        :param timeout: The timeout value in seconds when requesting a new auth token
        :return: Old timeout value.
        """
        old_timeout = DefaultTokenManager.DEFAULT_AUTH_FUNCTION_TIMEOUT
        DefaultTokenManager.DEFAULT_AUTH_FUNCTION_TIMEOUT = timeout
        return old_timeout

    ...
    ...
    ...

    def _default_auth_function(self):
        response = requests.post(
                                 url=self._get_token_url(),
                                 data=self._get_data(),
                                 headers=self._get_headers(),
                                 timeout=DefaultTokenManager.DEFAULT_AUTH_FUNCTION_TIMEOUT,
                                 proxies=self.proxies,
                                 verify=self.get_verify())

        if response.status_code != httplib.OK:
            _msg = 'HttpCode({code}) - Retrieval of tokens from server failed.'.format(code=response.status_code)
            raise CredentialRetrievalError(provider=self._get_token_url(), error_msg=_msg)

        return json.loads(response.content.decode('utf-8'))
IBMeric commented 3 years ago

We are looking at your proposed solution now for possible inclusion. In most cases like this, we recommend users extend DefaultTokenManager to make the custom changes they want and then pass that into the token_manager parameter at client creation. In your case, the class would just inherit from DefaultTokenManager and override _default_auth_function to set whatever timeout you want. Note that while the create_client call does accept an auth_function parameter, which might seem perfect for this case, it does not work as you might expect. Extending DefaultTokenManager is the better approach.

roryodonnell commented 3 years ago

Thanks Eric - I was hoping to avoid extending the class as some future breaking changes would expose me to being forced to address them, when all I want/ed is to provide a timeout value

IBMeric commented 2 years ago

We have implemented an override in 2.11.0, though the changelog mistakenly has omitted listing this. Can you verify and close this ticket?

roryodonnell commented 2 years ago

Thank you Eric - I try to get it in and tested next sprint

IBMeric commented 2 years ago

Closing this ticket as resolved.