IBM / ibm-cos-sdk-python

Apache License 2.0

IAM-Token refresh fails #59

Closed: NielsKorschinsky closed this issue 9 months ago.

NielsKorschinsky commented 10 months ago

Moved to https://github.com/IBM/ibm-cos-sdk-python-core/issues/21 as the code is located there

Hi team!

I'm having an issue with the IAM token refresh. I am using the DataEngine Python SDK, which uses this SDK internally. Queries usually time out after 60 minutes, but they time out earlier if the IAM token's remaining lifetime is shorter. So if a token has already been in use for some time, the maximum query duration shrinks accordingly. As a result, some queries time out after only 24 or 30 minutes, which is too short for larger workloads.
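The relationship described above can be written as a small calculation. This is a minimal sketch under stated assumptions. It takes the 60-minute query timeout from the description and assumes that a query is cut off as soon as either the query timeout or the token's remaining lifetime runs out. `effective_query_budget` is a hypothetical helper and is not part of any SDK.

```python
def effective_query_budget(token_lifetime_min: float,
                           token_age_min: float,
                           query_timeout_min: float = 60.0) -> float:
    """Return how many minutes a newly started query can run.

    Hypothetical helper: assumes the query fails at whichever limit comes
    first, the query timeout or the end of the IAM token's lifetime.
    """
    # Time left on the token; never negative.
    remaining_token_min = max(token_lifetime_min - token_age_min, 0.0)
    return min(query_timeout_min, remaining_token_min)


# A 60-minute token that has already been in use for 30 minutes
# leaves only 30 minutes for the next query.
print(effective_query_budget(token_lifetime_min=60.0, token_age_min=30.0))  # 30.0
```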

I'm opening the issue here because my logs are flooded with the following error message:

Refreshing temporary credentials failed during the mandatory refresh period.
Traceback (most recent call last):
  File "/opt/app-root/lib64/python3.11/site-packages/ibm_botocore/credentials.py", line 2773, in _protected_refresh
    metadata = self.auth_function()
               ^^^^^^^^^^^^^^^^^^^^
  File "/opt/app-root/lib64/python3.11/site-packages/ibm_botocore/credentials.py", line 2685, in _default_auth_function
    raise CredentialRetrievalError(provider=self._get_token_url(), error_msg=_msg)
ibm_botocore.exceptions.CredentialRetrievalError: Error when retrieving credentials from https://iam.cloud.ibm.com/identity/token: HttpCode(400) - Retrieval of tokens from server failed.
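While the root cause is being investigated, you can at least throttle the repeated warning. This is a minimal sketch that assumes the message is emitted through the standard `logging` module by a logger named `ibm_botocore.credentials`. A module-level `logging.getLogger(__name__)` in `credentials.py` would produce that name, but this is not confirmed here. `RateLimitFilter` is a hypothetical helper, not part of the SDK. It only reduces log noise and does not make the refresh succeed.

```python
import logging
import time


class RateLimitFilter(logging.Filter):
    """Let an identical log message through at most once per `interval` seconds."""

    def __init__(self, interval: float = 600.0) -> None:
        super().__init__()
        self.interval = interval
        self._last_seen: dict[str, float] = {}  # message text -> last emit time

    def filter(self, record: logging.LogRecord) -> bool:
        key = record.getMessage()  # formatted message, without the traceback
        now = time.monotonic()
        last = self._last_seen.get(key)
        if last is not None and now - last < self.interval:
            return False  # same message emitted recently: drop this record
        self._last_seen[key] = now
        return True


# Assumption: the warning comes from the logger named "ibm_botocore.credentials".
logging.getLogger("ibm_botocore.credentials").addFilter(RateLimitFilter(interval=600))
```

A filter attached to a logger only sees records logged directly on that logger. If the assumed logger name is wrong, the filter has no effect and the warnings continue unchanged.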

This error/warning repeats every 10 seconds for a very long time. Some additional context:

I'm not specifying any special IAM router. I just use the default initialization of DataEngine's SQLQuery:

return SQLQuery(
    api_key=self.__de_cos_api_key,
    instance_crn=self.__de_instance_crn,
    max_concurrent_jobs=self.__max_concurrent_jobs,
    max_tries=1,  # must be 1, as we do our own restarts with increasing timers
    iam_max_tries=3,  # increased in case of IAM timeouts
    thread_safe=True,  # enabled; unsure about the effects...
)

These parameters are forwarded one-to-one to the COSClient inside SQLQuery (staging_env is not used):

COSClient.__init__(
    self,
    cloud_apikey=api_key,
    token=token,
    cos_url=target_cos_url,
    client_info=client_info,
    iam_max_tries=iam_max_tries,
    thread_safe=thread_safe,
    staging=staging_env,
)

Based on the error message, I assume the failing token refresh is what prevents me from running longer queries. Could someone please look into why the refresh fails and how it could be fixed or worked around?

Thanks a lot!
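To see why IAM responds with HTTP 400, it can help to request a token directly from the endpoint named in the error message, using the same API key. This is a minimal sketch that uses only the standard library. It assumes IAM's API-key grant (`grant_type=urn:ibm:params:oauth:grant-type:apikey`) and does not reproduce the SDK's internal refresh logic. The helper names `build_apikey_token_request` and `check_token` are hypothetical.

```python
import json
import urllib.error
import urllib.parse
import urllib.request

# Endpoint taken from the CredentialRetrievalError message.
IAM_TOKEN_URL = "https://iam.cloud.ibm.com/identity/token"


def build_apikey_token_request(api_key: str) -> urllib.request.Request:
    """Build a POST request that exchanges an API key for an IAM access token."""
    body = urllib.parse.urlencode({
        "grant_type": "urn:ibm:params:oauth:grant-type:apikey",
        "apikey": api_key,
    }).encode("ascii")
    return urllib.request.Request(
        IAM_TOKEN_URL,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


def check_token(api_key: str) -> None:
    """Request one token and print either its lifetime or IAM's error body."""
    try:
        with urllib.request.urlopen(build_apikey_token_request(api_key), timeout=30) as resp:
            payload = json.load(resp)
            print("HTTP", resp.status, "- expires_in:", payload.get("expires_in"), "s")
    except urllib.error.HTTPError as err:
        # On failure, print the raw response body so IAM's reason is visible.
        print("HTTP", err.code, err.read().decode("utf-8", errors="replace"))
```

For example, calling `check_token(...)` with the key passed to `SQLQuery` shows whether the key itself is rejected. The error body returned by IAM typically names the reason.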

NielsKorschinsky commented 10 months ago

As a side note, this worked fine for the past year. We have only been seeing these warnings for the last month or so.

(Also, since then I've been getting logged out of the cloud terminal after a short time, so perhaps some security changes were made?)

NielsKorschinsky commented 9 months ago

This is caused by a cloud account security setting. Closing.

dragid10 commented 9 months ago

@NielsKorschinsky Which security setting was configured in the cloud account? I'm suddenly facing the same issue and don't understand why.

NielsKorschinsky commented 9 months ago

Hi @dragid10

Please check the number of concurrent sessions allowed in your IAM settings. You can find this setting at the bottom of the IAM > Settings page.

Note that the lowest setting across all of your accounts is the one that applies. Also, only account owners and administrators can see these settings, so you might need to ask your manager to check them.

However, an easy solution is to use service IDs, which, to my knowledge, don't have such restrictions. The timeout also increases from 20 minutes for personal tokens to 60 minutes for service IDs.

For us, service IDs did not work at first because they are not direct w3 users, but in most cases this won't be an issue.
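To check which lifetime a given token actually has (20 or 60 minutes), you can decode the `iat` and `exp` claims of the access token. This is a minimal sketch. It assumes that the IAM access token is a standard JWT carrying `iat` and `exp` in seconds, and that the timeout mentioned above corresponds to this lifetime. It does not verify the signature and is meant only for inspection. `token_lifetime_minutes` is a hypothetical helper.

```python
import base64
import json


def token_lifetime_minutes(access_token: str) -> float:
    """Return exp - iat of a JWT access token in minutes (signature NOT verified)."""
    payload_b64 = access_token.split(".")[1]  # header.payload.signature
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return (claims["exp"] - claims["iat"]) / 60
```

Pass the raw token without a leading `Bearer ` prefix.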