boto / botocore

The low-level, core functionality of boto3 and the AWS CLI.
Apache License 2.0
1.48k stars 1.08k forks source link

sts.assumerolewithwebidentity taking a long time #2139

Open itssimon opened 4 years ago

itssimon commented 4 years ago

Describe the bug The sts.assumerolewithwebidentity command takes almost 1 second. This affects us in a situation where we use boto3 to upload a file to S3 and it occasionally takes 6x as long as usual.

We're running this on AWS EKS with IAM Roles for Service Accounts (IRSA).

Expected behavior Authentication happens much quicker

image
swetashre commented 4 years ago

@itssimon - Thank you for your post. Can you please provide me debug logs ? You can enable log by adding boto3.set_stream_logger('') to your code.

itssimon commented 4 years ago

See logs below.

Looks like the POST request to https://sts.amazonaws.com:443 is taking the majority of the time.

2020-09-15 00:24:09,881 botocore.hooks [DEBUG] Changing event name from creating-client-class.iot-data to creating-client-class.iot-data-plane
2020-09-15 00:24:09,882 botocore.hooks [DEBUG] Changing event name from before-call.apigateway to before-call.api-gateway
2020-09-15 00:24:09,883 botocore.hooks [DEBUG] Changing event name from request-created.machinelearning.Predict to request-created.machine-learning.Predict
2020-09-15 00:24:09,884 botocore.hooks [DEBUG] Changing event name from before-parameter-build.autoscaling.CreateLaunchConfiguration to before-parameter-build.auto-scaling.CreateLaunchConfiguration
2020-09-15 00:24:09,884 botocore.hooks [DEBUG] Changing event name from before-parameter-build.route53 to before-parameter-build.route-53
2020-09-15 00:24:09,885 botocore.hooks [DEBUG] Changing event name from request-created.cloudsearchdomain.Search to request-created.cloudsearch-domain.Search
2020-09-15 00:24:09,885 botocore.hooks [DEBUG] Changing event name from docs.*.autoscaling.CreateLaunchConfiguration.complete-section to docs.*.auto-scaling.CreateLaunchConfiguration.complete-section
2020-09-15 00:24:09,888 botocore.hooks [DEBUG] Changing event name from before-parameter-build.logs.CreateExportTask to before-parameter-build.cloudwatch-logs.CreateExportTask
2020-09-15 00:24:09,888 botocore.hooks [DEBUG] Changing event name from docs.*.logs.CreateExportTask.complete-section to docs.*.cloudwatch-logs.CreateExportTask.complete-section
2020-09-15 00:24:09,888 botocore.hooks [DEBUG] Changing event name from before-parameter-build.cloudsearchdomain.Search to before-parameter-build.cloudsearch-domain.Search
2020-09-15 00:24:09,888 botocore.hooks [DEBUG] Changing event name from docs.*.cloudsearchdomain.Search.complete-section to docs.*.cloudsearch-domain.Search.complete-section
2020-09-15 00:24:09,891 botocore.credentials [DEBUG] Looking for credentials via: env
2020-09-15 00:24:09,891 botocore.credentials [DEBUG] Looking for credentials via: assume-role
2020-09-15 00:24:09,891 botocore.credentials [DEBUG] Looking for credentials via: assume-role-with-web-identity
2020-09-15 00:24:09,894 botocore.loaders [DEBUG] Loading JSON file: /opt/conda/lib/python3.8/site-packages/botocore/data/endpoints.json
2020-09-15 00:24:09,899 botocore.hooks [DEBUG] Event choose-service-name: calling handler <function handle_service_name_alias at 0x7f83d576e040>
2020-09-15 00:24:10,048 botocore.loaders [DEBUG] Loading JSON file: /opt/conda/lib/python3.8/site-packages/botocore/data/s3/2006-03-01/service-2.json
2020-09-15 00:24:10,058 botocore.hooks [DEBUG] Event creating-client-class.s3: calling handler <function add_generate_presigned_post at 0x7f83d579db80>
2020-09-15 00:24:10,058 botocore.hooks [DEBUG] Event creating-client-class.s3: calling handler <function lazy_call.<locals>._handler at 0x7f83d57020d0>
2020-09-15 00:24:10,089 botocore.hooks [DEBUG] Event creating-client-class.s3: calling handler <function add_generate_presigned_url at 0x7f83d579d940>
2020-09-15 00:24:10,092 botocore.endpoint [DEBUG] Setting s3 timeout as (60, 60)
2020-09-15 00:24:10,093 botocore.loaders [DEBUG] Loading JSON file: /opt/conda/lib/python3.8/site-packages/botocore/data/_retry.json
2020-09-15 00:24:10,093 botocore.client [DEBUG] Registering retry handlers for service: s3
2020-09-15 00:24:39,480 botocore.hooks [DEBUG] Event before-parameter-build.s3.ListObjectsV2: calling handler <function set_list_objects_encoding_type_url at 0x7f83d571c8b0>
2020-09-15 00:24:39,481 botocore.hooks [DEBUG] Event before-parameter-build.s3.ListObjectsV2: calling handler <function validate_bucket_name at 0x7f83d57995e0>
2020-09-15 00:24:39,481 botocore.hooks [DEBUG] Event before-parameter-build.s3.ListObjectsV2: calling handler <bound method S3RegionRedirector.redirect_from_cache of <botocore.utils.S3RegionRedirector object at 0x7f83d515a5e0>>
2020-09-15 00:24:39,481 botocore.hooks [DEBUG] Event before-parameter-build.s3.ListObjectsV2: calling handler <bound method S3ArnParamHandler.handle_arn of <botocore.utils.S3ArnParamHandler object at 0x7f83d515a6a0>>
2020-09-15 00:24:39,481 botocore.hooks [DEBUG] Event before-parameter-build.s3.ListObjectsV2: calling handler <function generate_idempotent_uuid at 0x7f83d5799430>
2020-09-15 00:24:39,481 botocore.hooks [DEBUG] Event before-call.s3.ListObjectsV2: calling handler <function add_expect_header at 0x7f83d5799940>
2020-09-15 00:24:39,482 botocore.hooks [DEBUG] Event before-call.s3.ListObjectsV2: calling handler <bound method S3RegionRedirector.set_request_url of <botocore.utils.S3RegionRedirector object at 0x7f83d515a5e0>>
2020-09-15 00:24:39,482 botocore.hooks [DEBUG] Event before-call.s3.ListObjectsV2: calling handler <function inject_api_version_header_if_needed at 0x7f83d571cca0>
2020-09-15 00:24:39,482 botocore.endpoint [DEBUG] Making request for OperationModel(name=ListObjectsV2) with params: {'url_path': '/xxx?list-type=2', 'query_string': {'prefix': '.', 'encoding-type': 'url'}, 'method': 'GET', 'headers': {'User-Agent': 'Boto3/1.14.52 Python/3.8.5 Linux/4.14.186-146.268.amzn2.x86_64 Botocore/1.17.52'}, 'body': b'', 'url': 'https://s3.amazonaws.com/xxx?list-type=2&prefix=.&encoding-type=url', 'context': {'client_region': 'us-east-1', 'client_config': <botocore.config.Config object at 0x7f83d5368820>, 'has_streaming_input': False, 'auth_type': None, 'encoding_type_auto_set': True, 'signing': {'bucket': 'xxx'}}}
2020-09-15 00:24:39,482 botocore.hooks [DEBUG] Event request-created.s3.ListObjectsV2: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7f83d5368490>>
2020-09-15 00:24:39,482 botocore.hooks [DEBUG] Event choose-signer.s3.ListObjectsV2: calling handler <bound method ClientCreator._default_s3_presign_to_sigv2 of <botocore.client.ClientCreator object at 0x7f83d56ff5e0>>
2020-09-15 00:24:39,482 botocore.hooks [DEBUG] Event choose-signer.s3.ListObjectsV2: calling handler <function set_operation_specific_signer at 0x7f83d5799310>
2020-09-15 00:24:39,482 botocore.hooks [DEBUG] Event before-sign.s3.ListObjectsV2: calling handler <bound method S3EndpointSetter.set_endpoint of <botocore.utils.S3EndpointSetter object at 0x7f83d515a730>>
2020-09-15 00:24:39,482 botocore.utils [DEBUG] Defaulting to S3 virtual host style addressing with path style addressing fallback.
2020-09-15 00:24:39,483 botocore.utils [DEBUG] Checking for DNS compatible bucket for: https://s3.amazonaws.com/xxx?list-type=2&prefix=.&encoding-type=url
2020-09-15 00:24:39,483 botocore.utils [DEBUG] Not changing URI, bucket is not DNS compatible: xxx
2020-09-15 00:24:39,483 botocore.hooks [DEBUG] Event choose-service-name: calling handler <function handle_service_name_alias at 0x7f83d576e040>
2020-09-15 00:24:39,484 botocore.loaders [DEBUG] Loading JSON file: /opt/conda/lib/python3.8/site-packages/botocore/data/sts/2011-06-15/service-2.json
2020-09-15 00:24:39,486 botocore.hooks [DEBUG] Event creating-client-class.sts: calling handler <function add_generate_presigned_url at 0x7f83d579d940>
2020-09-15 00:24:39,487 botocore.endpoint [DEBUG] Setting sts timeout as (60, 60)
2020-09-15 00:24:39,488 botocore.client [DEBUG] Registering retry handlers for service: sts
2020-09-15 00:24:39,488 botocore.hooks [DEBUG] Event before-parameter-build.sts.AssumeRoleWithWebIdentity: calling handler <function generate_idempotent_uuid at 0x7f83d5799430>
2020-09-15 00:24:39,488 botocore.hooks [DEBUG] Event before-call.sts.AssumeRoleWithWebIdentity: calling handler <function inject_api_version_header_if_needed at 0x7f83d571cca0>
2020-09-15 00:24:39,488 botocore.endpoint [DEBUG] Making request for OperationModel(name=AssumeRoleWithWebIdentity) with params: {'url_path': '/', 'query_string': '', 'method': 'POST', 'headers': {'Content-Type': 'application/x-www-form-urlencoded; charset=utf-8', 'User-Agent': 'Boto3/1.14.52 Python/3.8.5 Linux/4.14.186-146.268.amzn2.x86_64 Botocore/1.17.52'}, 'body': {'Action': 'AssumeRoleWithWebIdentity', 'Version': '2011-06-15', 'RoleArn': 'arn:aws:iam::xxx:role/xxx', 'RoleSessionName': 'botocore-session-1600129449', 'WebIdentityToken': 'xxx'}, 'url': 'https://sts.amazonaws.com/', 'context': {'client_region': 'aws-global', 'client_config': <botocore.config.Config object at 0x7f83d511d5e0>, 'has_streaming_input': False, 'auth_type': None}}
2020-09-15 00:24:39,489 botocore.hooks [DEBUG] Event request-created.sts.AssumeRoleWithWebIdentity: calling handler <bound method RequestSigner.handler of <botocore.signers.RequestSigner object at 0x7f83d511d550>>
2020-09-15 00:24:39,489 botocore.hooks [DEBUG] Event choose-signer.sts.AssumeRoleWithWebIdentity: calling handler <function disable_signing at 0x7f83d57998b0>
2020-09-15 00:24:39,489 botocore.endpoint [DEBUG] Sending http request: <AWSPreparedRequest stream_output=False, method=POST, url=https://sts.amazonaws.com/, headers={'Content-Type': b'application/x-www-form-urlencoded; charset=utf-8', 'User-Agent': b'Boto3/1.14.52 Python/3.8.5 Linux/4.14.186-146.268.amzn2.x86_64 Botocore/1.17.52', 'Content-Length': '1284'}>
2020-09-15 00:24:39,490 urllib3.connectionpool [DEBUG] Starting new HTTPS connection (1): sts.amazonaws.com:443
2020-09-15 00:24:41,821 urllib3.connectionpool [DEBUG] https://sts.amazonaws.com:443 "POST / HTTP/1.1" 200 1998
2020-09-15 00:24:41,822 botocore.parsers [DEBUG] Response headers: {'x-amzn-RequestId': '60a91d12-7ec1-450a-af46-55d0f2b160e8', 'Content-Type': 'text/xml', 'Content-Length': '1998', 'Date': 'Tue, 15 Sep 2020 00:24:41 GMT'}
2020-09-15 00:24:41,822 botocore.parsers [DEBUG] Response body:
b'<AssumeRoleWithWebIdentityResponse xmlns="https://sts.amazonaws.com/doc/2011-06-15/">\n  <AssumeRoleWithWebIdentityResult>\n    <Audience>sts.amazonaws.com</Audience>\n    <AssumedRoleUser>\n      <AssumedRoleId>xxx:botocore-session-1600129449</AssumedRoleId>\n      <Arn>arn:aws:sts::xxx:assumed-role/xxx/botocore-session-1600129449</Arn>\n    </AssumedRoleUser>\n    <Provider>arn:aws:iam::xxx:oidc-provider/oidc.eks.ap-southeast-2.amazonaws.com/id/xxx</Provider>\n    <Credentials>\n      <AccessKeyId>xxx</AccessKeyId>\n      <SecretAccessKey>xxx</SecretAccessKey>\n      <SessionToken>xxx</SessionToken>\n      <Expiration>2020-09-15T01:24:41Z</Expiration>\n    </Credentials>\n    <SubjectFromWebIdentityToken>system:serviceaccount:xxx:xxx</SubjectFromWebIdentityToken>\n  </AssumeRoleWithWebIdentityResult>\n  <ResponseMetadata>\n    <RequestId>60a91d12-7ec1-450a-af46-55d0f2b160e8</RequestId>\n  </ResponseMetadata>\n</AssumeRoleWithWebIdentityResponse>\n'
2020-09-15 00:24:41,823 botocore.hooks [DEBUG] Event needs-retry.sts.AssumeRoleWithWebIdentity: calling handler <botocore.retryhandler.RetryHandler object at 0x7f83d511df10>
2020-09-15 00:24:41,823 botocore.retryhandler [DEBUG] No retry needed.
2020-09-15 00:24:41,824 botocore.credentials [DEBUG] Retrieved credentials will expire at: 2020-09-15 01:24:41+00:00
2020-09-15 00:24:41,825 botocore.auth [DEBUG] Calculating signature using v4 auth.
2020-09-15 00:24:41,825 botocore.auth [DEBUG] CanonicalRequest:
GET
/xxx
encoding-type=url&list-type=2&prefix=.
host:s3.amazonaws.com
x-amz-content-sha256:000
x-amz-date:20200915T002441Z
x-amz-security-token:xxx
host;x-amz-content-sha256;x-amz-date;x-amz-security-token
xxx
2020-09-15 00:24:41,825 botocore.auth [DEBUG] StringToSign:
AWS4-HMAC-SHA256
20200915T002441Z
20200915/us-east-1/s3/aws4_request
xxx
2020-09-15 00:24:41,825 botocore.auth [DEBUG] Signature:
xxx
2020-09-15 00:24:41,825 botocore.endpoint [DEBUG] Sending http request: <AWSPreparedRequest stream_output=False, method=GET, url=https://s3.amazonaws.com/xxx?list-type=2&prefix=.&encoding-type=url, headers={'User-Agent': b'Boto3/1.14.52 Python/3.8.5 Linux/4.14.186-146.268.amzn2.x86_64 Botocore/1.17.52', 'X-Amz-Date': b'20200915T002441Z', 'X-Amz-Security-Token': b'xxx', 'X-Amz-Content-SHA256': 000', 'Authorization': b'AWS4-HMAC-SHA256 Credential=xxx/20200915/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=000'}>
2020-09-15 00:24:41,826 urllib3.connectionpool [DEBUG] Starting new HTTPS connection (1): s3.amazonaws.com:443
2020-09-15 00:24:42,650 urllib3.connectionpool [DEBUG] https://s3.amazonaws.com:443 "GET /xxx?list-type=2&prefix=.&encoding-type=url HTTP/1.1" 404 None
2020-09-15 00:24:42,651 botocore.parsers [DEBUG] Response headers: {'x-amz-request-id': 'xxx', 'x-amz-id-2': 'xxx', 'Content-Type': 'application/xml', 'Transfer-Encoding': 'chunked', 'Date': 'Tue, 15 Sep 2020 00:24:42 GMT', 'Server': 'AmazonS3'}
2020-09-15 00:24:42,651 botocore.parsers [DEBUG] Response body:
b'<?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>xxx</BucketName><RequestId>xxx</RequestId><HostId>xxx</HostId></Error>'
2020-09-15 00:24:42,652 botocore.parsers [DEBUG] Response headers: {'x-amz-request-id': 'xxx', 'x-amz-id-2': 'xxx', 'Content-Type': 'application/xml', 'Transfer-Encoding': 'chunked', 'Date': 'Tue, 15 Sep 2020 00:24:42 GMT', 'Server': 'AmazonS3'}
2020-09-15 00:24:42,653 botocore.parsers [DEBUG] Response body:
b'<?xml version="1.0" encoding="UTF-8"?>\n<Error><Code>NoSuchBucket</Code><Message>The specified bucket does not exist</Message><BucketName>xxx</BucketName><RequestId>xxx</RequestId><HostId>xxx</HostId></Error>'
2020-09-15 00:24:42,653 botocore.hooks [DEBUG] Event needs-retry.s3.ListObjectsV2: calling handler <botocore.retryhandler.RetryHandler object at 0x7f83d515a580>
2020-09-15 00:24:42,653 botocore.retryhandler [DEBUG] No retry needed.
2020-09-15 00:24:42,653 botocore.hooks [DEBUG] Event needs-retry.s3.ListObjectsV2: calling handler <bound method S3RegionRedirector.redirect_from_error of <botocore.utils.S3RegionRedirector object at 0x7f83d515a5e0>>
2020-09-15 00:24:42,653 botocore.hooks [DEBUG] Event after-call.s3.ListObjectsV2: calling handler <function decode_list_object_v2 at 0x7f83d571c9d0>
Traceback (most recent call last):
  ...
  File "/opt/conda/lib/python3.8/site-packages/botocore/client.py", line 337, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/opt/conda/lib/python3.8/site-packages/botocore/client.py", line 656, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.NoSuchBucket: An error occurred (NoSuchBucket) when calling the ListObjectsV2 operation: The specified bucket does not exist
swetashre commented 3 years ago

@itssimon - Thank you for Providing debug logs. I don't see anything wrong in the debug logs that can increase time for assumerolewithwebidentity call. I am assuming this is the debug logs for the call where it is taking long time. Is it happening every time ? It can be due to network issue.

itssimon commented 3 years ago

image

@swetashre this happens once per hour, so every time the credentials have expired and need to be renewed.

This is badly impacting our API response times, so would really appreciate it if you could look into this further or help us get to the bottom of it.

Our infrastructure is in ap-southeast-2 (Sydney) and I suspect that those API calls might be routed through the US?

swetashre commented 3 years ago

Reopening the issue as it has not been fixed.

itssimon commented 3 years ago

@swetashre is there anything I can help with to help debug this? We're using Datadog Tracing with ddtrace in our Python application and it clearly shows botocore's sts.assumerolewithwebidentity calls as the source of long delays (> 2 seconds):

image

Unfortunately, these delays are not acceptable in our context and we obviously rely on AWS services that need authentication. What can we do to resolve this?

We tried explicitly setting the AWS_DEFAULT_REGION but that made no difference.

swetashre commented 3 years ago

It looks like we have to do the refresh in the background. The credentials have to be renewed at some point, so we'd have to do it before the current ones are invalidated but without blocking. I'm not sure if we have an easy way to do that right now. we'd need to do something to allow refreshing in a separate thread. This is related to the issue https://github.com/boto/boto3/issues/2345.

sidewinder12s commented 3 years ago

@swetashre You could try using the regional STS endpoint.

itssimon commented 3 years ago

Using a regional STS endpoint reduces the delay a bit (about 1 second less from ap-southeast-2), so that would be a good start. However, it doesn't solve the problem entirely.

jalaziz commented 2 years ago

We're seeing ~1s latencies as well. We're using the aws_encryption_sdk, but traces reveal the primary issue is calling sts.assumerolewithwebidentity from boto3.

sk- commented 1 year ago

Would it be possible to add an easier way of setting the default token duration so that one can mitigate the problem. By default the token lasts 1 hour, and is refreshed some minutes earlier. One can set a longer duration, up to 12 hours, however, the only way I found to set that value is by monkey-patching the Session as follows (note that in our case we do not have a config file, so it's not possible to set this value there):

import botocore

OldSession = botocore.session.Session

class NewSession(OldSession):
  @property
  def full_config(self):
    config = super().full_config
    config["profiles"]["default"]["duration_seconds"] = "43200"
    return config

botocore.session.Session = NewSession

It'd be great if any of the following were true:

Pacos commented 7 months ago

This issue is affecting us as well, big time. We use the boto inside our API calls, and despite calling STS in the same region, it's still up to ~600ms for each auth call. But we have a failsafe over our API calls to hard stop every call that takes more than ~150ms, for resiliency purposes. Which means each time a token has to be refreshed, it bricks our API because the auth can never finish. Since we don't want to compromise on resiliency and response times, we had to go deep in botocore, and hack a solution, creating a custom refreshable session where we instantiate our own sts client, to refresh the token in the background before boto tries to do it on its own. It works, but it's dirty, and probably not future-proof. A good solution would be the ability to refresh the token manually through the public SDK (not hacking private methods like we did), as well as controlling the token refresh time, as it has been already suggested here.