boto / boto3

AWS SDK for Python
https://aws.amazon.com/sdk-for-python/
Apache License 2.0

Lambda invocation request timeout while creating a scheduler #4164

Closed hardikdgsa closed 2 months ago

hardikdgsa commented 3 months ago

Describe the bug

My AWS setup is as follows. A Lambda function is triggered by API Gateway. The function calls AWS EventBridge Scheduler to create a schedule for the given time. Most of the time it works, but occasionally boto3 raises an error while creating the schedule.

This issue occurs only in Lambda. I have run the same code many times locally in Python and in a notebook, but the issue does not reproduce there.

Stack Trace:

2024-06-12 12:23:07,415 botocore.endpoint [DEBUG] Exception received when sending HTTP request.
[DEBUG] 2024-06-12T12:23:07.415Z 3caadb7f-8c6a-4a40-8520-a61ea286addb Exception received when sending HTTP request.
Traceback (most recent call last):
  File "/var/task/urllib3/connection.py", line 174, in _new_conn
    conn = connection.create_connection(
  File "/var/task/urllib3/util/connection.py", line 95, in create_connection
    raise err
  File "/var/task/urllib3/util/connection.py", line 85, in create_connection
    sock.connect(sa)
TimeoutError: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/botocore/httpsession.py", line 464, in send
    urllib_response = conn.urlopen(
  File "/var/task/urllib3/connectionpool.py", line 798, in urlopen
    retries = retries.increment(
  File "/var/task/urllib3/util/retry.py", line 525, in increment
    raise six.reraise(type(error), error, _stacktrace)
  File "/var/task/urllib3/packages/six.py", line 770, in reraise
    raise value
  File "/var/task/urllib3/connectionpool.py", line 714, in urlopen
    httplib_response = self._make_request(
  File "/var/task/urllib3/connectionpool.py", line 403, in _make_request
    self._validate_conn(conn)
  File "/var/task/urllib3/connectionpool.py", line 1053, in _validate_conn
    conn.connect()
  File "/var/task/urllib3/connection.py", line 363, in connect
    self.sock = conn = self._new_conn()
  File "/var/task/urllib3/connection.py", line 186, in _new_conn
    raise NewConnectionError(
urllib3.exceptions.NewConnectionError: <botocore.awsrequest.AWSHTTPSConnection object at 0x7f9445efe9d0>: Failed to establish a new connection: [Errno 110] Connection timed out

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/var/task/botocore/endpoint.py", line 281, in _do_get_response
    http_response = self._send(request)
  File "/var/task/botocore/endpoint.py", line 377, in _send
    return self.http_session.send(request)
  File "/var/task/botocore/httpsession.py", line 493, in send
    raise EndpointConnectionError(endpoint_url=request.url, error=e)
botocore.exceptions.EndpointConnectionError: Could not connect to the endpoint URL: "https://scheduler.us-east-1.amazonaws.com/schedules/C329_J331_JFNone_A624_AD30418"

Expected Behavior

The request should succeed and create a schedule in EventBridge.

Current Behavior

The error above appears in the logs when debug logging is enabled.

Reproduction Steps

Create a Lambda function that creates a schedule in EventBridge Scheduler. Some of the calls fail intermittently, apparently for an internal reason.

Sample code

Initially the client was created without a config. After reading a few posts, I added a config to see whether it resolves the issue:

    import json

    import boto3
    import botocore.config

    # Config object
    config = botocore.config.Config(
        read_timeout=900,
        connect_timeout=900,
        retries={"max_attempts": 2, "mode": "standard"},
    )

    client = boto3.client("scheduler", config=config)

    # name, schduler_time, status, schduler_data and the application's own
    # Config class (LAMBDA_ARN, SCHEDULER_ROLE) are defined elsewhere.
    response = client.create_schedule(
        ActionAfterCompletion="DELETE",
        Description="Scheduler",
        FlexibleTimeWindow={"Mode": "OFF"},
        Name=name,
        ScheduleExpression=schduler_time,
        ScheduleExpressionTimezone="UTC",
        State=status,
        Target={
            "Arn": Config.LAMBDA_ARN,
            "Input": json.dumps(schduler_data),
            "RoleArn": Config.SCHEDULER_ROLE,
        },
    )

The create_schedule call is where the error above is raised.

Possible Solution

No response

Additional Information/Context

We are using Python 3.9. Memory: 128 MB. Ephemeral storage: 512 MB.

We varied the Lambda timeout from 3 seconds to 1 minute during testing.

SDK version used

1.28.36

Environment details (OS name and version, etc.)

Lambda with Python 3.9

tim-finnigan commented 3 months ago

Thanks for reaching out. If this only occurs in Lambda then the issue may be with your Lambda configurations. Have you tried increasing the timeout as documented here: https://docs.aws.amazon.com/lambda/latest/dg/configuration-timeout.html ?

hardikdgsa commented 3 months ago

For that reason, I had already increased the timeout to 1 minute and 30 seconds for testing. The error occurs within 15–20 seconds; the function then stays idle for the remaining time until the timeout message appears.

The behavior is random.

tim-finnigan commented 3 months ago

@hardikdgsa thanks for following up. A few other suggestions:

  1. Try a Lambda runtime with a newer version of Python: https://docs.aws.amazon.com/lambda/latest/dg/lambda-runtimes.html#runtimes-supported
  2. Increase the Lambda function's memory capacity: https://docs.aws.amazon.com/lambda/latest/dg/configuration-memory.html
  3. Increase connect_timeout via botocore configuration, documented here: https://botocore.amazonaws.com/v1/documentation/api/latest/reference/config.html
  4. Increase the max_attempts in your retry configuration, documented here: https://boto3.amazonaws.com/v1/documentation/api/latest/guide/retries.html

Please let us know what else you have tried, and if you can provide a code snippet to reproduce the issue and the full logs (with any sensitive info redacted) by adding boto3.set_stream_logger('') to your script.

github-actions[bot] commented 2 months ago

Greetings! It looks like this issue hasn’t been active in longer than five days. We encourage you to check if this is still an issue in the latest release. In the absence of more information, we will be closing this issue soon. If you find that this is still a problem, please feel free to provide a comment or upvote with a reaction on the initial post to prevent automatic closure. If the issue is already closed, please feel free to open a new one.