boto / botocore

The low-level, core functionality of boto3 and the AWS CLI.
Apache License 2.0
1.44k stars 1.06k forks source link

fix: tcp_keepalive socket #3140

Open ShaneNolan opened 3 months ago

ShaneNolan commented 3 months ago

When using the botocore.config.Config option tcp_keepalive=True, the TCP socket is configured with the keep alive socket option (socket.SO_KEEPALIVE). By default, Linux sets the TCP keepalive time parameter to 7200 seconds, which exceeds the AWS NAT Gateway default timeout of 350 seconds [source].

This limitation leads to an inability to receive a response from a Lambda function under the following conditions:

Therefore, by configuring socket.TCP_KEEPIDLE, socket.TCP_KEEPINTVL and socket.TCP_KEEPCNT when tcp_keepalive during the _compute_socket_options function call we can overcome this limitation.

socket.IPPROTO_TCP is used to support cross platform compatibility.

The code submitted automatically calculates these values based on the read timeout. Another option would be to have supplied in the scope/client object.

Fixes issues: https://github.com/boto/boto3/issues/2424, https://github.com/boto/boto3/issues/2510 and https://github.com/boto/botocore/issues/2916.

Fargate recently had a similar solution implemented to support this use case: https://aws.amazon.com/blogs/containers/announcing-additional-linux-controls-for-amazon-ecs-tasks-on-aws-fargate/.