aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

custom_resources: AwsCustomResource deadlocks sporadically #29026

Open rittneje opened 9 months ago

rittneje commented 9 months ago

Describe the bug

We are using AwsCustomResource to execute s3:deleteBucketInventoryConfiguration on delete. We have observed sporadic failures in which the custom resource never responds, so CloudFormation eventually hits its one-hour timeout.

In the logs for the custom resource Lambda function that CDK generates, we see the task repeatedly time out after 120 seconds. There are no other logs. According to CloudTrail, the call to S3 is never made.
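One way to double-check the CloudTrail observation is to look up DeleteBucketInventoryConfiguration events in the time window of the stuck delete. The sketch below uses the boto3 CloudTrail client's lookup_events with an EventName lookup attribute and follows NextToken pagination. It assumes the S3 call would be recorded as a management event under that event name; the function name find_inventory_deletes is hypothetical, and the client is injectable so the logic can be tried without AWS credentials.

```python
def find_inventory_deletes(start, end, cloudtrail=None):
    """Return CloudTrail events named DeleteBucketInventoryConfiguration in [start, end].

    Assumption: the S3 call is logged as a management event under this event
    name, so an empty result is consistent with the call never being made.
    """
    if cloudtrail is None:
        import boto3  # imported lazily so a stub client can be passed in instead
        cloudtrail = boto3.client("cloudtrail")

    events = []
    kwargs = {
        "LookupAttributes": [
            {"AttributeKey": "EventName", "AttributeValue": "DeleteBucketInventoryConfiguration"}
        ],
        "StartTime": start,
        "EndTime": end,
    }
    # Follow NextToken until CloudTrail stops returning one.
    while True:
        page = cloudtrail.lookup_events(**kwargs)
        events.extend(page.get("Events", []))
        token = page.get("NextToken")
        if not token:
            return events
        kwargs["NextToken"] = token
```

In practice, start and end would be timezone-aware datetime values bracketing the failed delete (for the logs in this thread, shortly after 04:00 UTC on 2024-02-08).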

Expected Behavior

The custom resource should make the deleteBucketInventoryConfiguration call and respond to CloudFormation.

Current Behavior

It sporadically gets stuck and never responds.

Reproduction Steps

from aws_cdk import aws_iam as iam
from aws_cdk import custom_resources

custom_resources.AwsCustomResource(
    scope,
    id,
    install_latest_aws_sdk=True,
    policy=custom_resources.AwsCustomResourcePolicy.from_statements([
        iam.PolicyStatement(
            actions=["s3:PutInventoryConfiguration"],
            resources=[<bucket arn>],
        ),
    ]),
    on_delete={
        "service": "S3",
        "action": "deleteBucketInventoryConfiguration",
        "parameters": {
            "Bucket": <bucket name>,
            "Id": <id>,
        }
    }
)

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.118.0 (build a40f2ec)

Framework Version

No response

Node.js Version

v20.10.0

OS

Alpine 3.18

Language

Python

Language Version

3.19.0

Other information

No response

pahud commented 9 months ago

Can you share the Lambda logs?

We need to determine whether it never invokes the SDK call, or invokes it and then times out. If it is the latter, you will probably need to use the Provider Framework instead, which lets you specify timeout options.
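With the Provider Framework, the deletion logic lives in an onEvent handler that you write yourself (optionally paired with an isComplete handler). A minimal sketch of a Python onEvent handler performing the same delete with boto3's delete_bucket_inventory_configuration is below. The resource property names BucketName and InventoryId are hypothetical and would have to be passed in through the custom resource's properties; the S3 client is injectable so the handler can be exercised without AWS access.

```python
def on_event(event, context, s3=None):
    """onEvent handler sketch for a Provider Framework custom resource.

    Hypothetical resource properties: BucketName, InventoryId.
    Only Delete does any work; Create/Update just return a physical ID.
    """
    props = event["ResourceProperties"]
    # Reuse the existing physical ID on Update/Delete; derive one on Create.
    physical_id = event.get(
        "PhysicalResourceId", f"{props['BucketName']}/{props['InventoryId']}"
    )

    if event["RequestType"] == "Delete":
        if s3 is None:
            import boto3  # lazy import so a stub client can be injected
            s3 = boto3.client("s3")
        # Log before and after the call so a hang can be located in CloudWatch.
        print(f"Deleting inventory configuration {props['InventoryId']} on {props['BucketName']}")
        s3.delete_bucket_inventory_configuration(
            Bucket=props["BucketName"], Id=props["InventoryId"]
        )
        print("Delete call returned")

    return {"PhysicalResourceId": physical_id}
```

The explicit log lines around the SDK call are the point of the design: they would show directly whether the function hangs before or during the S3 request.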

rittneje commented 9 months ago

@pahud Here are the logs from CloudWatch.

@timestamp @message
2024-02-08 04:13:10.262 2024-02-08T04:13:10.262Z bdc84f65-7abf-490b-9b7d-1d7bf25e9e86 Task timed out after 120.11 seconds
2024-02-08 04:13:10.262 END RequestId: bdc84f65-7abf-490b-9b7d-1d7bf25e9e86
2024-02-08 04:13:10.262 REPORT RequestId: bdc84f65-7abf-490b-9b7d-1d7bf25e9e86 Duration: 120113.10 ms Billed Duration: 120000 ms Memory Size: 128 MB Max Memory Used: 128 MB
2024-02-08 04:11:10.403 2024-02-08T04:11:10.403Z bdc84f65-7abf-490b-9b7d-1d7bf25e9e86 INFO Installing latest AWS SDK v3: @aws-sdk/client-s3
2024-02-08 04:11:10.148 START RequestId: bdc84f65-7abf-490b-9b7d-1d7bf25e9e86 Version: $LATEST
2024-02-08 04:09:05.980 2024-02-08T04:09:05.980Z bdc84f65-7abf-490b-9b7d-1d7bf25e9e86 Task timed out after 120.29 seconds
2024-02-08 04:09:05.980 END RequestId: bdc84f65-7abf-490b-9b7d-1d7bf25e9e86
2024-02-08 04:09:05.980 REPORT RequestId: bdc84f65-7abf-490b-9b7d-1d7bf25e9e86 Duration: 120286.42 ms Billed Duration: 120000 ms Memory Size: 128 MB Max Memory Used: 128 MB
2024-02-08 04:07:05.899 2024-02-08T04:07:05.899Z bdc84f65-7abf-490b-9b7d-1d7bf25e9e86 INFO Installing latest AWS SDK v3: @aws-sdk/client-s3
2024-02-08 04:07:05.692 START RequestId: bdc84f65-7abf-490b-9b7d-1d7bf25e9e86 Version: $LATEST
2024-02-08 04:06:04.527 2024-02-08T04:06:04.526Z bdc84f65-7abf-490b-9b7d-1d7bf25e9e86 Task timed out after 122.10 seconds
2024-02-08 04:06:04.527 END RequestId: bdc84f65-7abf-490b-9b7d-1d7bf25e9e86
2024-02-08 04:06:04.527 REPORT RequestId: bdc84f65-7abf-490b-9b7d-1d7bf25e9e86 Duration: 122101.35 ms Billed Duration: 120000 ms Memory Size: 128 MB Max Memory Used: 128 MB Init Duration: 194.53 ms
2024-02-08 04:04:02.574 2024-02-08T04:04:02.574Z bdc84f65-7abf-490b-9b7d-1d7bf25e9e86 INFO Installing latest AWS SDK v3: @aws-sdk/client-s3
2024-02-08 04:04:02.423 START RequestId: bdc84f65-7abf-490b-9b7d-1d7bf25e9e86 Version: $LATEST
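Each REPORT line above carries the per-invocation numbers: every attempt runs for about 120 seconds, and each reports Max Memory Used equal to Memory Size (128 MB), right after the INFO line about installing the latest AWS SDK. A small sketch for pulling those fields out of REPORT lines, assuming the Lambda REPORT format shown in these logs, makes the pattern easy to check across many invocations. The helper names parse_report and memory_saturated are hypothetical.

```python
import re

# Matches the fields of a Lambda REPORT line in the format shown in the logs above.
REPORT_RE = re.compile(
    r"REPORT RequestId: (?P<request_id>\S+)\s+"
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>\d+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<max_used_mb>\d+) MB"
)


def parse_report(line):
    """Return a dict of REPORT fields, or None if the line is not a REPORT line."""
    m = REPORT_RE.search(line)
    if m is None:
        return None
    return {
        "request_id": m["request_id"],
        "duration_ms": float(m["duration_ms"]),
        "billed_ms": int(m["billed_ms"]),
        "memory_mb": int(m["memory_mb"]),
        "max_used_mb": int(m["max_used_mb"]),
    }


def memory_saturated(report):
    """True when the invocation used all of its configured memory."""
    return report["max_used_mb"] >= report["memory_mb"]
```

Because the pattern is matched with search, trailing fields such as Init Duration on a cold start do not prevent a match.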

pahud commented 5 months ago

Task timed out after 120.29 seconds

You may need to increase the Lambda timeout, or use the custom resource Provider Framework with an isComplete handler to check the status.
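An isComplete handler for the Provider Framework could poll S3 until the inventory configuration is gone. The sketch below assumes the same hypothetical BucketName and InventoryId properties, and assumes that get_bucket_inventory_configuration fails with the error code NoSuchConfiguration once the configuration no longer exists; any other error is re-raised. The S3 client is injectable for testing.

```python
def is_complete(event, context, s3=None):
    """isComplete handler sketch: report completion once the configuration is gone.

    Assumes hypothetical properties BucketName/InventoryId and that a missing
    configuration surfaces as a botocore ClientError with code NoSuchConfiguration.
    """
    if event["RequestType"] != "Delete":
        return {"IsComplete": True}

    if s3 is None:
        import boto3  # lazy import so a stub client can be injected
        s3 = boto3.client("s3")

    props = event["ResourceProperties"]
    try:
        s3.get_bucket_inventory_configuration(
            Bucket=props["BucketName"], Id=props["InventoryId"]
        )
    except Exception as err:  # botocore ClientError carries a .response dict
        code = getattr(err, "response", {}).get("Error", {}).get("Code")
        if code == "NoSuchConfiguration":
            return {"IsComplete": True}
        raise
    # The configuration still exists; the framework will call isComplete again.
    return {"IsComplete": False}
```

This only helps if the onEvent call itself returns; it does not address a function that stalls before the SDK call is made.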

rittneje commented 5 months ago

@pahud as I mentioned, it never even made the call.