aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(AwsCustomResource): fails to use latest SDK version #29891

Closed frfavoreto closed 6 months ago

frfavoreto commented 6 months ago

Describe the bug

When AwsCustomResource is configured with installLatestAwsSdk: true, it fails to upgrade aws-sdk to the latest version (v3.556.0 at the moment) and falls back to the default (currently v3.515.0 in Lambda).

Examples with DynamoDB:

 INFO   Installing latest AWS SDK v3: @aws-sdk/client-dynamodb
 Task timed out after 120.10 seconds
        .
        .
        .
 INFO   Installing latest AWS SDK v3: @aws-sdk/client-dynamodb
 INFO   Failed to install latest AWS SDK v3. Falling back to pre-installed version. Error: Error: Cannot find module 'mnemonist/lru-cache'
 Require stack:
 - /tmp/node_modules/@aws-sdk/endpoint-cache/dist-cjs/index.js
 - /tmp/node_modules/@aws-sdk/middleware-endpoint-discovery/dist-cjs/index.js
 - /tmp/node_modules/@aws-sdk/client-dynamodb/dist-cjs/index.js
 - /var/task/index.js
 - /var/runtime/index.mjs
          .
          .

and SSM modules:

INFO    Installing latest AWS SDK v3: @aws-sdk/client-ssm
Task timed out after 120.16 seconds 
          .
          .
 INFO   Installing latest AWS SDK v3: @aws-sdk/client-ssm
 INFO   Failed to install latest AWS SDK v3. Falling back to pre-installed version. Error: Error: Cannot find module '@smithy/shared-ini-file-loader'
Require stack:
- /tmp/node_modules/@smithy/node-config-provider/dist-cjs/index.js
- /tmp/node_modules/@smithy/middleware-endpoint/dist-cjs/adaptors/getEndpointFromConfig.js
- /tmp/node_modules/@smithy/middleware-endpoint/dist-cjs/index.js
- /tmp/node_modules/@smithy/core/dist-cjs/index.js
- /tmp/node_modules/@aws-sdk/client-ssm/dist-cjs/index.js
- /var/task/index.js
- /var/runtime/index.mjs
          .
          . 

The custom resource eventually succeeds after falling back to the default aws-sdk.

Expected Behavior

Be able to update the Lambda Nodejs18 runtime with the latest SDKv3 version.

Current Behavior

The Lambda Nodejs18 runtime is unable to retrieve and upgrade to the latest SDKv3 version.

Reproduction Steps

Deploy a CDK App with a Custom Resource that has installLatestAwsSdk: true:

import * as cdk from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as cr from 'aws-cdk-lib/custom-resources';

// Inside a Stack constructor, so `this` refers to the stack.
const myTable = new dynamodb.Table(this, 'myTable', {
  partitionKey: {
    name: 'id',
    type: dynamodb.AttributeType.STRING,
  },
  removalPolicy: cdk.RemovalPolicy.DESTROY,
});

const myCustomResource = new cr.AwsCustomResource(this, 'myCR', {
  policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
    resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
  }),
  // Ask the handler to install the latest AWS SDK at run time.
  installLatestAwsSdk: true,
  onCreate: {
    service: 'DynamoDB',
    action: 'PutItem',
    parameters: {
      Item: {
        id: { S: 'test-value' },
      },
      TableName: myTable.tableName,
    },
    physicalResourceId: cr.PhysicalResourceId.of('myCRphysicalResourceID'),
  },
  onUpdate: {
    service: 'DynamoDB',
    action: 'PutItem',
    parameters: {
      Item: {
        id: { S: 'test-value' },
      },
      TableName: myTable.tableName,
    },
    physicalResourceId: cr.PhysicalResourceId.of('myCRphysicalResourceID'),
  },
});

Check the underlying Lambda Function's logs to see the messages above.

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.137.0

Framework Version

No response

Node.js Version

18

OS

Mac

Language

TypeScript

Language Version

No response

Other information

No response

khushail commented 6 months ago

@frfavoreto, thanks for reaching out. The team is already tracking this and working on it.

dtczest commented 6 months ago

I'm also running into this, but in my case, the custom resource never succeeds.

jakekarnes42 commented 6 months ago

Please note that this is currently causing deployments to fail for custom resources that use installLatestAwsSdk: true. The resources are timing out and/or failing when the Lambda falls back to using the older SDK. Some resources may succeed upon retrying, but if you have multiple custom resources, then it's likely that at least one of them will fail during each deployment.

colifran commented 6 months ago

@frfavoreto @jakekarnes42 What version of the CDK are you using and do you know what version you started encountering this in?

jakekarnes42 commented 6 months ago

@colifran Today I upgraded from CDK 2.131.0 to 2.138.0 (the current latest version) and that's when the issue began. Previously successful deployments began to fail. Investigating the failures led to the exact conclusion shared by @frfavoreto in the original issue description.

Each deployment contains about 15 custom resources, which would fail intermittently. Upon reviewing the Lambda logs, it appears that sometimes the Lambda would succeed after falling back to the default SDK. Infrequently, it would time out before successfully falling back. That could cause the Custom Resource update to fail, and cascade the failure to the rest of the deployment. Since I'm deploying multiple custom resources, I found that at least one would fail on each deployment attempt.

I rolled back to CDK 2.131.0 and the issue is no longer present. I'm back to successful deployments. This appears to be a regression introduced sometime between those two versions.

I hope this helps and thanks for the quick support!

colifran commented 6 months ago

@jakekarnes42 Thanks for the clarification. What is strange is that I have also gone back to 2.131.0, but I'm still getting the time out when trying to install the latest SDK version. I've looked through some of our recent changes and I'm not seeing anything that would make me think this is something on the CDK side. I'm wondering if this could be an SDK related issue? We will continue to investigate / monitor!

colifran commented 6 months ago

@frfavoreto @jakekarnes42 I did some more testing and it appears that the 2 minute default timeout that is set for AwsCustomResource is no longer sufficient for installing the latest SDK version. I set the default timeout to 5 minutes and this fixed the timeout issue for me. It looks like it took close to 4 minutes to install the latest SDK version. Would one of you be able to try this out on your end? I'll continue testing this on my end.
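
The timeout workaround described in this comment can be sketched as follows. This is a minimal example, not an official fix: the stack name and construct IDs are illustrative, and the 5-minute value is simply the one reported to work above. The `timeout` property is assumed to be the AwsCustomResource setting the comment refers to.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as cr from 'aws-cdk-lib/custom-resources';

// Stack and construct IDs are illustrative only.
const app = new cdk.App();
const stack = new cdk.Stack(app, 'TimeoutWorkaroundStack');

const table = new dynamodb.Table(stack, 'myTable', {
  partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
  removalPolicy: cdk.RemovalPolicy.DESTROY,
});

new cr.AwsCustomResource(stack, 'myCR', {
  policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
    resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
  }),
  installLatestAwsSdk: true,
  // The default is 2 minutes; 5 minutes leaves room for the roughly
  // 4-minute SDK install reported in this thread.
  timeout: cdk.Duration.minutes(5),
  onCreate: {
    service: 'DynamoDB',
    action: 'PutItem',
    parameters: {
      Item: { id: { S: 'test-value' } },
      TableName: table.tableName,
    },
    physicalResourceId: cr.PhysicalResourceId.of('myCRphysicalResourceID'),
  },
});
```

The trade-off is deployment time: the handler may spend most of that window installing the SDK before it makes the actual API call.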

frfavoreto commented 6 months ago

@colifran When I increase the timeout setting, I get the same results you described. I now think it might rather be an issue with Lambda, but I'm not sure.

It happens with any aws-cdk-lib version that provisions functions with Nodejs18.x.

dtczest commented 6 months ago

I can confirm that increasing the timeout fixed this issue for me, too.

amizer12 commented 6 months ago

This should be way more visible than it is, to be honest. It causes all my custom resources to fail. Increasing the timeout is an option, but it is very costly time-wise. Since this is not fixed yet, I just set install_latest_aws_sdk=False as suggested above. This cost me multiple hours of hair pulling today :)

trivikr commented 6 months ago

This is the source code where the timeout happens after 120 seconds:

https://github.com/aws/aws-cdk/blob/6fdc4582f659549021a64a4d676fce12fc241715/packages/%40aws-cdk/custom-resource-handlers/lib/custom-resources/aws-custom-resource-handler/aws-sdk-v3-handler.ts#L25-L29
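
The linked handler code is not reproduced here. As a rough illustration of the install-then-fall-back behavior that the log messages in this issue suggest, here is a minimal sketch. The helper `loadSdkWithFallback` and its parameters are hypothetical names for illustration and are not part of the CDK handler.

```typescript
// Hypothetical helper for illustration; this is NOT the actual CDK handler code.
interface LoadResult<T> {
  sdk: T;
  source: 'latest' | 'pre-installed';
}

function loadSdkWithFallback<T>(
  installLatest: () => void, // e.g. a wrapper around an npm install with a time limit
  loadLatest: () => T,       // e.g. loading the freshly installed package
  loadPreInstalled: () => T, // e.g. loading the SDK bundled with the Lambda runtime
): LoadResult<T> {
  try {
    installLatest();
    return { sdk: loadLatest(), source: 'latest' };
  } catch (e) {
    // Both an install failure and a broken install (missing module) end up here.
    console.log(`Failed to install latest AWS SDK v3. Falling back to pre-installed version. Error: ${e}`);
    return { sdk: loadPreInstalled(), source: 'pre-installed' };
  }
}

// Usage with stubs: the "install" succeeds but loading fails, as in the logs above.
const result = loadSdkWithFallback(
  () => {},
  () => { throw new Error("Cannot find module 'mnemonist/lru-cache'"); },
  () => 'bundled-sdk',
);
console.log(result.source); // pre-installed
```

Note that a fallback like this only helps when the install fails quickly. If the install consumes the whole Lambda timeout, the invocation is terminated before the fallback runs, which matches the "Task timed out" entries reported in this issue.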

TheRealAmazonKendra commented 6 months ago

> This should be way more visible than it is, to be honest. It causes all my custom resources to fail. Increasing the timeout is an option, but it is very costly time-wise. Since this is not fixed yet, I just set install_latest_aws_sdk=False as suggested above. This cost me multiple hours of hair pulling today :)

The installation of the sdk is a direct call to npm so we have no control over the latency here. If npm is experiencing increased latency in their downloads, we can only mitigate that by providing a version of the sdk that we know is safe to use. I did also do a quick check to see if the asset size for the sdk significantly increased recently and it looks like it has not.

There is definitely room to improve the documentation here to specify WHY you might want to increase this timeout, but there is no fix here because there is not actually a bug (on our end, there may be an issue with npm, the sdk, or somewhere else).

If increasing the timeout is too costly, then changing this setting to false is the right way to go.
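
For the second option, a minimal sketch of opting out of the runtime install, so that the handler uses the SDK version that ships with the Lambda runtime. The stack name and construct IDs are illustrative.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as dynamodb from 'aws-cdk-lib/aws-dynamodb';
import * as cr from 'aws-cdk-lib/custom-resources';

// Stack and construct IDs are illustrative only.
const app = new cdk.App();
const stack = new cdk.Stack(app, 'PreInstalledSdkStack');

const table = new dynamodb.Table(stack, 'myTable', {
  partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
  removalPolicy: cdk.RemovalPolicy.DESTROY,
});

new cr.AwsCustomResource(stack, 'myCR', {
  policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
    resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
  }),
  // Skip the npm install at run time and use the pre-installed SDK.
  installLatestAwsSdk: false,
  onCreate: {
    service: 'DynamoDB',
    action: 'PutItem',
    parameters: {
      Item: { id: { S: 'test-value' } },
      TableName: table.tableName,
    },
    physicalResourceId: cr.PhysicalResourceId.of('myCRphysicalResourceID'),
  },
});
```

This avoids the install latency entirely, at the cost of not having API operations that were added after the runtime's bundled SDK version.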

TheRealAmazonKendra commented 6 months ago

I suspect there's something going on with the SDK and/or npm here. On https://www.npmjs.com/package/@aws-sdk/client-s3 it says the most recent version is from 9 days ago and lists 3.556.0 as the most up-to-date version. On https://github.com/aws/aws-sdk-js-v3 the latest is 3.562.0, published 12 hours ago.

trivikr commented 6 months ago

> On aws/aws-sdk-js-v3 latest is 3.562.0 published 12 hours ago.

This is a global version of the AWS SDK for JavaScript. We only publish the modules that are updated in a given version, but keep the version number the same across modules for easy comparison. During dev-preview, we followed independent versioning, which caused confusion among users. Users also complained about fixed versioning for all modules, where a new version doesn't contain any update. Applying fixed versioning only when required was a good middle path.

> On npmjs.com/package/@aws-sdk/client-s3 it says the most recent version is from 9 days ago and lists 3.556.0

This is correct. There hasn't been any update to @aws-sdk/client-s3 directly (a change in the service model) or indirectly (an update in any of its dependencies) since v3.556.0. That's why no new version has been published for it.
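
To make the versioning scheme above concrete, here is a small sketch of how a package's latest published version can lag behind the repository-wide version. The `Release` shape and the function name are hypothetical, and the release data is limited to the two versions mentioned in this thread; the second package name is a placeholder, not a real published change.

```typescript
// Hypothetical model of "fixed versioning only when required".
interface Release {
  version: string;             // repository-wide SDK version
  publishedPackages: string[]; // packages that changed and were published in it
}

// Returns the newest release in which `pkg` was actually published.
// Releases are assumed to be ordered from oldest to newest.
function latestPublishedVersion(releases: Release[], pkg: string): string | undefined {
  let latest: string | undefined;
  for (const release of releases) {
    if (release.publishedPackages.indexOf(pkg) !== -1) {
      latest = release.version;
    }
  }
  return latest;
}

const releases: Release[] = [
  { version: '3.556.0', publishedPackages: ['@aws-sdk/client-s3'] },
  // Placeholder package name: stands in for whatever changed in 3.562.0.
  { version: '3.562.0', publishedPackages: ['@aws-sdk/some-other-client'] },
];

console.log(latestPublishedVersion(releases, '@aws-sdk/client-s3')); // 3.556.0
```

Under this model, seeing 3.556.0 on npm for one client while the repository is at 3.562.0 is expected and does not indicate a stale or broken publish.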
