Closed frfavoreto closed 6 months ago
@frfavoreto, thanks for reaching out. The team is already tracking this and working on it.
I'm also running into this, but in my case, the custom resource never succeeds.
Please note that this is currently causing deployments to fail for custom resources that use installLatestAwsSdk: true. The resources are timing out and/or failing when the Lambda falls back to using the older SDK. Some resources may succeed upon retrying, but if you have multiple custom resources, then it's likely that at least one of them will fail during each deployment.
@frfavoreto @jakekarnes42 What version of the CDK are you using and do you know what version you started encountering this in?
@colifran Today I upgraded from CDK 2.131.0 to 2.138.0 (the current latest version) and that's when the issue began. Previously successful deployments began to fail. Investigating the failures led me to the exact conclusion shared by @frfavoreto in the original issue description.
Each deployment contains about 15 custom resources, which would fail intermittently. Upon reviewing the Lambda logs, it appears that sometimes the Lambda would succeed after falling back to the default SDK. Infrequently, it would time out before successfully falling back. That could cause the Custom Resource update to fail and cascade the failure to the rest of the deployment. Since I'm deploying multiple custom resources, I found that at least one would fail on each deployment attempt.
I rolled back to CDK 2.131.0 and the issue is no longer present. I'm back to successful deployments. This appears to be a regression introduced sometime between those two versions.
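To illustrate why a deployment with many custom resources fails so consistently, here is a rough sketch of the arithmetic. It assumes each custom resource fails independently with the same probability; the 10% figure is purely hypothetical and was not measured in this thread.

```typescript
// Probability that at least one of `resourceCount` custom resources fails,
// assuming independent failures with the same per-resource probability.
function probAtLeastOneFailure(perResourceFailure: number, resourceCount: number): number {
  return 1 - Math.pow(1 - perResourceFailure, resourceCount);
}

// Hypothetical 10% per-resource failure rate across 15 custom resources.
console.log(probAtLeastOneFailure(0.1, 15).toFixed(2)); // prints "0.79"
```

Under these assumptions, even an infrequent per-resource timeout makes a failed deployment the most likely outcome.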
I hope this helps and thanks for the quick support!
@jakekarnes42 Thanks for the clarification. What is strange is that I have also gone back to 2.131.0, but I'm still getting the timeout when trying to install the latest SDK version. I've looked through some of our recent changes and I'm not seeing anything that would make me think this is something on the CDK side. I'm wondering if this could be an SDK-related issue? We will continue to investigate / monitor!
@frfavoreto @jakekarnes42 I did some more testing and it appears that the 2 minute default timeout that is set for AwsCustomResource is no longer sufficient for installing the latest SDK version. I set the default timeout to 5 minutes and this fixed the timeout issue for me. It looks like it took close to 4 minutes to install the latest SDK version. Would one of you be able to try this out on your end? I'll continue testing this on my end.
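For anyone who wants to try the longer timeout, a minimal sketch might look like the following. It assumes aws-cdk-lib v2; the stack ID, construct ID, SSM parameter name, and physical resource ID are hypothetical placeholders, and the SSM call is only an example of an SDK call.

```typescript
import * as cdk from 'aws-cdk-lib';
import * as cr from 'aws-cdk-lib/custom-resources';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'SdkTimeoutStack'); // hypothetical stack ID

new cr.AwsCustomResource(stack, 'GetParameter', {
  // Keep installing the latest SDK at runtime, as in the original report.
  installLatestAwsSdk: true,
  // Raise the timeout from the 2 minute default to 5 minutes.
  timeout: cdk.Duration.minutes(5),
  onUpdate: {
    service: 'SSM',
    action: 'getParameter',
    parameters: { Name: '/example/parameter' }, // hypothetical parameter name
    physicalResourceId: cr.PhysicalResourceId.of('example-parameter'),
  },
  policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
    resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
  }),
});

app.synth();
```

Note that the longer timeout only gives the npm install room to finish; it does not make the install itself any faster.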
@colifran When I increase the timeout setting, I get the same results you described. Now I believe it might rather be an issue with Lambda, but I'm not sure.
It happens with any aws-cdk-lib version that provisions functions with Nodejs18.x.
I can confirm that increasing the timeout fixed this issue for me, too.
This should be way more visible than it is tbh - it causes all my custom resources to fail. Increasing the timeout is an option, but very costly time-wise. Since this is not fixed yet, I just set install_latest_aws_sdk=False as suggested above. This thing cost me multiple hours of hair pulling today :)
The source code where timeout happens after 120 seconds.
The installation of the sdk is a direct call to npm so we have no control over the latency here. If npm is experiencing increased latency in their downloads, we can only mitigate that by providing a version of the sdk that we know is safe to use. I did also do a quick check to see if the asset size for the sdk significantly increased recently and it looks like it has not.
There is definitely room to improve the documentation here to specify WHY you might want to increase this timeout, but there is no fix here because there is not actually a bug (on our end, there may be an issue with npm, the sdk, or somewhere else).
If increasing the timeout is too costly, then changing this setting to false is the right way to go.
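As a rough sketch of that alternative, assuming aws-cdk-lib v2 (the IDs and the SSM parameter name below are hypothetical placeholders), the runtime install can be skipped so that the Lambda uses the SDK version already bundled with its runtime:

```typescript
import * as cdk from 'aws-cdk-lib';
import * as cr from 'aws-cdk-lib/custom-resources';

const app = new cdk.App();
const stack = new cdk.Stack(app, 'BundledSdkStack'); // hypothetical stack ID

new cr.AwsCustomResource(stack, 'GetParameter', {
  // Skip the npm install at runtime; the default 2 minute timeout can stay as is.
  installLatestAwsSdk: false,
  onUpdate: {
    service: 'SSM',
    action: 'getParameter',
    parameters: { Name: '/example/parameter' }, // hypothetical parameter name
    physicalResourceId: cr.PhysicalResourceId.of('example-parameter'),
  },
  policy: cr.AwsCustomResourcePolicy.fromSdkCalls({
    resources: cr.AwsCustomResourcePolicy.ANY_RESOURCE,
  }),
});

app.synth();
```

The trade-off is that API calls added to the SDK after the bundled version was released will not be available.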
I'm going to suspect there's something going on with the SDK and/or npm here. On https://www.npmjs.com/package/@aws-sdk/client-s3 it says the most recent version is from 9 days ago and lists 3.556.0 as the most up-to-date version. On https://github.com/aws/aws-sdk-js-v3 latest is 3.562.0 published 12 hours ago.
On aws/aws-sdk-js-v3 latest is 3.562.0 published 12 hours ago.
This is a global version of the AWS SDK for JavaScript. We only publish the modules which are updated in a version, but keep the version number the same for easy comparison. During dev-preview, we followed independent versioning, which caused confusion among users. Users also complained about fixed versioning for all modules, where a new version doesn't have any update. Fixed versioning only when required was a good middle path.
On npmjs.com/package/@aws-sdk/client-s3 it says the most recent version is from 9 days ago and lists 3.556.0
This is correct. There hasn't been any update in @aws-sdk/client-s3 directly (change in service model) or indirectly (update in any of its dependencies) since v3.556.0. That's why there's no new version published for it.
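The versioning scheme described above can be sketched as follows. This is only an illustration: the release list is not the real release history, and @aws-sdk/client-example is a hypothetical package name. Only the two version numbers come from this thread.

```typescript
// One repository-wide release: a global version plus the packages that changed in it.
interface Release {
  version: string;
  changedPackages: string[];
}

// Compare two "major.minor.patch" strings numerically.
function compareVersions(a: string, b: string): number {
  const pa = a.split('.').map(Number);
  const pb = b.split('.').map(Number);
  for (let i = 0; i < 3; i++) {
    const diff = (pa[i] ?? 0) - (pb[i] ?? 0);
    if (diff !== 0) return diff;
  }
  return 0;
}

// Latest version published for one package: the newest release that touched it.
function latestPublished(releases: Release[], pkg: string): string | undefined {
  return releases
    .filter((r) => r.changedPackages.includes(pkg))
    .map((r) => r.version)
    .sort(compareVersions)
    .pop();
}

// Illustrative data only, not the actual release history.
const releases: Release[] = [
  { version: '3.556.0', changedPackages: ['@aws-sdk/client-s3', '@aws-sdk/client-example'] },
  { version: '3.562.0', changedPackages: ['@aws-sdk/client-example'] },
];

console.log(latestPublished(releases, '@aws-sdk/client-s3')); // prints "3.556.0"
console.log(latestPublished(releases, '@aws-sdk/client-example')); // prints "3.562.0"
```

In this model, a package that lags behind the repository-wide version is simply one that has not changed since its last publish.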
Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
Describe the bug
When setting AwsCustomResource with installLatestAwsSdk: true, it fails to upgrade aws-sdk to the latest version (at this moment v3.556.0) and falls back to the default (currently v3.515.0 in Lambda).
Examples with DynamoDB:
and SSM modules:
The custom resource eventually succeeds, after falling back to the default aws-sdk.
Expected Behavior
Be able to update the Lambda Nodejs18 runtime with the latest SDKv3 version.
Current Behavior
Unable to retrieve and upgrade Lambda Nodejs18 for SDKv3 latest version
Reproduction Steps
Deploy a CDK App with a Custom Resource that has installLatestAwsSdk: true, then check the underlying Lambda Function's logs to see the messages above.
Possible Solution
No response
Additional Information/Context
No response
CDK CLI Version
2.137.0
Framework Version
No response
Node.js Version
18
OS
Mac
Language
TypeScript
Language Version
No response
Other information
No response