aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

(custom-resources): Package does not exist #30067

Open athewsey opened 4 months ago

athewsey commented 4 months ago

Describe the bug

I'm trying to use AwsCustomResource from Python for a couple of actions on @aws-sdk/client-cognito-identity-provider, and deployment keeps failing with errors like:

Received response status [FAILED] from custom resource. Message returned:
Package @aws-sdk/client-cognito-identity-provider does not exist. (RequestId: 99b79a89-1a17-4acf-864c-84b3ac3e5664)

Expected Behavior

The affected resource (see repro steps below) should deploy successfully and create a user in the provided Cognito user pool.

Current Behavior

I'm getting the above-mentioned error message, and the resource fails to create (or roll back/delete). I also tried providing the service name as CognitoIdentityServiceProvider, but this gave the same error message (with the @aws-sdk/client-cognito-identity-provider package name).

This may be intermittent, as I managed to get the stack to deploy (updating an existing stack to add this resource) at least once. But I'm now facing the error consistently.

Reproduction Steps

Given a Python CDK construct with a resource something like:

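from aws_cdk.custom_resources import (
    AwsCustomResource,
    AwsCustomResourcePolicy,
    AwsSdkCall,
    PhysicalResourceId,
)

AwsCustomResource(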
    self,
    "AwsCustomResource-CreateUser",
    on_create=AwsSdkCall(
        action="adminCreateUser",
        parameters={
            "UserPoolId": ...,
            "Username": ...,
            "MessageAction": "SUPPRESS",
            "TemporaryPassword": ...,
        },
        physical_resource_id=PhysicalResourceId.of(
            f"AwsCustomResource-CreateUser-{...}"
        ),
        service="@aws-sdk/client-cognito-identity-provider",
    ),
    on_delete=AwsSdkCall(
        action="adminDeleteUser",
        parameters={
            "UserPoolId": ...,
            "Username": ...,
        },
        service="@aws-sdk/client-cognito-identity-provider",
    ),
    policy=AwsCustomResourcePolicy.from_sdk_calls(
        resources=AwsCustomResourcePolicy.ANY_RESOURCE
    ),
    install_latest_aws_sdk=True,
)

...then try to deploy the stack.

Possible Solution

🤷‍♂️

Additional Information/Context

Originally observed on CDK v1.126.0, so tried upgrading to 2.140.0 but it didn't help.

CDK CLI Version

2.140.0

Framework Version

2.140.0

Node.js Version

20.9.0

OS

macOS 14.4.1

Language

Python

Language Version

Python 3.12.1

Other information

Seems possibly related to #28005, which was closed due to inactivity but raised against an older CDK version.

glitchassassin commented 4 months ago

As of 11:00 EST on 5/3, we have been seeing a similar error with Python 3.10, CDK 2.134.0, using an AwsSdkCall for SSM's getParameter action. In our case the error is Package @aws-sdk/client-ssm does not exist.

from datetime import datetime

from aws_cdk import Stack, custom_resources as cr

cr.AwsCustomResource(
    self,
    "get_parameter",
    on_update=cr.AwsSdkCall(
        service="SSM",
        action="getParameter",
        parameters={
            "Name": parameter_name,
            "WithDecryption": True,
        },
        physical_resource_id=cr.PhysicalResourceId.of(
            str(datetime.utcnow()),
        ),
        region=region,
    ),
    policy=cr.AwsCustomResourcePolicy.from_sdk_calls(
        resources=[
            Stack.of(self).format_arn(
                service="ssm",
                region=region,
                resource="parameter",
                resource_name=parameter_name.lstrip("/"),
            )
        ]
    ),
)

The issue also appears to be intermittent for us.

athewsey commented 4 months ago

For now, un-setting install_latest_aws_sdk seems to have stabilized our configuration (based on ~3 repeated deployments), but I suspect it might be intermittency or luck of the draw rather than a real remedy. Our full source code and patch commit are available here.

@glitchassassin it looks like you're not using the install_latest_aws_sdk option though, right? And you're still seeing the issue?

glitchassassin commented 4 months ago

Correct, we are not.

On Friday, it failed on 2 of 6 deploys. Today we've had four successful releases so far and no failures. I'm configuring logging on the AwsSdkCall in hopes of capturing more details if it happens again.

glitchassassin commented 4 months ago

Aha, tracked down some logs from Friday! They showed up by default in a CloudWatch log group named /aws/lambda/[stack_name]-AWS[random hexadecimal]:

Installing latest AWS SDK v3: @aws-sdk/client-ssm
Failed to install latest AWS SDK v3. Falling back to pre-installed version. Error: SyntaxError: Error parsing /tmp/node_modules/@smithy/shared-ini-file-loader/package.json: Unexpected end of JSON input

In another instance:

Installing latest AWS SDK v3: @aws-sdk/client-ssm
Failed to install latest AWS SDK v3. Falling back to pre-installed version. Error: Error: Cannot find module '@smithy/shared-ini-file-loader'
Require stack:
- /tmp/node_modules/@smithy/node-config-provider/dist-cjs/index.js
- /tmp/node_modules/@smithy/middleware-endpoint/dist-cjs/adaptors/getEndpointFromConfig.js
- /tmp/node_modules/@smithy/middleware-endpoint/dist-cjs/index.js
- /tmp/node_modules/@smithy/core/dist-cjs/index.js
- /tmp/node_modules/@aws-sdk/client-ssm/dist-cjs/index.js
- /var/task/index.js
- /var/runtime/index.mjs

It seems like each time this runs, there's an initial attempt to install the SDK which always times out after 120 seconds (based on ResourceProperties in the logs, InstallLatestAwsSdk is true even though it isn't explicitly set in our code). The Lambda is immediately invoked again, and this time the install either succeeds or fails in under a minute. If it fails, it logs that it is falling back to the pre-installed version.

After the install, an Update request is logged, and it returns the parameter it's supposed to be fetching correctly (whether the install failed or succeeded).

Then, in some cases, there is a second Update request in the logs a couple minutes later, and that is where the "Package does not exist" error gets thrown. The request is identical to the first Update request except that the physicalResourceId is different (it's using the current date/time as described here.)

After reviewing our deployment logs, this seems to have happened only when we had back-to-back deployments within a couple of minutes of each other, so the second deployment's Update request hit the same running Lambda instance that was created by the first deployment.

It looks like when the Lambda doesn't get cleaned up after an install failure, the next Update request fails.

glitchassassin commented 4 months ago

Based on this:

https://github.com/aws/aws-cdk/blob/8e98078a54896b7a9531ba4b11bb0c6221383e34/packages/%40aws-cdk/custom-resource-handlers/lib/custom-resources/aws-custom-resource-handler/aws-sdk-v3-handler.ts#L24-L57

I wonder if the initial npm install failure is leaving /tmp/node_modules in an invalid state, but a subsequent npm install fails to detect the issue and thinks everything is installed?

Nope! It's actually failing on the require, not on the npm install command. So at this point installedSdk[packageName] is true. Next time it runs, the handler skips trying to install and falls through to the next branch of the if statement:

https://github.com/aws/aws-cdk/blob/8e98078a54896b7a9531ba4b11bb0c6221383e34/packages/%40aws-cdk/custom-resource-handlers/lib/custom-resources/aws-custom-resource-handler/aws-sdk-v3-handler.ts#L59-L66

But there's no try/catch here, so this time when the require fails, it doesn't fall back to the pre-installed version.
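
To make the sequence above concrete, here is a toy Python model of it. This is not the real handler (which is TypeScript); every name below (installed_sdk, fake_install, require_from_tmp, require_preinstalled, load_sdk, the patched flag) is hypothetical and only mirrors the behavior described in this thread. The patched branch shows one possible fix, reusing the same fallback on the warm path, and is an assumption rather than the content of any actual PR.

```python
# Toy model of the load logic described above (NOT the real handler).

# Module-level cache: it survives between invocations for as long as the
# Lambda container stays warm, like installedSdk in the handler.
installed_sdk: dict = {}


class ModuleLoadError(Exception):
    """Stands in for a failed require() of a package."""


def fake_install(package: str) -> None:
    # Hypothetical stand-in for `npm install`: the command itself exits
    # cleanly, so the package is marked as installed.
    installed_sdk[package] = True


def require_from_tmp(package: str, tmp_is_corrupt: bool) -> str:
    # Hypothetical stand-in for loading from /tmp/node_modules.
    if tmp_is_corrupt:
        raise ModuleLoadError(f"Cannot load /tmp/node_modules/{package}")
    return f"latest:{package}"


def require_preinstalled(package: str) -> str:
    # Hypothetical stand-in for the SDK bundled with the Lambda runtime.
    return f"preinstalled:{package}"


def load_sdk(package: str, install_latest: bool, tmp_is_corrupt: bool,
             patched: bool = False) -> str:
    try:
        if install_latest and not installed_sdk.get(package):
            # Cold path: install, then load with a fallback.
            fake_install(package)
            try:
                return require_from_tmp(package, tmp_is_corrupt)
            except ModuleLoadError:
                return require_preinstalled(package)
        if installed_sdk.get(package):
            # Warm path: the cache says "installed", so no install runs.
            if not patched:
                # No fallback here, so a broken /tmp install is fatal.
                return require_from_tmp(package, tmp_is_corrupt)
            try:
                return require_from_tmp(package, tmp_is_corrupt)
            except ModuleLoadError:
                return require_preinstalled(package)
        return require_preinstalled(package)
    except ModuleLoadError:
        raise RuntimeError(f"Package {package} does not exist.") from None


if __name__ == "__main__":
    pkg = "@aws-sdk/client-ssm"
    # First invocation: the load fails but falls back.
    print(load_sdk(pkg, True, tmp_is_corrupt=True))
    try:
        # Same warm container, second request: no fallback, so it raises.
        load_sdk(pkg, True, tmp_is_corrupt=True)
    except RuntimeError as err:
        print(err)
    # With the fallback added to the warm path, the request succeeds.
    print(load_sdk(pkg, True, tmp_is_corrupt=True, patched=True))
```

In this model the first call succeeds through the fallback, the second call on the same warm state raises the "does not exist" error, and the patched variant falls back again. That would be consistent with the failures showing up only on back-to-back deployments.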

glitchassassin commented 4 months ago

Drafting a PR with a fix

khushail commented 4 months ago

Thanks @athewsey for reporting this issue. Customers have reported multiple incidents of it recently.

Thanks @glitchassassin for submitting a PR.

ofiriluz commented 4 months ago

Hi, any update on this? We have started getting this as well when deleting an Events rule, for some reason: "Package @aws-sdk/client-cloudwatch-logs does not exist".

This is currently blocking our pipelines from fully passing.

glitchassassin commented 4 months ago

Waiting on some guidance on the failing integration tests on the PR - I'm not sure how to resolve the build issues

i-am-gg commented 4 months ago

@glitchassassin I see that the PR is still open. This is also affecting our deployments; when is it expected to be merged? And is there any workaround in the meantime?

glitchassassin commented 4 months ago

@gg-safe I am still working on getting this merged!

I think the workaround for now is to set install_latest_aws_sdk to false.
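
For anyone applying this in Python CDK v2, a minimal sketch of the workaround might look like the following. It reuses the SSM lookup from earlier in the thread; the ParameterLookup construct name and the use of parameter_name as the physical resource ID are illustrative assumptions, not something taken from the thread.

```python
from aws_cdk import Stack, custom_resources as cr
from constructs import Construct


class ParameterLookup(Construct):
    """Hypothetical construct wrapping the SSM lookup shown earlier."""

    def __init__(self, scope: Construct, construct_id: str, *,
                 parameter_name: str, region: str) -> None:
        super().__init__(scope, construct_id)

        cr.AwsCustomResource(
            self,
            "get_parameter",
            on_update=cr.AwsSdkCall(
                service="SSM",
                action="getParameter",
                parameters={"Name": parameter_name, "WithDecryption": True},
                # Assumption: a stable ID; the original used the current time.
                physical_resource_id=cr.PhysicalResourceId.of(parameter_name),
                region=region,
            ),
            policy=cr.AwsCustomResourcePolicy.from_sdk_calls(
                resources=[
                    Stack.of(self).format_arn(
                        service="ssm",
                        region=region,
                        resource="parameter",
                        resource_name=parameter_name.lstrip("/"),
                    )
                ]
            ),
            # Workaround: use the SDK bundled with the Lambda runtime
            # instead of running `npm install` at invocation time.
            install_latest_aws_sdk=False,
        )
```

Since the logs above showed InstallLatestAwsSdk as true even when it was not set in code, setting the flag explicitly seems to be the important part. The trade-off is that the handler only sees whatever SDK version ships with the Lambda runtime.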

i-am-gg commented 4 months ago

Thanks for the workaround, @glitchassassin. This seems to be working; I'll test more. Thanks a lot again!

emmanuelnk commented 3 months ago

Hi, is there any movement on this? Our deployments (with custom resources) are failing for the exact same issue. In my case they fail regardless of the value of install_latest_aws_sdk with the following message:

Received response status [FAILED] from custom resource. Message returned: Package @aws-sdk/client-r53 does not exist.

glitchassassin commented 3 months ago

I've been working through the PR issues with pahud in the CDK Slack; I've cross-posted the latest question on the PR for visibility

emmanuelnk commented 3 months ago

Thank you @glitchassassin. If there is anything I can do to help move this along, just link me to the Slack discussion (I'm also in that Slack group).

ethanr-bjss commented 3 months ago

Hi @glitchassassin, has there been any progress on this? This is currently blocking some of our Production workflows.

If there's anything I can help with to speed this along, please let me know.

glitchassassin commented 3 months ago

@ethanr-bjss It looks like the PR that I was waiting on has been merged, so we should be good to update the integration test snapshots. I'll get started on those now!

anubhav-pandey1 commented 1 week ago

+1 to the issue. I am facing this, or possibly a similar issue, for @aws-sdk/client-elasticloadbalancingv2 when creating a custom resource.

Error message: Package @aws-sdk/client-elasticloadbalancingv2 does not exist. (RequestId: 8e210cb-fbc4-41fe-89cd-bebf8b93d075)