aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

custom-resource: Custom resource as a dependency for another custom resource #30875

Open kaisic1224 opened 4 months ago

kaisic1224 commented 4 months ago

Describe the bug

I am attempting to create a Neptune global database and then add a database cluster inside.

I am creating the global database using the CustomResource class, and the database cluster with the AwsCustomResource class.

The error I am running into is that adding a cluster to the global database fails at creation: the deployment cannot find the global database, even though I specified that the cluster depends on it.

Expected Behavior

I expect the global database to be fully created and available before the cluster is added.

Current Behavior

11:43:46 PM | CREATE_FAILED | Custom::NeptuneRegionalCluster | NeptuneCluster7FC72740 Received response status [FAILED] from custom resource. Message returned: Global cluster global-database-identifier not found (RequestId: 83af0fa8-cea4-44d1-8ec7-9c2948986ce2)

❌ GlobalDB failed: Error: The stack named GlobalDB failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_FAILED (The following resource(s) failed to delete: [NeptuneCluster7FC72740]. ): Received response status [FAILED] from custom resource. Message returned: Global cluster global-cluster-identifier not found (RequestId: 83af0fa8-cea4-44d1-8ec7-9c2948986ce2), Received response status [FAILED] from custom resource. Message returned: Malformed db cluster arn dev-primary-cluster (RequestId: 2bfb458b-cfde-4169-b730-d8cfc0a258f7) at FullCloudFormationDeployment.monitorDeployment (/usr/local/lib/node_modules/aws-cdk/lib/index.js:451:10568) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) at async Object.deployStack2 [as deployStack] (/usr/local/lib/node_modules/aws-cdk/lib/index.js:454:199716) at async /usr/local/lib/node_modules/aws-cdk/lib/index.js:454:181438

❌ Deployment failed: Error: The stack named GlobalDB failed creation, it may need to be manually deleted from the AWS console: ROLLBACK_FAILED (The following resource(s) failed to delete: [NeptuneCluster7FC72740]. ): Received response status [FAILED] from custom resource. Message returned: Global cluster global-cluster-identifier not found (RequestId: 83af0fa8-cea4-44d1-8ec7-9c2948986ce2), Received response status [FAILED] from custom resource. Message returned: Malformed db cluster arn dev-primary-cluster (RequestId: 2bfb458b-cfde-4169-b730-d8cfc0a258f7) at FullCloudFormationDeployment.monitorDeployment (/usr/local/lib/node_modules/aws-cdk/lib/index.js:451:10568) at process.processTicksAndRejections (node:internal/process/task_queues:95:5) at async Object.deployStack2 [as deployStack] (/usr/local/lib/node_modules/aws-cdk/lib/index.js:454:199716) at async /usr/local/lib/node_modules/aws-cdk/lib/index.js:454:181438

Reproduction Steps

neptune.ts

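import { join } from "path";
import { App, CustomResource, Stack } from "aws-cdk-lib";
import { Code, CodeSigningConfig, Runtime } from "aws-cdk-lib/aws-lambda";
import { Platform, SigningProfile } from "aws-cdk-lib/aws-signer";
import { NodejsFunction } from "aws-cdk-lib/aws-lambda-nodejs";
import { AwsCustomResource, AwsCustomResourcePolicy, PhysicalResourceId, Provider } from "aws-cdk-lib/custom-resources";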

class GlobalDatabaseStack extends Stack {
    globalClusterIdentifier = "global-cluster-identifier"
    engineVersion = "1.2.0.0"

    constructor(scope: App, id: string) {
    super(scope, id);

    const signingProfile = new SigningProfile(this, "SigningProfile", {
      platform: Platform.AWS_LAMBDA_SHA384_ECDSA,
    });

    const codeSigningConfig = new CodeSigningConfig(this, "CodeSigningConfig", {
      signingProfiles: [signingProfile],
    });

    const globalClusterOnEventHandler = new NodejsFunction(
      this,
      "NeptuneGlobalClusterOnEventHandler",
      {
        codeSigningConfig,
        runtime: Runtime.NODEJS_20_X,
        handler: "globalClusterOnEventHandler/globalClusterOnEventHandler.handler",
        code: Code.fromAsset(join(__dirname, "lambda", "globalClusterOnEventHandler", "globalClusterOnEventHandler.zip")),
        bundling: {
          externalModules: ["aws-sdk"],
        },
      }
    );

    const globalClusterProvider = new Provider(
      this,
      "NeptuneGlobalClusterProvider",
      {
        onEventHandler: globalClusterOnEventHandler,
      }
    );

    // create global cluster
    const globalCluster = new CustomResource(this, "NeptuneGlobalDatabase", {
      serviceToken: globalClusterProvider.serviceToken,
      properties: {
        // stack: this,
        GlobalClusterIdentifier: this.globalClusterIdentifier,
        engineVersion: this.engineVersion,
      },
      resourceType: "Custom::NeptuneGlobalCluster",
    });

    // add a cluster in the primary region
    const primaryCluster = new AwsCustomResource(this, "NeptuneCluster", {
      onCreate: {
        action: "CreateDBClusterCommand",
        service: "@aws-sdk/client-neptune",
        physicalResourceId: PhysicalResourceId.of(Date.now().toString()),
        parameters: {
          // required
          DBClusterIdentifier: `dev-primary-cluster`,
          Engine: "neptune",

          DatabaseName: `globalDatabase`,
          EngineVersion: this.engineVersion,
          GlobalClusterIdentifier: this.globalClusterIdentifier, // db name
        },
      },
      onDelete: {
        action: "RemoveFromGlobalClusterCommand",
        service: "@aws-sdk/client-neptune",
        parameters: {
          GlobalClusterIdentifier: this.globalClusterIdentifier,
          DbClusterIdentifier: `dev-primary-cluster`,
        },
      },
      resourceType: "Custom::NeptuneRegionalCluster",
      policy: AwsCustomResourcePolicy.fromSdkCalls({
        resources: AwsCustomResourcePolicy.ANY_RESOURCE,
      })
    });

    primaryCluster.node.addDependency(globalCluster);
    }
}

lambda/globalClusterOnEventHandler/globalClusterOnEventHandler.ts

import { AwsCustomResource, AwsCustomResourcePolicy, PhysicalResourceId } from "aws-cdk-lib/custom-resources";
import { CloudFormationCustomResourceEvent, Context } from "aws-lambda";

export const handler = async (
  event: CloudFormationCustomResourceEvent,
  context: Context
) => {
  const { stack, GlobalClusterIdentifier, engineVersion, storageEncrypted } =
    event.ResourceProperties;

  let resp = {
    LogicalResourceId: event.LogicalResourceId,
    StackId: event.StackId,
    RequestId: event.RequestId,
    // PhysicalResourceId: context.functionName,
    Status: "FAILED",
    Reason: "",
    Data: {},
  };

  switch (event.RequestType) {
    case "Create":
      let global;
      try {
        global = new AwsCustomResource(stack, "NeptuneGlobalDatabase", {
          onCreate: {
            action: "CreateGlobalClusterCommand",
            service: "@aws-sdk/client-neptune",
            physicalResourceId: PhysicalResourceId.of(Date.now().toString()),
            parameters: {
              // required
              GlobalClusterIdentifier: GlobalClusterIdentifier, // db name

              Engine: "neptune",
              EngineVersion: engineVersion,
              StorageEncrypted: storageEncrypted,
            },
          },
          policy: AwsCustomResourcePolicy.fromSdkCalls({
            resources: AwsCustomResourcePolicy.ANY_RESOURCE
          })
        });
      } catch (error) {
        resp.Status = "FAILED";
        return resp;
      }

      resp.Status = "SUCCESS";
      resp.Data = {
        cluster: global
      }
      return resp;
    case "Update":
    case "Delete":
      // Update and Delete are not handled in this reproduction.
      break;
  }
};

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.146.0 (build b368c78)

Framework Version

No response

Node.js Version

v20.3.0

OS

Debian GNU/Linux 12 (bookworm) on Windows 10 x86_64 Home 22H2 | Kernel version: 5.15.153.1-microsoft-standard-WSL2

Language

TypeScript

Language Version

No response

Other information

No response

pahud commented 3 months ago

11:43:46 PM | CREATE_FAILED | Custom::NeptuneRegionalCluster | NeptuneCluster7FC72740 Received response status [FAILED] from custom resource. Message returned: Global cluster global-database-identifier not found (RequestId: 83af0fa8-cea4-44d1-8ec7-9c2948986ce2)

It looks like the global cluster could not be found when your custom resource tried to create the regional cluster with the specified global-database-identifier. Very likely your global cluster was not ready yet.

I would troubleshoot this way:

  1. First, create only the global cluster using the custom resource.
  2. After that custom resource is created, use the JS SDK or the AWS CLI to create the regional primary cluster with that global-database-identifier and see if it works. This confirms that it can technically be created with the AWS CLI or SDK.
  3. If step 2 works, you should be able to implement the same thing with the custom resource. The key is to make sure the global cluster is ready before you start creating the regional one. The question is how. When you create a cluster through the SDK, you will probably get a response immediately while the operation is still in progress. The trick is to define an isComplete handler in CDK that describes the cluster and checks whether its status is ready. With this design your custom resource does not return completed immediately; it returns completed only when the isComplete handler reports completion. Your dependent regional resources then start provisioning only once the global cluster is really ready and available.
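
For illustration, the readiness check behind an isComplete handler could look roughly like the sketch below. It is not code from this thread. The names `evaluateCompletion`, `makeIsCompleteHandler` and `DescribeStatus` are hypothetical, and the sketch assumes that a status of "available" means the global cluster is ready. The status lookup is injected as a function so the decision logic can be exercised without calling AWS.

```typescript
// Request types CloudFormation sends to a custom resource.
type CfnRequestType = "Create" | "Update" | "Delete";

// Shape of the value an isComplete handler returns to the Provider framework.
interface IsCompleteResponse {
  IsComplete: boolean;
}

// Pure decision logic. `clusterStatus` is undefined when the cluster does not exist.
// ASSUMPTION: "available" is the status that marks the global cluster as ready.
function evaluateCompletion(
  requestType: CfnRequestType,
  clusterStatus: string | undefined
): IsCompleteResponse {
  if (requestType === "Delete") {
    // A delete is finished once the cluster can no longer be found.
    return { IsComplete: clusterStatus === undefined };
  }
  // Create and Update are finished only when the cluster is usable.
  return { IsComplete: clusterStatus === "available" };
}

// HYPOTHETICAL lookup signature: resolves to the current status,
// or to undefined if the cluster is not found.
type DescribeStatus = (globalClusterIdentifier: string) => Promise<string | undefined>;

// Minimal subset of the event the framework passes to isComplete.
interface IsCompleteEvent {
  RequestType: CfnRequestType;
  ResourceProperties: { GlobalClusterIdentifier: string };
}

// Builds the handler; the lookup is injected so the logic can be tested without AWS.
function makeIsCompleteHandler(describeStatus: DescribeStatus) {
  return async (event: IsCompleteEvent): Promise<IsCompleteResponse> => {
    const clusterStatus = await describeStatus(
      event.ResourceProperties.GlobalClusterIdentifier
    );
    return evaluateCompletion(event.RequestType, clusterStatus);
  };
}
```

In a real Lambda, the injected lookup would presumably wrap a describe call such as `DescribeGlobalClustersCommand` from `@aws-sdk/client-neptune`, and the resulting handler would be exported and passed to `Provider` as `isCompleteHandler` next to the existing `onEventHandler`. The framework then keeps polling the handler until it returns `IsComplete: true` or the provider's timeout expires.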

Generally we recommend using L2 or L1 constructs whenever possible, unless you really have to use a custom resource. But if you do have to use one, I hope this trick helps. Let me know if it works for you.

kaisic1224 commented 3 months ago

This worked perfectly, thank you!