Add API to register and execute code before or after CDK App lifecycle events

AWS CDK RFC #228 and AWS CDK issues 11344 and 20455 all propose running custom code, invoked from the CLI or the CDK App, before or after specific CDK App lifecycle events. These feature requests resemble the hooks feature of the Sceptre project and OpenShift platform hooks.

The consensus is that CDK Triggers, as defined in RFC #71, should address this feature gap. In many environments, CDK Triggers will work well. In regulated environments, however, this approach does not work: the reliance on custom resources presents several logistical challenges for regulatory compliance, making Triggers unusable for organizations that must participate in regulatory or compliance programs. Therefore, a key distinction of this RFC is that the trigger or hook code executes within the CDK process and does not rely on deploying resources to an AWS account to function.

Use cases for pre and post-hooks during the CDK application lifecycle

Today, our team validates the generated CloudFormation templates and asset manifest before deployment. We are required to ensure compliance before anything is deployed. Additionally, we run code to inspect the CDK asset manifest to ensure that all code assets are pre-built and referenced from internal artifact repositories (we have one custom construct that copies Lambda ZIP archives from an internal repo, and another that uses Skopeo to copy container images). These checks run in a managed AWS CodePipeline pipeline that executes in a separate account from the target deployment account. The pipeline account often has entitlements and access to different resources than the deployment account, and vice versa. Thus, a custom resource running in the deployment account may be unable to perform the desired activity. Moreover, it may not be appropriate for it to do so.
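As a rough illustration of the asset check described above, here is a minimal sketch. It is not our actual tooling: `ResolvedAsset` is a hypothetical, simplified record (not the real CDK asset manifest schema), and it assumes each asset's provenance has already been resolved to an origin URI. The `INTERNAL_PREFIXES` allow-list, the example hostnames, and `findExternalAssets` are likewise made up for the example.

```typescript
// Hypothetical, simplified view of one asset after its provenance is resolved.
// This is NOT the real CDK asset manifest schema.
interface ResolvedAsset {
  id: string;
  kind: "file" | "docker-image";
  origin: string; // where the pre-built artifact was copied from
}

// Hypothetical allow-list of internal artifact repository prefixes.
const INTERNAL_PREFIXES: string[] = [
  "https://artifacts.internal.example/",
  "registry.internal.example/",
];

// Returns the assets whose origin is not under any allowed prefix.
function findExternalAssets(
  candidates: ResolvedAsset[],
  allowedPrefixes: string[] = INTERNAL_PREFIXES
): ResolvedAsset[] {
  return candidates.filter(
    (asset) =>
      !allowedPrefixes.some(
        (prefix) => asset.origin.slice(0, prefix.length) === prefix
      )
  );
}

const sampleAssets: ResolvedAsset[] = [
  {
    id: "HandlerZip",
    kind: "file",
    origin: "https://artifacts.internal.example/lambda/handler-1.4.2.zip",
  },
  {
    id: "ApiImage",
    kind: "docker-image",
    origin: "registry.internal.example/team/api:1.0.0",
  },
  {
    id: "PublicImage",
    kind: "docker-image",
    origin: "docker.io/library/nginx:latest",
  },
];

// Only the asset pulled from a public registry is reported.
const externalAssets = findExternalAssets(sampleAssets);
```

A check like this only needs read access to the synthesized output, which is why it can run in the pipeline account or on a developer machine.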

We perform these checks directly after synthesizing the CDK app. Developers can even perform these checks locally in their IDEs before pushing code. This affords rapid feedback, and teams know what will pass well before deployment. However, the user experience is disjointed because the checks run outside the CDK lifecycle. Ideally, we could plug into the CDK lifecycle pre- or post-synthesis, or even pre- and post-deploy.

Ideally, we'd like to hook into these events, and to inspect the deployment environment pre-deploy, to do things like:

Run the full suite of compliance checks post-synthesis.
Remove orphaned stacks. This is questionable to do via CDK Triggers, as it requires a deployment and entitles code running in the deployment environment to delete stacks.
Run assertions against the deployment account to ensure that expected pre-conditions are correct.

Proposed solution

Rather than rely solely on CDK Triggers or Custom Resources, developers could tap into the CDK App lifecycle and execute code before or after one of the lifecycle events.


import * as cdk from 'aws-cdk-lib';

const app = new cdk.App();

// Check the asset manifest for compliance
app.registerPostSynthHook(new AssetManifestInspectorHook(...));
// Check that the generated CFN passes all compliance checks;
// otherwise, fail the deployment
app.registerPostSynthHook(new ComplianceHook(...));

// Ensure that the core expectations about
// the target environment are true.
app.registerPreDeployHook(new PreConditionHook(...));

// Clean up a stack that has been removed from the CDK app
app.registerPostDeployHook(
    new DeleteOrphanedStackHook(["MyOrphanedStack"]));

// Upload the results of the compliance checks
app.registerPostDeployHook(
    new UploadComplianceEvidenceHook(...));

The hooks should be scoped to either an entire CDK App or a single Stack. In addition, these lifecycle hooks must provide access to the cxapi.CloudAssembly and related types so that they can inspect the generated templates and asset manifests.
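To make the scoping and assembly-access requirements concrete, here is one possible shape for the hook contract. This is a sketch only: every name in it (`LifecycleHook`, `HookContext`, `HookRegistry`, `CloudAssemblyLike`) is hypothetical and none of it exists in the AWS CDK today. `CloudAssemblyLike` is a small stand-in for cxapi.CloudAssembly so the example is self-contained, and the sample hook that rejects custom resources is purely illustrative.

```typescript
// Every name in this block is hypothetical; none of it exists in the AWS CDK today.
type LifecycleEvent = "postSynth" | "preDeploy" | "postDeploy";

// Minimal stand-in for the parts of cxapi.CloudAssembly that a hook would read.
interface StackArtifactLike {
  stackName: string;
  template: { Resources: Record<string, { Type: string }> };
}

interface CloudAssemblyLike {
  directory: string;
  stacks: StackArtifactLike[];
}

interface HookContext {
  assembly: CloudAssemblyLike;
  // The stacks in scope: all of them for an App-wide hook, one for a Stack hook.
  stacks: StackArtifactLike[];
}

interface LifecycleHook {
  readonly name: string;
  // Throwing an error aborts the lifecycle at this point.
  run(context: HookContext): void;
}

interface RegisteredHook {
  hook: LifecycleHook;
  stackName?: string; // undefined means the hook applies to the whole App
}

class HookRegistry {
  private readonly hooks: Record<LifecycleEvent, RegisteredHook[]> = {
    postSynth: [],
    preDeploy: [],
    postDeploy: [],
  };

  register(event: LifecycleEvent, hook: LifecycleHook, stackName?: string): void {
    this.hooks[event].push({ hook, stackName });
  }

  // Runs the hooks for one event in registration order and returns their names.
  // The first hook that throws stops the run.
  run(event: LifecycleEvent, assembly: CloudAssemblyLike): string[] {
    const executed: string[] = [];
    for (const { hook, stackName } of this.hooks[event]) {
      const stacks =
        stackName === undefined
          ? assembly.stacks
          : assembly.stacks.filter((stack) => stack.stackName === stackName);
      hook.run({ assembly, stacks });
      executed.push(hook.name);
    }
    return executed;
  }
}

// Illustrative hook: reject templates that contain custom resources.
const noCustomResources: LifecycleHook = {
  name: "NoCustomResources",
  run({ stacks }) {
    for (const stack of stacks) {
      for (const logicalId of Object.keys(stack.template.Resources)) {
        const type = stack.template.Resources[logicalId].Type;
        if (type === "AWS::CloudFormation::CustomResource" || type.slice(0, 8) === "Custom::") {
          throw new Error(`${stack.stackName}/${logicalId} is a custom resource`);
        }
      }
    }
  },
};

const registry = new HookRegistry();
registry.register("postSynth", noCustomResources);

const sampleAssembly: CloudAssemblyLike = {
  directory: "cdk.out",
  stacks: [
    { stackName: "ApiStack", template: { Resources: { Bucket: { Type: "AWS::S3::Bucket" } } } },
  ],
};
const executedHooks = registry.run("postSynth", sampleAssembly);
```

The sketch is synchronous to keep it short. Hooks that call AWS APIs, such as the pre-deploy and post-deploy examples, would more likely return a Promise.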

What's wrong with CDK Triggers or CDK Custom Resources?

The challenge with CDK Triggers is that they rely on custom resources. Custom resources present many problems in regulated environments such as payment platforms, telecom, fintech, and healthcare. Regulators and internal InfoSec teams follow the same essential guidance:

The Lambda function must be attached to a VPC. This is also guidance from AWS.
The integrity of the code being deployed must be verifiable.
There must be evidence of vulnerability scanning for the deployed code.
The authenticity of the author must be verifiable, which is not easy today. This is especially true for third-party custom resource-based constructs. We'd be allowing any NPM package to deploy into our environment a Lambda function that has access to the public internet by default.

CDK implements Triggers using a custom resource Lambda function to invoke the actual trigger. Ironically, while the trigger function can specify a VPC, the CDK Trigger resource Lambda function cannot. When detective controls that look for non-compliant resources run, every custom resource generates multiple violations.

Of equal importance is the source code that is being deployed. The CDK Trigger Lambda function is deployed from source code distributed within an NPM package. A key concern is the implicit trust in deploying source code from an arbitrary third party into an account without any vetting. For AWS CDK types, this might be okay, but the risk increases with any non-CDK package. Several compliance programs demand evidence that the executable code deployed into an environment is traceable. The types of evidence frequently required are:

If it's source code, it must be traceable to a specific commit SHA.
If it's third-party code, it must be traceable back to a specific version in an internal artifact repo.
There must be evidence that the code artifact's integrity is intact. This is typically done using checksum verification and/or artifact signing.
There must be evidence that the code has passed vulnerability scans pre-deployment. I am qualifying pre-deployment here, as the container must be scanned before being pushed to ECR.
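The four evidence requirements above could be captured in a record like the following. This is only a sketch to make the list concrete: the `ArtifactEvidence` type, its field names, and the `missingEvidence` function are hypothetical and not taken from any compliance program or from the CDK.

```typescript
// Hypothetical evidence record for one deployable artifact.
// Field names are illustrative only.
interface ArtifactEvidence {
  artifact: string;
  origin: "first-party" | "third-party";
  commitSha?: string; // first-party source: the commit it was built from
  internalRepoVersion?: string; // third-party code: version in an internal artifact repo
  integrityVerified: boolean; // checksum and/or signature was verified
  vulnerabilityScanPassed: boolean; // scanned before deployment
}

// Lists which of the required evidence items are missing for an artifact.
function missingEvidence(evidence: ArtifactEvidence): string[] {
  const gaps: string[] = [];
  if (evidence.origin === "first-party" && !evidence.commitSha) {
    gaps.push("commit SHA");
  }
  if (evidence.origin === "third-party" && !evidence.internalRepoVersion) {
    gaps.push("internal artifact repo version");
  }
  if (!evidence.integrityVerified) {
    gaps.push("integrity verification");
  }
  if (!evidence.vulnerabilityScanPassed) {
    gaps.push("pre-deployment vulnerability scan");
  }
  return gaps;
}

// Example: Lambda code shipped inside a third-party NPM package,
// deployed without passing through any internal vetting.
const unvettedHandler: ArtifactEvidence = {
  artifact: "custom-resource-handler",
  origin: "third-party",
  integrityVerified: false,
  vulnerabilityScanPassed: false,
};

const evidenceGaps = missingEvidence(unvettedHandler);
```

In this example the unvetted handler is missing three of the four items, since the commit SHA requirement applies only to first-party source.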

Our experience with CDK custom resources is that they cannot satisfy these requirements in their current form.

Finally, there is the matter of entitlements. Our deployment pipelines are split across two accounts, partly to satisfy separation-of-duties requirements. The pipeline account has the entitlements to deploy and mutate resources in the deployment account. The Trigger/Custom Resource approach assumes that the deployment account has the same entitlements and network access as the pipeline account.

Why not use CDK Aspects, like cdk-nag?

CDK Aspects have a lot of utility, and we use them heavily in our projects. However, CDK Aspects don't have access to the generated CloudFormation templates and asset manifests, because they are evaluated before those files are generated. These files are required for our compliance checks.
