aws / aws-cdk-rfcs

RFCs for the AWS CDK
Apache License 2.0

Add API to register and execute code before or after CDK App lifecycle events #489

Open damnhandy opened 1 year ago

damnhandy commented 1 year ago

AWS CDK RFC #228 and AWS CDK issues 11344 and 20455 all proposed running custom code, invoked from the CLI or the CDK App, before or after specific CDK App lifecycle events. These feature requests resemble the hooks feature of the Sceptre project and OpenShift platform hooks.

The consensus is that CDK Triggers, as defined in RFC #71, should address this feature gap. In many environments, CDK Triggers will work well. In regulated environments, however, they do not: their reliance on custom resources presents several logistical challenges for regulatory compliance, making them unusable for organizations that must participate in regulatory or compliance programs. Therefore, the key distinction in this RFC is that the trigger or hook code is executed within the CDK process and does not rely on deploying resources to an AWS account to function.

Use cases for pre- and post-hooks during the CDK application lifecycle

Today, our team validates the generated CloudFormation templates and asset manifest pre-deployment. We are required to ensure compliance before anything is deployed. Additionally, we run code to inspect the CDK asset manifest to ensure that all code assets are pre-built and referenced from internal artifact repositories (we have one custom construct that copies Lambda ZIP archives from an internal repo, and another that uses Skopeo to copy container images). These checks run in a managed AWS CodePipeline pipeline that executes in a separate account from the target deployment account. The Pipeline account often has entitlements and access to different resources than the deployment account, and vice versa. Thus, a custom resource running in the deployment account may be unable to perform the desired activity. Moreover, it may not be appropriate for it to do so.

We perform these checks directly after synthesizing the CDK app. Developers can even perform these checks locally in their IDEs before pushing code. This affords rapid feedback, and teams know what will pass well before deployment. However, the user experience is disjointed because the checks run outside the CDK lifecycle. Ideally, we could plug into the CDK lifecycle pre- or post-synthesis, or even pre- and post-deploy.
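As a rough illustration of the kind of asset manifest check described above, the sketch below flags assets that the CDK would zip or build itself rather than reference pre-built. It is not our actual tooling: the function name `findNonPrebuiltAssets`, the simplified `*Like` types, the sample asset IDs, and the policy itself are assumptions made for the example. The field names (`files`, `dockerImages`, `source.packaging`, `source.directory`) follow the asset manifest JSON that `cdk synth` writes to `cdk.out`, to the best of my knowledge.

```typescript
// Simplified stand-ins for the parts of the asset manifest this check reads;
// these are not the official cloud-assembly-schema types.
interface FileAssetLike {
  source: { path?: string; packaging?: string; executable?: string[] };
}
interface DockerImageAssetLike {
  source: { directory?: string; executable?: string[] };
}
interface AssetManifestLike {
  files?: Record<string, FileAssetLike>;
  dockerImages?: Record<string, DockerImageAssetLike>;
}

// Hypothetical policy: anything the CDK would zip or build at publish time
// counts as "not pre-built" and is reported.
function findNonPrebuiltAssets(manifest: AssetManifestLike): string[] {
  const violations: string[] = [];
  for (const [id, file] of Object.entries(manifest.files ?? {})) {
    if (file.source.packaging === "zip-directory") {
      violations.push(`file asset ${id} is zipped from a local directory`);
    }
  }
  for (const [id, image] of Object.entries(manifest.dockerImages ?? {})) {
    if (image.source.directory !== undefined) {
      violations.push(`docker image ${id} is built from a local directory`);
    }
  }
  return violations;
}

// Example manifest fragment (asset IDs and paths are made up).
const sampleManifest: AssetManifestLike = {
  files: {
    handlerDir: { source: { path: "asset.handlerDir", packaging: "zip-directory" } },
    prebuiltZip: { source: { path: "asset.prebuilt.zip", packaging: "file" } },
  },
  dockerImages: {
    localImage: { source: { directory: "asset.localImage" } },
  },
};

for (const violation of findNonPrebuiltAssets(sampleManifest)) {
  console.log(violation);
}
```

In a real run, the manifest would be parsed from the `*.assets.json` file in the synthesized output directory instead of being declared inline.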

Ideally, we'd like to have the opportunity to inspect the deployment environment pre-deploy to do things like:

Proposed solution

Rather than rely solely on CDK Triggers or Custom Resources, developers could tap into the CDK App lifecycle and execute code before or after one of the lifecycle events.


const app = new cdk.App();

// Check the asset manifest for compliance
app.registerPostSynthHook(new AssetManifestInspectorHook(...));
// Check that the generated CFN passes all compliance checks;
// otherwise, fail the deployment
app.registerPostSynthHook(new ComplianceHook(...));

// Ensure that the core expectations about
// the target environment are true.
app.registerPreDeployHook(new PreConditionHook(...));

// Clean up a stack that has been removed from the CDK app
app.registerPostDeployHook(
    new DeleteOrphanedStackHook(["MyOrphanedStack"]));

// Uploads the results of the compliance checks
app.registerPostDeployHook(
    new UploadComplianceEvidenceHook(...));

Hooks should be scoped to either an entire CDK App or a single Stack. In addition, these lifecycle hooks must have access to the cxapi.CloudAssembly and related types so they can inspect the generated templates and asset manifests.
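To make the shape of such an API more concrete, here is a minimal, self-contained sketch of a hook contract and registry. Nothing in it exists in the CDK today: `LifecycleHook`, `HookRegistry`, `LifecycleEvent`, and `CloudAssemblyLike` (a stand-in for cxapi.CloudAssembly) are hypothetical names, and the choice of synchronous hooks that fail by throwing is an assumption made to keep the example short.

```typescript
// Stand-in for cxapi.CloudAssembly; only the output directory is modelled.
interface CloudAssemblyLike {
  directory: string;
}

type LifecycleEvent = "postSynth" | "preDeploy" | "postDeploy";

// Hypothetical hook contract: a hook throws to fail the lifecycle step.
interface LifecycleHook {
  name: string;
  run(assembly: CloudAssemblyLike): void;
}

class HookRegistry {
  private hooks: Record<LifecycleEvent, LifecycleHook[]> = {
    postSynth: [],
    preDeploy: [],
    postDeploy: [],
  };

  register(event: LifecycleEvent, hook: LifecycleHook): void {
    this.hooks[event].push(hook);
  }

  // Runs hooks in registration order and returns the names of those that ran.
  // An exception from any hook propagates, so later hooks do not run.
  run(event: LifecycleEvent, assembly: CloudAssemblyLike): string[] {
    const completed: string[] = [];
    for (const hook of this.hooks[event]) {
      hook.run(assembly);
      completed.push(hook.name);
    }
    return completed;
  }
}

// Usage: two post-synth hooks, the second of which rejects the assembly.
const registry = new HookRegistry();
registry.register("postSynth", {
  name: "asset-manifest-inspector",
  run: (assembly) => console.log(`inspecting ${assembly.directory}`),
});
registry.register("postSynth", {
  name: "compliance",
  run: () => {
    throw new Error("template failed compliance checks");
  },
});

try {
  registry.run("postSynth", { directory: "cdk.out" });
} catch (err) {
  console.log(`post-synth hooks failed: ${(err as Error).message}`);
}
```

Hooks that call AWS APIs, such as a pre-deploy environment check or an evidence upload, would likely need an asynchronous variant of `run`.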

What's wrong with CDK Triggers or CDK Custom Resources?

The challenge with CDK Triggers is that they rely on Custom Resources. Custom resources present many problems in regulated environments such as payment platforms, telecom, fintech, healthcare, etc. Regulators and internal InfoSec teams apply the same essential guidance:

  1. The Lambda function must be attached to a VPC. This is also guidance from AWS.
  2. The integrity of the code being deployed must be verifiable.
  3. Evidence of vulnerability scanning must exist for the deployed code.
  4. The authenticity of the author must be verifiable. With custom resources it is not easily verifiable, especially for 3rd party custom resource-based constructs. We'd be allowing any NPM package to deploy a Lambda function into our environment, and that function has access to the public internet by default.

CDK implements triggers using a custom resource Lambda function that invokes the actual trigger. Ironically, while the trigger function can specify a VPC, the CDK Trigger resource Lambda function cannot. When detective controls that flag out-of-compliance resources run, every custom resource generates multiple violations.
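The VPC requirement is an example of a control that could run against the synthesized template in-process, before anything is deployed. The sketch below scans a parsed CloudFormation template for `AWS::Lambda::Function` resources that have no `VpcConfig` property. The function name `findLambdasWithoutVpc`, the `TemplateLike` type, and the sample logical IDs are hypothetical, and a real control would likely check more than the mere presence of the property.

```typescript
// Simplified view of a CloudFormation template: only Resources are modelled.
interface TemplateLike {
  Resources?: Record<string, { Type: string; Properties?: Record<string, unknown> }>;
}

// Returns the logical IDs of Lambda functions that declare no VpcConfig.
function findLambdasWithoutVpc(template: TemplateLike): string[] {
  return Object.entries(template.Resources ?? {})
    .filter(([, resource]) => resource.Type === "AWS::Lambda::Function")
    .filter(([, resource]) => resource.Properties?.VpcConfig === undefined)
    .map(([logicalId]) => logicalId);
}

// Example template fragment (logical IDs and values are made up).
const sampleTemplate: TemplateLike = {
  Resources: {
    TriggerHandler: {
      Type: "AWS::Lambda::Function",
      Properties: { Runtime: "nodejs18.x" },
    },
    AppHandler: {
      Type: "AWS::Lambda::Function",
      Properties: {
        Runtime: "nodejs18.x",
        VpcConfig: { SubnetIds: ["subnet-example"], SecurityGroupIds: ["sg-example"] },
      },
    },
    Bucket: { Type: "AWS::S3::Bucket" },
  },
};

console.log(findLambdasWithoutVpc(sampleTemplate));
```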

Of equal importance is the source code that is being deployed. The CDK trigger Lambda function is deployed from source code distributed within an NPM package. A key concern here is the implicit trust involved in deploying unvetted source code from a random 3rd party into an account. For AWS CDK types, this might be okay, but the risk increases with any non-CDK package. Several compliance programs demand evidence that the executable code deployed into an environment has traceability guarantees. The type of evidence that is frequently required:

Our experience with CDK custom resources is that they cannot satisfy these requirements in their current form.

Finally, there is the matter of entitlements. Our deployment pipelines are split across two accounts, partly to satisfy separation-of-duties requirements. The Pipeline accounts have the entitlements to deploy and mutate resources in the deployment account. The Trigger/Custom Resource approach assumes that the deployment account has the same entitlements and network access as the pipeline account.

Why not use CDK Aspects, like cdk-nag?

CDK Aspects have a lot of utility, and we use them heavily in our projects. However, CDK Aspects don't have access to the generated CloudFormation templates and asset manifests because they are evaluated before synthesis. These files are required for our compliance checks.

argenstijn commented 8 months ago

Any idea when this is planned?