aws / aws-cdk-rfcs

RFCs for the AWS CDK
Apache License 2.0

Defaults & configuration policy #25

Closed eladb closed 11 months ago

eladb commented 4 years ago
PR Champion
- @eladb

(See #163 for a previous attempt at an RFC.)

Description

Allow users to specify or install some code that controls the defaults of AWS resources or applies some policy or validation at the stack or app level.

Progress

eladb commented 4 years ago

Design concept:

We add a method to Stack with the following implementation (name is meh):

public getPropsForResource(className: string, props: any): any {
  return props;
}

Then, we somehow enforce that all L2s will call this method when they are initialized and pass in the props that were passed to them, along with the full name of the class (e.g. aws-s3.Bucket).

This will allow users to override this method at the stack level (e.g. their base MyCompanyStack class) and implement policy around default values. For example, they can enforce that all Buckets are created with encryption:

public getPropsForResource(className: string, props: any): any {
  if (className === 'aws-s3.Bucket') {
    if (props.encryption && props.encryption !== ENCRYPTED) { throw new Error('buckets must be encrypted'); }
    props.encryption = ENCRYPTED;
  }

  return props; // passthrough
}

Something along those lines... Just an option.
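To make the mechanics concrete, here is a plain-Python stand-in for the idea above (none of this is a real CDK API; the class names mirror the proposal and the props are ordinary dicts):

```python
# Stand-in for the proposed stack-level hook: every L2 routes its props
# through get_props_for_resource() before using them.
class Stack:
    def get_props_for_resource(self, class_name, props):
        return props  # passthrough by default


class MyCompanyStack(Stack):
    """A base stack that enforces encryption on every bucket."""
    def get_props_for_resource(self, class_name, props):
        if class_name == "aws-s3.Bucket":
            if props.get("encryption") not in (None, "KMS_MANAGED"):
                raise ValueError("buckets must be encrypted")
            props["encryption"] = "KMS_MANAGED"
        return props


class Bucket:
    def __init__(self, stack, props):
        # every L2 would be expected to call the hook with its class name
        self.props = stack.get_props_for_resource("aws-s3.Bucket", dict(props))


bucket = Bucket(MyCompanyStack(), {"versioned": True})
```

The key property of this design is that the policy runs before the resource is created, so it can both reject and rewrite creation-time props.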

nirvana124 commented 4 years ago

I have created an example of enforcing organization defaults on users; it also shows how you can create new resources or modify existing ones using Aspects: https://github.com/nirvana124/aws-cdk-library-example

Dzhuneyt commented 3 years ago

Isn't this already possible to be implemented using Aspects?

eladb commented 3 years ago

@Dzhuneyt Aspects can (potentially) mutate constructs after they have been initialized but there are many initialization properties that cannot be modified after the instance is created.

JordanDeBeer commented 3 years ago

One thing I've experimented with, in my attempt to create an "organization standard library", is adding my own logic to the core.Construct.onValidate() method. I couldn't get it working in a way that was intuitive in the short amount of time I spent on it, but I feel like exposing some sort of API for adding validation functions to this method could be useful, or at the very least insightful.

eladb commented 3 years ago

One thing I've experimented with, in my attempt to create an "organization standard library", is adding my own logic to the core.Construct.onValidate() method. I couldn't get it working in a way that was intuitive in the short amount of time I spent on it, but I feel like exposing some sort of API for adding validation functions to this method could be useful, or at the very least insightful.

Ha! We recently changed the api for validations and you should be able to just do this:

this.node.addValidation(...);

See: https://docs.aws.amazon.com/cdk/api/latest/docs/constructs.Node.html#addwbrvalidationvalidation
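The addValidation() contract (a validation object whose validate() returns a list of error messages, collected at synth time) can be sketched with a minimal Python stand-in. This is not the real constructs library, just an illustration of the shape of the API:

```python
# Minimal stand-in for the constructs Node.addValidation() pattern:
# validations registered on a node are run later, and any returned
# messages are treated as errors.
class Node:
    def __init__(self):
        self._validations = []

    def add_validation(self, validation):
        self._validations.append(validation)

    def validate(self):
        errors = []
        for validation in self._validations:
            errors.extend(validation.validate())
        return errors


class EncryptionCheck:
    """Hypothetical validation: flags buckets without KMS-managed encryption."""
    def __init__(self, props):
        self.props = props

    def validate(self):
        if self.props.get("encryption") != "KMS_MANAGED":
            return ["bucket must use KMS-managed encryption"]
        return []
```

A consumer would register the check once (node.add_validation(EncryptionCheck(props))) and rely on synth to surface the messages.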

JordanDeBeer commented 3 years ago

Would an upstream base validation aspect, to support easily adding these validations to a given construct type, be useful for this RFC?

I am thinking something like this (pseudo code-ish):

class ValidatorAspect implements core.IAspect {
  private readonly type: new (...args: any[]) => constructs.IConstruct;
  private readonly fn: () => string[];

  constructor(type: new (...args: any[]) => constructs.IConstruct, fn: () => string[]) {
    this.type = type;
    this.fn = fn;
  }

  public visit(node: constructs.IConstruct): void {
    if (node instanceof this.type) {
      node.node.addValidation({
        validate: this.fn,
      });
    }
  }
}

It's more reactive compared to the getPropsForResource design, but I can see both being useful. I'd be happy to work on adding something like this if there's interest.
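The traversal mechanics behind such an aspect can be sketched without any CDK dependency. The following is a hypothetical Python stand-in (ConstructNode, BucketNode, and apply_aspect are all made up for illustration): walk a construct tree depth-first and attach a validation to every node of a given type.

```python
# Stand-in construct tree: each node knows its children and collects
# validations attached to it.
class ConstructNode:
    def __init__(self, children=()):
        self.children = list(children)
        self.validations = []

    def add_validation(self, validation):
        self.validations.append(validation)


class BucketNode(ConstructNode):
    """Hypothetical node type targeted by the aspect."""


def apply_aspect(root, node_type, validation):
    # depth-first visit, mirroring how Aspects traverse the construct tree
    if isinstance(root, node_type):
        root.add_validation(validation)
    for child in root.children:
        apply_aspect(child, node_type, validation)
```

The type check is what makes the aspect "reactive": it can only act on constructs after they exist, which is the limitation discussed earlier in the thread.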

msmjack commented 3 years ago

@ericzbeard gave an awesome talk to my company a few months ago. He mentioned that companies keep trying to make a CDK L2.5 library. (That's what I wanted to do first too!) He mentioned that this was not recommended. This concept really seems to fix that issue.

In his blog post, there is a section titled "Don't use constructs for compliance":

Another common pattern we have seen, particularly among enterprise customers, is creating a collection of construct libraries based on the L2 constructs included with the AWS CDK, with a 1-1 mapping of subclasses. For example, you might create a class called MyCompanyBucket that extends s3.Bucket and MyCompanyFunction that extends lambda.Function. Inside those subclasses, you implement your company’s security best practices like encrypting data at rest, or requiring the use of certain security policies. This pattern serves a need, but it comes with a significant drawback: as the ecosystem of AWS CDK constructs, such as AWS Solutions Constructs, grows and becomes more useful, your developer community will be effectively cut off from it if they can’t use code that instantiates the base L2s.

Investigate the usage of service control policies and permissions boundaries at the organization level to enforce your security guardrails. Use aspects or tools like CFN Guard to make assertions about the properties of infrastructure elements before deployment.

While I totally agree that we should use tools like CFN Guard, there is still a need, and it sounds like Aspects are not the full answer.

Aspects can (potentially) mutate constructs after they have been initialized but there are many initialization properties that cannot be modified after the instance is created.

I want to make the case that the current level of upvotes (8) does not reflect the actual need for this concept. From the blog post, this sounds like it might "serve a need" for a "common pattern" without the drawbacks and headaches of a company-specific wrapper.

AjkayAlan commented 3 years ago

I can vouch for @maria-jackson's thoughts above. Currently in my organization, we are struggling to find a good balance. On the one hand, we want to provide the ability for engineers to create infrastructure which follows industry best practice. On the other hand, we want engineers to build infrastructure easily which is safe, secure, and adheres to our organization requirements. For example, one of our organization policies is that no S3 bucket should be public by default. Presently we really have 3 ways to accomplish this:

  1. Provide an "L2.5" construct library, which directly goes against @ericzbeard's guidance, and for good reason.
  2. Run proactive scans against deployed infrastructure in our AWS accounts, or use SCPs.
  3. Create a cfn-guard ruleset and require engineers to scan their CDK output with it in their pipeline.

While cfn-guard would potentially solve this, the downside is that it would likely only give feedback in the pipeline (unless you run it locally). Instead, if we were able to bake these checks into CDK synthesis (or into the native constructs via a hook like onValidate), the engineer writing the code would get quicker feedback.

msmjack commented 3 years ago

None of these is integral to the solution, but they would be nice to have. Take the pseudo code as an illustration of an idea, not a preferred implementation. (I do not understand CDK or TypeScript well enough.)

Easy to Read Configuration Code

I'd really like the reusable library to be really easy to read and look a lot like how you pass CDK parameters. I want it to be very clear what people are getting and very clear to our Security folks what the defaults are.

aws-s3.Bucket.defaults({Encryption: KMS_MANAGED});
aws-ec2.Volume.defaults({encrypted: True});
my-company.AwesomeL3.defaults({Foo: BAR});
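The defaults() idea above could be backed by a simple registry. This is a hypothetical sketch (set_defaults and apply_defaults are invented names, props are plain dicts, and there is no real CDK integration): defaults are registered per class name, and user-supplied props win over them.

```python
# Hypothetical global registry of per-class defaults.
_DEFAULTS = {}

def set_defaults(class_name, props):
    """Register (or extend) company defaults for a construct class."""
    _DEFAULTS.setdefault(class_name, {}).update(props)

def apply_defaults(class_name, props):
    """Merge registered defaults with user props; user props take precedence."""
    merged = dict(_DEFAULTS.get(class_name, {}))
    merged.update(props)
    return merged

# security team registers the default once, application code never repeats it
set_defaults("aws-s3.Bucket", {"encryption": "KMS_MANAGED"})
bucket_props = apply_defaults("aws-s3.Bucket", {"versioned": True})
```

The readability goal from the comment is preserved: the registration line is a flat statement a security reviewer can scan.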

L1, L2, and L3 Default Configurations

As the CDK community expands aws-solutions-constructs, cdk-patterns, and their own L3 libraries, I see a need to provide defaults to L3, not just L2s.

For example, suppose I use aws-apigateway-dynamodb and pretend that it defaults to no encryption. While I could pass the encryption value to the L3 construct every time, I don't want to. If the solution only provides L2 defaults, those would be overwritten every time.

  1. Code Provided L3
  2. Company Provided L3 default
  3. Library Provided L3 default
  4. Code Provided L2 default
  5. Company Provided L2 default
  6. aws-cdk provided L2 default
  7. Code Provided L1 default
  8. Company Provided L1 default
  9. CloudFormation Defaults
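The precedence list above amounts to a lookup chain: walk the layers from highest to lowest precedence and take the first value found. A plain-Python sketch (layer names and contents are illustrative, not a real CDK API):

```python
# Layers ordered from highest to lowest precedence, mirroring the list above.
# Empty dicts stand for layers that set nothing for this construct.
LAYERS = [
    ("code-provided L3", {}),
    ("company L3 default", {}),
    ("library L3 default", {}),
    ("code-provided L2", {"versioned": True}),
    ("company L2 default", {"encryption": "KMS_MANAGED"}),
    ("aws-cdk L2 default", {"encryption": "UNENCRYPTED"}),
    ("code-provided L1", {}),
    ("company L1 default", {}),
]

def resolve(prop):
    """Return (value, layer) for the first layer that sets the property."""
    for name, layer in LAYERS:
        if prop in layer:
            return layer[prop], name
    return None, "CloudFormation default"
```

Here the company L2 default shadows the aws-cdk L2 default for encryption, which is exactly the behavior the list implies.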

Multiple Default Configurations

I'm also wondering how multiple defaults would work. Say my company had C-L2, my BU had B-L2, and my team had T-L2. I'd want some things in my BU to take precedence over my company's, but I'd also like the latest updates from my company.

aws-s3.Bucket.defaults({
    Encryption: KMS_MANAGED
}).applyAdditionalDefaults(C-L2);

Easy Translation from cfn-guard

It would be cool if your cfn-guard rules could generate your CDK defaults.

For example, this could be translated to something very specific: AWS::EC2::Volume Encrypted == false

But this could not: AWS::EC2::Volume Size == 101 |OR| AWS::EC2::Volume Size == 99

ericzbeard commented 3 years ago

Thanks for the input, @maria-jackson !

shawnsparks-work commented 3 years ago

@maria-jackson How does the defaults approach help when using libraries like aws-solutions-constructs? You would have to create each resource manually and pass them into the L3 constructs which certainly loses some of the benefit of those libraries.

msmjack commented 3 years ago

The solution below meets "Allow users to specify or install some code that controls the defaults of AWS resources," but not "or applies some policy or validation at the stack or app level." It better matches "Best practices for building company default constructs" #3235, but as that is considered a duplicate of this issue, I thought it was appropriate to add the discussion here.

Looking at https://github.com/awslabs/aws-solutions-constructs, here's the pattern I'd like the aws-cdk maintainers' thoughts on as an alternative to the L2.5 library for default properties for "Defaults & configuration polic[ies]" or "compliant constructs."

  1. Create a library with default properties, e.g. DefaultLogGroupProps()
  2. Create a version of overrideProps()
  3. Pass the defaults (or your overrides) to override the other properties:

_logGroupProps = overrideProps(DefaultLogGroupProps(), logGroupProps);

const logGroup = new logs.LogGroup(scope, _logGroupId, _logGroupProps);

alexpulver commented 3 years ago

I also like the AWS Solutions Constructs approach (thanks @maria-jackson for the great summary!).

The company could have its own libraries with the various defaults. The security team can own company_cdk_security, the operations team (if the company has operations as a centralized function) can own company_cdk_operations, and so on. It would be great if CDK could generate these libraries from cfn-guard rules. Alternatively, the central teams can create simple collections, such as the example below (in Python). Then, cfn-guard rules will validate the eventual result as part of the deployment process.

I think that another important capability is to layer multiple defaults, and allow the application owners to reset some properties before creating the construct. For example, the defaults for a public website might include enforce_ssl = True. For a specific use case, the application owner can get an exception to turn this off, while keeping the rest of the defaults.

Note: I added non-existing .update(other_props: object) and .from_props(props: object) methods to illustrate how the above might look without using the hacky implementation that I experimented with :). The .from_props(props: object) method is only needed in Python, since the current implementation uses keyword arguments to pass properties.

Edit: If you want to experiment with that approach, I created a simple example: https://github.com/alexpulver/company-guardrails

company_cdk_security/aws_s3.py

from aws_cdk import aws_s3 as s3
from aws_cdk import core as cdk

class BucketPropsCollection:
    public_access = s3.BucketProps(enforce_ssl=True, public_read_access=True)

    __expiration_lifecycle_rule = s3.LifecycleRule(expiration=cdk.Duration.days(30))
    retention_policy = s3.BucketProps(lifecycle_rules=[__expiration_lifecycle_rule])

app.py

from aws_cdk import aws_s3 as s3
from aws_cdk import core as cdk

# import company_cdk_solutions
from company_cdk_security import aws_s3 as security_s3
# from company_cdk_operations import aws_s3 as operations_s3
# from company_cdk8s_security import deployment as security_deployment

app = cdk.App()

stack = cdk.Stack(app, "Website")

# I am using Python's dict.update([other]) semantics:
# https://docs.python.org/3/library/stdtypes.html#dict.update
bucket_props = security_s3.BucketPropsCollection.public_access
bucket_props.update(security_s3.BucketPropsCollection.retention_policy)
bucket_props.update(s3.BucketProps(versioned=True, enforce_ssl=False))
bucket = s3.Bucket.from_props(stack, "Bucket", bucket_props)

app.synth()
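Stripped of the CDK classes, the merge semantics referenced in the comments above are just Python's dict.update: each later update overwrites keys set earlier. A plain-dict illustration (property names taken from the example; no CDK involved):

```python
# dict.update([other]) precedence: later values win over earlier ones.
public_access = {"enforce_ssl": True, "public_read_access": True}
retention_policy = {"lifecycle_rules": ["expire-after-30-days"]}

bucket_props = dict(public_access)
bucket_props.update(retention_policy)          # adds lifecycle_rules
bucket_props.update({"versioned": True, "enforce_ssl": False})  # overrides enforce_ssl
```

Note that this resolves conflicts silently, which is the behavior questioned later in the thread.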
skinny85 commented 3 years ago

I really like this discussion, and I agree with the general direction where this is going!

I have a few comments on the details of each of the proposed solutions.


@maria-jackson I think, instead of having an overrideProps() function, you can take advantage of the fact that DefaultLogGroupProps() is also a function (BTW, I think calling it defaultLogGroupProps() is more idiomatic in TypeScript/JavaScript). So, you can have defaultLogGroupProps() take in an optional LogGroupProps object. In case LogGroupProps has any required properties, you can use the Partial TypeScript type, as in function defaultLogGroupProps(props: Partial<LogGroupProps> = {}) { ....

So, the code becomes:

const logGroup = new logs.LogGroup(this, 'LogGroup', defaultLogGroupProps({
  retention: logs.RetentionDays.TWO_MONTHS, // we can override any property of defaultLogGroupProps() this way
}));

Which I think looks great.


@alexpulver I think you want to make these functions as well - making them class fields can have weird side-effects, as they are mutable.

You don't have the nice Partial feature from TypeScript, but I believe you can emulate it using keyword arguments:

class BucketPropsCollection:
  @staticmethod
  def public_access_props(**kwargs):
    return s3.BucketProps(enforce_ssl=True, public_read_access=True, **kwargs)

And then, in your CDK app:

from aws_cdk import aws_s3 as s3
from aws_cdk import core as cdk
# import company_cdk_solutions
from company_cdk_security import aws_s3 as security_s3

app = cdk.App()
stack = cdk.Stack(app, "Website")

bucket = s3.Bucket(stack, "Bucket", security_s3.BucketPropsCollection.public_access_props(
  versioned=True,
  enforce_ssl=False,
))

app.synth()

I'm sure we can get some nice types here too, but I'm not that familiar with Python to add them 🙂.

alexpulver commented 3 years ago

Thanks for the feedback! I changed to functions; it makes sense :).

In Python, when the same keyword argument is specified twice, the interpreter raises a TypeError. For example, enforce_ssl could appear in **kwargs while also being set explicitly by BucketPropsCollection.public_access(). I couldn't think of an easy way for policy library vendors to work around that. Open to suggestions :).
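The failure mode, and one common dict idiom that sidesteps it, can be shown with plain dicts standing in for BucketProps (the function names are hypothetical; whether the caller should be allowed to silently win is exactly the open question here):

```python
def public_access_props(**kwargs):
    # raises TypeError if kwargs repeats a key set explicitly below
    return dict(enforce_ssl=True, public_read_access=True, **kwargs)

def public_access_props_overridable(**kwargs):
    # {**defaults, **kwargs}: duplicate keys are allowed, caller's value wins
    defaults = {"enforce_ssl": True, "public_read_access": True}
    return {**defaults, **kwargs}
```

The second form trades safety for flexibility, which is why the thread later converges on an explicit overrides parameter instead.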

As I mentioned above, I think we should support layered/chained construction of properties. For example, there could be methods to support compliance with FedRAMP (see example at the end from AWS Config conformance pack sample template), SOC, and other standards. If I need to comply with both FedRAMP and SOC, as a policy library consumer, I want to apply both, and trust the AWS CDK API to do the right/defined thing.

@skinny85 if I iterate a bit on the experience you suggested, it would be something like below - what do you think?

cdk.Props.merge is an imaginary new class and static method :). It takes an arbitrary number of props objects. All objects should be of the same type; otherwise it is an error. The method reads props from the first and second instances, then creates a new instance with all values merged. If the same prop appears in both instances, the latter takes precedence. The same then happens for the next instance, and so on. I implemented an example of this approach in the cdk_props.py module. The reason I suggested the cdk.Props namespace is that I think this logic can apply to all props classes. It would also allow all props classes to apply whatever logic they have to the props arguments.

Another reason for having the merge method as part of AWS CDK API is that multiple vendors could produce policy libraries, and might not agree on the same API for merging prop values. I am thinking of something like Managed rules for AWS Web Application Firewall.

from aws_cdk import aws_s3 as s3
from aws_cdk import core as cdk

from company_cdk_security import aws_s3 as security_s3

app = cdk.App()

stack = cdk.Stack(app, "Website")

bucket_props = cdk.Props.merge(
    security_s3.BucketPropsCollection.public_access(),
    security_s3.BucketPropsCollection.retention_policy(),
    s3.BucketProps(versioned=True, enforce_ssl=False),
)
bucket = s3.Bucket.from_props(stack, "Bucket", bucket_props)

app.synth()

This way, policy library providers would just define collections of policies, and let the clients combine them in the desired order.

FedRAMP compliance controls example: (screenshot omitted)

dontirun commented 3 years ago

@alexpulver I think a policy library is a great idea. One thing to consider is what happens if two libraries have conflicting guidance. There needs to be some mechanism to set priority and alert users that there is a potential conflict.

As I mentioned above, I think we should support layered/chained construction of properties. For example, there could be methods to support compliance with FedRAMP (see example at the end from AWS Config conformance pack sample template), SOC, and other standards. If I need to comply with both FedRAMP and SOC, as a policy library consumer, I want to apply both, and trust the AWS CDK API to do the right/defined thing.

While not a 100% solution, that's where my team is trying to go with cdk-nag. By using the rule packs based on the Config conformance packs, users can be alerted when their applications aren't following a specific compliance standard or set of standards. Aspects could be used for auto-remediation, but it might be better to let users decide what exactly they want to follow or ignore.

skinny85 commented 3 years ago

One thing to consider is what happens if two libraries have conflicting guidance. There needs to be some mechanism to set priority and alert users that there is a potential conflict.

Yes, in my opinion, merge() should fail if there's a conflict between the provided props instances (meaning, two or more of them provide the same property with values that are not equal).

In the above example, imagine security_s3.BucketPropsCollection.public_access() had Versioned=True, but security_s3.BucketPropsCollection.retention_policy() had Versioned=False (yes, I understand it's unlikely in this specific case, but this is just an example 😜).

alexpulver commented 3 years ago

but it might be better to let users decide what exactly they want to follow or ignore

I agree with @dontirun on this one. Allowing the user to resolve conflicts instead of failing is important. There are always exceptions...

@skinny85 I see your point about "surprising" results if we just resolve conflicts by default. The default can be to fail unless the user explicitly asks for overrides. @skinny85 @dontirun what do you think about the following approach?

from company_cdk_security import aws_s3 as security_s3

bucket_props = cdk.Props.merge(
    props = [
        security_s3.BucketPropsCollection.public_access(),
        security_s3.BucketPropsCollection.fedramp_moderate(),
    ],
    overrides = s3.BucketProps(versioned=True, public_read_access=True),
)

Here I am asking to merge two policies - public access and FedRAMP Moderate. Let's assume that public access sets publicReadAccess=True and FedRAMP Moderate sets it to False (per the s3-bucket-public-read-prohibited AWS Config managed rule).

Let's say the website is a landing page that can have public access. Using the overrides parameter, I can explicitly specify which conflicts I want resolved and how. I can also specify additional props that are relevant for the website (e.g. versioned) that are not part of any policy. Any other conflict that doesn't have an override will cause merge() to fail, stating the open conflicts.
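The proposed behavior can be prototyped with plain dicts standing in for props objects (cdk.Props.merge is imaginary, so this is only a sketch of its contract): conflicts between policy layers fail the merge unless an explicit override resolves them, and overrides always win.

```python
def merge(props_list, overrides=None):
    """Merge policy props; fail on conflicts not resolved by `overrides`."""
    overrides = dict(overrides or {})
    merged = {}
    conflicts = set()
    for props in props_list:
        for key, value in props.items():
            # record a conflict when two policies disagree on the same key
            if key in merged and merged[key] != value:
                conflicts.add(key)
            merged[key] = value
    unresolved = conflicts - set(overrides)
    if unresolved:
        raise ValueError(f"unresolved conflicts: {sorted(unresolved)}")
    merged.update(overrides)  # explicit overrides take final precedence
    return merged
```

With the landing-page example: public_access sets public_read_access=True, FedRAMP sets it to False, and the merge only succeeds because the caller overrides it explicitly.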

skinny85 commented 3 years ago

@alexpulver I like it!

dontirun commented 3 years ago

@alexpulver In addition to what you have, I would also suggest adding explicit overrides to the synthesized CloudFormation/CDK metadata as a warning. When my customers do security reviews, assessors find that information very valuable.

alexpulver commented 3 years ago

I experimented with combining cdk-nag and the above approach. Would be glad to have your opinion.

The full implementation of the snippets I use below is in the example app I created for this thread: https://github.com/alexpulver/company-guardrails

Merge function signature: cdk.Props.merge(props: Sequence[Any], *, overrides: Optional[Any]) -> Any

app.py

import os

from aws_cdk import core as cdk
from cdk_nag import NIST80053Checks

from deployment import LandingPageFrontend

app = cdk.App()

LandingPageFrontend(
    app,
    "LandingPageFrontend",
    env=cdk.Environment(
        account=os.environ["CDK_DEFAULT_ACCOUNT"],
        region=os.environ["CDK_DEFAULT_REGION"],
    ),
)

cdk.Aspects.of(app).add(NIST80053Checks())

app.synth()

website/infrastructure.py

from company_cdk import appsec
from company_cdk import fedramp

bucket_props = cdk.Props.merge(
    [
        appsec.PropsCollection.aws_s3_bucket_public_access(),
        fedramp.PropsCollection.aws_s3_bucket(),
    ],
    overrides=s3.BucketProps(versioned=True, public_read_access=True),
)

The intent in this case is to make my app NIST 800-53 compliant, while still ultimately letting users decide which rules they want to implement or suppress. I will use cdk-nag to run checks on my application and make sure synth fails if there are non-compliant rules that were not implemented or suppressed. Here, running cdk synth leads to the following error:

$ npx cdk synth
[Error at /LandingPageFrontend/Website/Bucket/Resource] NIST.800.53-S3BucketLoggingEnabled: The S3 Bucket does not have server access logs enabled - (Control IDs: AC-2(g), AU-2(a)(d), AU-3, AU-12(a)(c)).
Found errors

The error provides the exact construct path, so I know what resource to look at. I decide to implement the rule guidance. First, I add a logs bucket. Assuming I secured it properly, I can suppress the same check for the logs bucket, because it is the one that keeps the logs for the website bucket:

logs_bucket = s3.Bucket(self, "LogsBucket")
cfn_logs_bucket = cast(s3.CfnBucket, logs_bucket.node.default_child)
cfn_logs_bucket.add_metadata(
    "cdk_nag",
    {
        "rules_to_suppress": [
            {
                "id": "NIST.800.53-S3BucketLoggingEnabled",
                "reason": "This is the logging bucket itself",
            },
        ],
    },
)

Next, I decide to apply the company policy to implement NIST 800-53 compliance on the website bucket:

from company_cdk import appsec
from company_cdk import fedramp
from company_cdk import nist80053

bucket_props = cdk.Props.merge(
    [
        appsec.PropsCollection.aws_s3_bucket_public_access(),
        fedramp.PropsCollection.aws_s3_bucket(),
        nist80053.PropsCollection.aws_s3_bucket(
            server_access_logs_bucket=logs_bucket
        ),
    ],
    overrides=s3.BucketProps(versioned=True, public_read_access=True),
)

No errors this time! :)

$ npx cdk synth
dontirun commented 3 years ago

I like that process! It provides two sets of tools for developers on projects: a way to notify them about properties that aren't set for a standard, and an easy way to remediate it.

As a side note, you will likely get a cdk-nag error on that setup once the last pull request for the NIST 800-53 pack is finished 😊

OperationalFallacy commented 3 years ago

I like the cdk-nag approach; it should be pretty easy to automate enforcement in a pipeline.

msmjack commented 3 years ago

Another reason for having the merge method as part of AWS CDK API is that multiple vendors could produce policy libraries

Vendors like AWS?

from aws_cdk_policy import appsec
from aws_cdk_policy import fedramp
from aws_cdk_policy import nist80053
alexpulver commented 3 years ago

It could be AWS, but I also thought of vendors and solutions like Check Point CloudGuard (the former Dome9 is part of it now).

msmjack commented 3 years ago

So, this solution

meets "Allow users to specify or install some code that controls the defaults of AWS resources," but not "or applies some policy or validation at the stack or app level."

I think there might be a few separate concepts to solve this issue:

In a previous attempt (https://github.com/aws/aws-cdk-rfcs/pull/163) there was a concept of "Creation Hooks." I'd like to slightly adjust the meaning.

This combines with the "PropsCollections", either passing them through the "creation hook" or when creating a new instance.

polothy commented 3 years ago

The concept of creation hooks sounds pretty interesting to me as a generic enforcement mechanism. My hope is that CDK application developers mostly don't even realize that they are building a compliant application; it all just happens as automatically as possible.

I think one thing that was attractive with tools like cfn-guard is that a security team could work on the policies outside of any one project, and then we could enforce those policies in our pipelines. This way, individual applications do not need to upgrade their CDK applications just to ensure they have the latest policies. Maybe there is a smart way to ensure an app has the latest policies?

polothy commented 3 years ago
from aws_cdk_policy import appsec
from aws_cdk_policy import fedramp
from aws_cdk_policy import nist80053

If these policies could also produce AWS Config deployments, you could verify compliance against the actual running AWS resources in your account. This would be really helpful for audits and getting security certifications.

msmjack commented 3 years ago

A Security professional could

They all do different things.

OperationalFallacy commented 3 years ago

I can tell what developers want:

Allow users to specify or install some code that controls the defaults of AWS resources or applies some policy or validation at the stack or app level.

Maybe it can be part of unit tests? Rather than merging props and overrides, this gives developers fast feedback and clean templates to use, and also provides evidence in a pipeline.

msmjack commented 3 years ago

@OperationalFallacy, I know I love fast feedback! and I'm not sure what you mean by "templated configs."

I think you can combine unit tests, merge props, and overrides to give that fast feedback, if a user knew whether they were intentionally or just accidentally overwriting, for example, a NIST control.

Off the cuff, you could combine Fine-Grained Assertions with an imported Object Mother that stores the info for your base company standards. Then you could write your tests in such a way that they use the object mother but indicate whether your new properties overwrite the object mother's or just add a property. That way, no surprises.
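The object-mother idea could look something like the following sketch (the standards dict, function name, and override mechanism are all hypothetical; a real version would assert against synthesized templates with the CDK assertions module):

```python
# Hypothetical "object mother": the canonical company-standard bucket props.
COMPANY_BUCKET_STANDARDS = {
    "encryption": "KMS_MANAGED",
    "public_read_access": False,
}

def assert_meets_standards(props, intended_overrides=()):
    """Fail the test on any silent deviation from the company standards.

    Deviations listed in `intended_overrides` are allowed, so intentional
    exceptions are visible in the test code itself.
    """
    for key, expected in COMPANY_BUCKET_STANDARDS.items():
        actual = props.get(key, expected)
        if actual != expected and key not in intended_overrides:
            raise AssertionError(f"'{key}' silently overrides the company standard")
```

A test that needs an exception must name it (intended_overrides=("public_read_access",)), which is the "no surprises" property.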

cfn-guard is definitely better, but the feedback is not quite as fast.

OperationalFallacy commented 3 years ago

By 'templated configs' I mean using higher-level constructs. I might be wrong, but if an org implements strict compliance inside a construct, they probably wouldn't care about merging properties. It is a black-and-white choice from the security and architecture point of view, no?

Off the cuff, you could combine Fine Grained Assertions with an imported Object Mother that stores the info for your base company standards

Sure; the question is where the effort should be: developing something that "mutates" CDK output so it is compliant, or developing a test library that validates resources are compliant. Developers still have a choice to write CDK apps from scratch or use open-source constructs; they will just need to make sure compliance tests pass.

Imho, the second approach would win in enterprise.

msmjack commented 3 years ago

@OperationalFallacy, if a higher-level construct solves your use case, then yeah, it's way more straightforward. That concept is known as an L2+ library.

But there are situations where it can get frustrating. @benbridts had a good talk on CDK Day for this concept. And there are some listed in aws-cdk's best practices.

shawnsparks-work commented 3 years ago

It is a black and white choice from the security and architecture point of view, no?

No, it's not black and white, unfortunately. There are policies which should apply to 99% of apps. When you have 100+ apps, there's going to be an app where the risk of not complying with a particular control is worth the reward, and an exception gets put in place.

OperationalFallacy commented 3 years ago

@shawnsparks-work I agree, and probably more than 1%; those should be OK with "choose your own adventure", no? Maybe you can give an example or two?

GriffinMB commented 3 years ago

Ultimately, I think there are a couple of different components to this discussion, which might benefit from splitting it out into multiple topics.

There's a library-level component, like an Aspect-style enforcement or L2.5 constructs. There's also an organization-level component, which would allow an Org to enforce best practices uniformly. I'm working on a formal proposal for the second piece, but to give a high level summary:

This proposal attempts to address the Org component with a new L3 Construct and CLI command. It enables static analysis locally and as part of a CI/CD pipeline, which allows developers and organizations to enforce best practices for their cloud infrastructure.

The CLI command (cdk check) synthesizes templates, and POSTs them to an API endpoint (configurable in cdk.json). The API service validates the template, and returns results. The API service is generic, and can be built with any tool (e.g. cfn-nag, cfn-guard, in-house scanner, etc) as long as the API conforms to the specification. Users can also configure CDK to run scans on synth via cdk.json.

The L3 construct is a conforming implementation of the API, and uses cloudformation-guard for analysis.

Benefits of an API over a Library for Orgs

alexpulver commented 3 years ago

Thanks @GriffinMB! How would you suggest handling exceptions from the enforcement?

alexpulver commented 3 years ago

Additionally, how can checks/enforcements from multiple domains/vendors be combined? For example, there are operational and security best practices (e.g. Well-Architected pillars) that can be checked for. Additionally, different ISVs might want to provide rule packs, and a project can use several of them.

GriffinMB commented 3 years ago

Exceptions can be handled in a lot of ways, ha. We're working on a solution to that and will include it in the full proposal. There's a kind of natural way to do it by encoding exception handling in guard rules, but that's not the solution we're looking at right now, since it isn't necessarily as flexible.

Different domains can be managed by rulesets. When the API gets a scan request, it will check the template against all rulesets the user has permissions for. Different Orgs, teams, etc would only have permissions to the rulesets applicable to them. This is something else that'll have more detail in the full proposal, but it would look something like:

import * as check from '@aws-cdk/check';
import * as iam from '@aws-cdk/aws-iam';

// Principal with permissions to call the API
const account = new iam.AccountPrincipal(...);

// An API service using API Gateway and Lambda Functions
const validator = new check.Validator(this, 'myValidator');

// Grant the principal permissions to validate any ruleset.
validator.grantValidateAllRulesets(account);

Then, when a principal associated with the account hits the API, their template would be checked against all available rulesets. If you want to only grant permissions to specific rulesets, it would look like this:

// Grant a principal permissions to use the validator API, and check
// their templates against "security-ruleset"
validator.grantValidatorRuleset('security-ruleset', principal)

On the user side of things, if they want to kick off a local scan and only care about a subset of rulesets they have access to, they can use the --ruleset flag (cdk check --ruleset security-ruleset).

GriffinMB commented 3 years ago

Also, the idea is that the deployment of this would ideally be in a separate account, where you have a single stack that creates and manages the Validator, Rulesets, and permissions for each.

Users can also add additional Validator endpoints if an org has multiple Validators with different Well Architected Rulesets.

alexpulver commented 3 years ago

When you say "Aspect-style enforcement" - do you mean checks and surfacing issues, like cdk-nag does, or mutating the application in place without developer control?

GriffinMB commented 3 years ago

I mean something along the lines of this: https://github.com/aws/aws-cdk-rfcs/issues/25#issuecomment-741892198

More generally, I just mean any solution where Construct validation is done on the Construct objects directly, during synthesis or compilation, rather than on build artifacts. I'm trying to differentiate between the (as I see it) two different problem spaces that can be addressed:

rix0rrr commented 3 years ago

Working at the props level might work, but I can't help but notice that what we're really looking for is dependency injection.

We might want to substitute an internal call to new kms.Key() with something like new mycompany.SecureKmsKey() (or whatever). For that we would need some IoC container-like mechanism, and never call new kms.Key() ever again. Instead, we would call something like:

const key = Injector.of(this).obtain(kms.Key);

We now have the ability to

This is strictly more powerful, and convenient APIs that leave the class the same but just override some props defaults could still be built on top of this.
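The IoC idea above can be sketched with a minimal injector that maps a class to a factory within a scope, so that obtaining kms.Key can transparently return a company-specific subclass. The Injector class and its API are hypothetical, not part of the CDK:

```typescript
// A minimal sketch of the injector idea: a per-scope registry mapping a
// class to a factory. Injector.of(scope).obtain(Key) returns whatever the
// scope has bound for Key, defaulting to an error if nothing is bound.
type Ctor<T> = new (...args: any[]) => T;

class Injector {
  private static registry = new Map<object, Injector>();
  private bindings = new Map<Ctor<any>, () => any>();

  static of(scope: object): Injector {
    let injector = Injector.registry.get(scope);
    if (!injector) {
      injector = new Injector();
      Injector.registry.set(scope, injector);
    }
    return injector;
  }

  bind<T>(type: Ctor<T>, factory: () => T): void {
    this.bindings.set(type, factory);
  }

  obtain<T>(type: Ctor<T>): T {
    const factory = this.bindings.get(type);
    if (!factory) throw new Error(`No binding registered for ${type.name}`);
    return factory();
  }
}

// Example: the org substitutes a hardened key class for the stock one.
class Key { rotation = false; }
class SecureKey extends Key { rotation = true; }

const scope = {};
Injector.of(scope).bind(Key, () => new SecureKey());
const key = Injector.of(scope).obtain(Key); // a SecureKey instance
```

A real implementation would presumably walk up the construct tree to find the nearest binding (similar to how CDK context lookups work) and fall back to `new type()` when nothing is bound, so unconfigured apps keep today's behavior.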

alexpulver commented 3 years ago

I looked deeper into cfn-guard, and I think that having the cfn-guard validate command support something like a CDKTemplate type would be sufficient (in addition to the existing CFNTemplate type added in version 2.0.3). The CDKTemplate type could add the CDK path to the rules evaluation output, making it easier for developers to identify the relevant construct in the source code. I opened an enhancement issue for that (see a shorter summary of the issue below). Beyond that, cfn-guard rules should probably serve as the source of truth. Given the CDKTemplate (or a similar capability) is available, I would retract my suggestion for an AWS CDK-specific policy-as-code evaluation capability. It should also play nicely with CDK Pipelines when used in validation steps.

For an example, look at the following CloudFormation logical ID: APILambdaFunctionServiceRole3284E859. The AWS CDK generates it for a resource of user management backend (an example CDK app) production environment that is deployed by a CI/CD pipeline. CDK path for that logical ID is UserManagementBackend-Pipeline/UserManagementBackend-Prod/Stateless/API/LambdaFunction/ServiceRole/Resource. Parts before /API/... refer to CI/CD pipeline stack, deployment stage and stateless stack, respectively. As an AWS CDK developer, I can quickly understand what source code I should look at to make changes, because I provided these construct IDs when instantiating the construct classes (see deployment.py snippet below).
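The relationship between the construct path and the logical ID can be sketched as follows. This mirrors the real CDK algorithm only approximately (the actual implementation also trims very long names and has special cases), so treat it as an illustration of why the hash suffix appears:

```typescript
import { createHash } from 'crypto';

// Illustrative sketch of how a CDK logical ID relates to a construct path:
// a human-readable prefix built from the path components within the stack
// (dropping the "Resource"/"Default" leaf and non-alphanumeric characters),
// followed by an 8-character hash of the full path to guarantee uniqueness.
// The exact trimming and hash input in the real CDK differ.
function logicalIdFor(pathInStack: string[]): string {
  const readable = pathInStack
    .filter((p) => p !== 'Resource' && p !== 'Default')
    .map((p) => p.replace(/[^A-Za-z0-9]/g, ''))
    .join('');
  const hash = createHash('md5')
    .update(pathInStack.join('/'))
    .digest('hex')
    .slice(0, 8)
    .toUpperCase();
  return readable + hash;
}

console.log(logicalIdFor(['API', 'LambdaFunction', 'ServiceRole', 'Resource']));
// "APILambdaFunctionServiceRole" followed by an 8-character hash
```

This is why surfacing the CDK path in cfn-guard output would help: the path carries the construct IDs the developer actually wrote, while the logical ID's hash suffix is opaque.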

I think that can help organizations working with both raw CloudFormation and AWS CDK. It can also be a basis for a service-based solution like @GriffinMB mentioned.

What do you think?

deployment.py

class UserManagementBackend(cdk.Stage):
    def __init__(
        self,
        scope: cdk.Construct,
        id_: str,
        *,
        database_dynamodb_billing_mode: dynamodb.BillingMode,
        api_lambda_reserved_concurrency: int,
        **kwargs: Any,
    ):
        super().__init__(scope, id_, **kwargs)

        stateful = cdk.Stack(self, "Stateful")
        database = Database(
            stateful, "Database", dynamodb_billing_mode=database_dynamodb_billing_mode
        )
        stateless = cdk.Stack(self, "Stateless")
        api = API(
            stateless,
            "API",
            database_dynamodb_table_name=database.dynamodb_table.table_name,
            lambda_reserved_concurrency=api_lambda_reserved_concurrency,
        )

mnoumanshahzad commented 2 years ago

@GriffinMB thanks for sharing valuable insights on how you are approaching the problem spaces. A lot of your comments are making sense to me.

What I am confused about is the enforcement part of things.

How I currently understand it:

How is the above workflow different than the CDK Aspects?

The CDK Aspects library can be developed internally at its own pace and the developers are not blocked by it.
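For comparison, the Aspect mechanism referenced here is essentially a visitor invoked for every node of the construct tree during synthesis, which is why it can only inspect or mutate what is still mutable after construction. A dependency-free sketch (interface and class names here are toy stand-ins, not the CDK's actual types):

```typescript
// Toy model of the Aspect mechanism: a visitor applied recursively to a
// construct tree. It can inspect every node and record findings, but it
// runs after construction, so immutable initialization props are fixed.
interface IAspect {
  visit(node: Node): void;
}

class Node {
  public readonly children: Node[] = [];
  constructor(public readonly id: string, public props: Record<string, any> = {}) {}
  add(child: Node): Node {
    this.children.push(child);
    return child;
  }
}

const errors: string[] = [];

class BucketEncryptionCheck implements IAspect {
  visit(node: Node): void {
    if (node.props.type === 'bucket' && !node.props.encryption) {
      errors.push(`${node.id}: bucket must be encrypted`);
    }
  }
}

function applyAspect(root: Node, aspect: IAspect): void {
  aspect.visit(root);
  root.children.forEach((c) => applyAspect(c, aspect));
}

// Usage: one unencrypted bucket in the tree yields one finding.
const app = new Node('app');
app.add(new Node('good', { type: 'bucket', encryption: 'KMS' }));
app.add(new Node('bad', { type: 'bucket' }));
applyAspect(app, new BucketEncryptionCheck());
```

The API-based workflow differs mainly in where it runs: Aspects run inside the developer's own synthesis process, while a validation API checks the produced artifacts centrally, independent of what code the developer ran.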

mnoumanshahzad commented 2 years ago

Probably I am missing out on something or misunderstanding the intent of the discussion

On the matter of providing defaults, one can provide a custom L2 (or higher) construct with the defaults of choice; it is then a matter of exposing the resources encapsulated within the construct so that CDK escape hatches can be leveraged.

On the construct provided on an organisation level

export class MyDummyConstruct extends Construct {

  public readonly role: IRole;
  public readonly bucket: IBucket;

  constructor(scope: Construct, id: string, props: MyDummyProps) {
    super(scope, id);

    this.role = new Role(this, "Role", { /* ... sane defaults ... */ });
    this.bucket = new Bucket(this, "Bucket", { /* ... sane defaults ... */ });
  }
}

On the consumption side, making use of the escape hatch


const myDummyResource = new MyDummyConstruct(this, "DummyConstruct", {... some props ...})

// The underlying resource is accessed and behavior other than the sane default is applied
myDummyResource.bucket.applyRemovalPolicy(RemovalPolicy.DESTROY)

Additionally, the props for a construct can also be used as a mechanism to override the sane defaults.
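That props-as-override mechanism can be sketched as a simple merge where organizational defaults are applied first and anything the consumer passes wins. The names (BucketDefaults, saneDefaults, resolveProps) are illustrative, not an existing API:

```typescript
// Sketch of props-based default overriding: the construct applies the
// organization's sane defaults, and consumer-supplied values take
// precedence via spread order.
interface BucketDefaults {
  encryption: string;
  versioned: boolean;
  removalPolicy: string;
}

const saneDefaults: BucketDefaults = {
  encryption: 'KMS_MANAGED',
  versioned: true,
  removalPolicy: 'RETAIN',
};

function resolveProps(overrides: Partial<BucketDefaults> = {}): BucketDefaults {
  // Later spread wins, so explicit consumer choices override the defaults.
  return { ...saneDefaults, ...overrides };
}

const props = resolveProps({ removalPolicy: 'DESTROY' });
// props keeps encryption and versioning defaults; removalPolicy is overridden
```

Note this only covers what the wrapping construct chooses to expose in its props, which is exactly the trade-off discussed below: anything not surfaced requires an escape hatch.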

It would be helpful to clarify my thoughts if someone can point out the drawbacks and blind spots in this approach.

alexpulver commented 2 years ago

My 2 cents on using escape hatches :). Each construct library has a public API. To me, using escape hatches is like using internal implementation details of a library. These are subject to change without notice, because they are not part of the API contract. And this can break my code unexpectedly, so I prefer to avoid this.

I'd prefer the library to add the required functionality (e.g. expose the underlying resources through an intentional public API). If the library generates many resources (like ApplicationLoadBalancedFargateService), it would take each consumer significant effort to understand the underlying construct tree and find the relevant resource to patch. That defeats the intent of a higher-level construct library: encapsulating details through abstraction to save library consumers time.

I think the organizational constructs should support at least the following properties to avoid being a blocker while putting guardrails in place:

There are additional topics being discussed in this thread, unless I missed something :):

dontirun commented 2 years ago

> There are additional topics being discussed in this thread, unless I missed something :):

I wonder how much of this could be implemented through the current and possibly extended functionality of the cdk watch and the AWS Toolkit for VS Code for fast feedback.

If cdk watch added functionality to read cfn-guard policy, it could provide fast feedback with those organization specific rules, like it already does with aspects.

The AWS Toolkit for VS Code could read cfn-guard policy and autofill those defaults in IaC like cdk, cfn, Terraform, etc on instantiation of the specific resources.

awsmjs commented 11 months ago

Closing this ticket as it is not a current priority.