(cli): reduce deploy role scope

tjenkinson commented 3 years ago

Currently the deploy role has a lot of access to *:

https://github.com/aws/aws-cdk/blob/b96efd8aaa143845b9fe315a9ee1e8398c4d83c2/packages/aws-cdk/lib/api/bootstrap/bootstrap-template.yaml#L314-L336

I wonder if we can scope this down a bit by qualifier, so one bootstrap instance can't access things from another? Maybe with tags or enforcing a naming scheme? E.g. all stacks must start with cdk-${qualifier}-

Use Case

We're investigating bootstrapping for each service with different qualifiers to try and reduce the blast radius if access to the deployment role (and others) from one bootstrap instance is compromised. The other roles/policies look like they're scoped pretty well. The FilePublishingRoleDefaultPolicy for example can only be used with the StagingBucket.Arn for that bootstrap instance.

I think we'll need to create a custom deploy role which can only invoke specific CloudFormation stacks with a restricted execution policy (unless this could be an official feature of the bootstrap stack in the future), but was looking to use this as a template, and if this level of access is really needed, I think the isolation we want really isn't going to be possible.

Thanks.

Proposed Solution

I'm not sure if we can actually scope the permissions down from *. The comment

# Permissions needed in CodePipeline.
# S3 and KMS are needed on '*',
# as we don't know the IDs of the CodePipeline's bucket and key in advance

makes it sound like we can't, but I'm really hoping that's not the case.

[x] :wave: I may be able to implement this feature request (depending what the solution ends up being)
[x] :warning: This feature might incur a breaking change

This is a :rocket: Feature Request

tjenkinson commented 3 years ago

# Permissions needed in CodePipeline.
# S3 and KMS are needed on '*',
# as we don't know the IDs of the CodePipeline's bucket and key in advance

I'm wondering actually why the deploy role itself needs access to s3? For CodePipeline don't/can't we create a separate role with the necessary permissions for the s3 bucket that has been configured for its artifacts?

skinny85 commented 3 years ago

I'm wondering actually why the deploy role itself needs access to s3? For CodePipeline don't/can't we create a separate role with the necessary permissions for the s3 bucket that has been configured for its artifacts?

Because it needs to read the actual template to deploy from the pipeline's artifact Bucket.

tjenkinson commented 3 years ago

Because it needs to read the actual template to deploy from the pipeline's artifact Bucket.

Ah right. Is this specifically when deploying from a cdk-pipeline?

skinny85 commented 3 years ago

Because it needs to read the actual template to deploy from the pipeline's artifact Bucket.

Ah right. Is this specifically when deploying from a cdk-pipeline?

Yep.

tjenkinson commented 3 years ago

Interesting. Is this so that the pipeline can run in one account and then the account the stack is being deployed into can read the template from the pipeline account?

Need to look through the code a bit more to get a better understanding how that works, but I wonder if the templates could be copied to a separate well known bucket (like the assets bucket) as part of the pipeline and retrieved from there?

skinny85 commented 3 years ago

This is just because CodePipeline works this way - an Action that executes in a different account must first assume a Role in the different account, and only then read the artifacts from the pipeline's artifacts Bucket (which must be in the pipeline's account).

Need to look through the code a bit more to get a better understanding how that works, but I wonder if the templates could be copied to a separate well known bucket (like the assets bucket) as part of the pipeline and retrieved from there?

In theory yes, but we wouldn't be able to use the existing CloudFormation CodePipeline Actions if we wanted to do it that way - we would have to create our own Actions in that case.

tjenkinson commented 3 years ago

Cool thanks for explaining. I think I'm on the same page now.

The problem only applies for cross-account deploys from a cdk-pipeline, because in order for the CodePipeline CloudFormation action in account A to do the deploy in account B, account B needs to read the stack from the account A CodePipeline artifact bucket. For a non cross-account deploy, no extra s3 permissions would be needed, because the CodePipeline would already have access to its own artifact bucket. At bootstrap time in account B, we don't know which CodePipeline (artifact buckets) in account A might want to do a deploy into account B. Therefore to solve this we've added the * to the bootstrap deploy role so that when account B assumes the deploy role in A, it can access all buckets, including any CodePipeline artifact bucket. This correct?

It would be great for the CDK to be a bit more 'secure by default'. Individual packages are great at creating scoped down roles and policies, and the other roles in the bootstrap stack look pretty scoped down. I wonder if there's some room for change here?

The thing that stands out for me is that we have the * in the bootstrap stack, which applies for everyone, when in reality it's only needed for the subset of users of the cdk-pipelines package, who are doing cross account deploys.

In theory yes, but we wouldn't be able to use the existing CloudFormation CodePipeline Actions if we wanted to do it that way - we would have to create our own Actions in that case.

I wonder if we could switch to a solution where we use the CodePipeline CloudFormation action for same account deploys, but for cross account switch to CodeBuild and either using the CDK to deploy or a CloudFormation API call? The template would be copied into the assets bucket. I think doing this would mean these * permissions would no longer be needed in the bootstrap template anymore, and also it would mean account A would not need to --trust account B anymore.

Thoughts?

rix0rrr commented 3 years ago

I think a more comfortable middle ground here would be to employ resource tags and IAM conditions on those resource tags.

For example, the S3 buckets could be created with a tag of aws-cdk:deploy-artifacts=true and the policy could have a Condition that allows S3 access only on buckets with that tag.

You can deploy your own customized version of the bootstrapping template to experiment with this: https://docs.aws.amazon.com/cdk/latest/guide/bootstrapping.html#bootstrapping-customizing

tjenkinson commented 3 years ago

Hi @rix0rrr that sounds like it would work well.

Will give it a go and I'm happy to open a PR with a change that makes the pipelines module add the tag and the template change.

Wondering if it would be possible to do this in a backwards compatible way though?

tjenkinson commented 3 years ago

@rix0rrr given the pipelines module is currently developer preview and the * rules in the bootstrap template are just for that (?), maybe right now a change like above could be made in a non-major version?

Not sure when pipelines might be coming out of preview but just want to make sure we don't miss an opportunity to change this without it getting more complicated and requiring opt-in etc.

Also I just noticed a bit of an issue. It looks like KMS supports resource tags but S3 only supports them on objects, not the bucket itself. This might still work if we tag all the objects that go into it though. Will look into that.

rix0rrr commented 3 years ago

S3 only supports them on objects, not the bucket itself

Oh well that is a bit of a complication :(

rix0rrr commented 3 years ago

Hmm my bucket sure seems to have a set of Tags...?

rix0rrr commented 3 years ago

Not sure when pipelines might be coming out of preview but just want to make sure we don't miss an opportunity to change this without it getting more complicated and requiring opt-in etc.

I have to think really hard about this :). I'm honestly not quite sure it can be done at all in a way that does not disturb existing users.

We can unconditionally add the tags to the buckets and keys, that seems a nondestructive operation.

But limiting permissions to resources that are tagged in that way is probably always going to require a flag.

tjenkinson commented 3 years ago

But limiting permissions to resources that are tagged in that way is probably always going to require a flag.

@rix0rrr I understand not wanting to break things for existing users, but IMO the security conerns of it not being the default is higher. Isn't breakage expected with something that's in preview? The rest of the default bootstrap stack looks well scoped, and as a user I think I'd expect the CDK to be secure with sensible defaults out of the box.

If it can't be the default, it's an option I'd always be opting in to.

Hmm my bucket sure seems to have a set of Tags...?

Huh. I went off this doc and stack overflow which seemed to imply it's only for cost tracking.

rix0rrr commented 3 years ago

A simpler solution might even be to check if a condition like aws:ViaAWSService or aws:CalledVia or something similar might be good enough as well.

The contention seems to be around the s3 and kms read permissions, right? If we could add a Condition so that those only apply if the role is being assumed via CodePipeline, would that be good enough?

That seems like a pragmatic solution that does not involve any upgrading.

tjenkinson commented 3 years ago

The contention seems to be around the s3 and kms read permissions, right?

Yep

If we could add a Condition so that those only apply if the role is being assumed via CodePipeline, would that be good enough?

It would be a lot better, but still unnecessarily open IMO, enough for us still to use a custom bootstrap stack and attempt to patch pipelines somehow (maybe adding tags) to work with cross account deploys.

There's no reason for a pipeline that runs something malicious in one account to be able to read all buckets and encryption keys from the other trusted accounts (except that pipelines needs it for cross account deploys right now).

rix0rrr commented 3 years ago

Alright, after digging a little deeper I'm convinced there's no way to write the IAM policy in a way that doesn't need additional preparation on the resources.

What I really wanted to encode is "you can only read S3 buckets in the account you're assuming this role from" (which is the actual rule we need!), but there doesn't seem to be a way to encode that.

So tagging seems to be the only way forward.

Let's worry about getting the capability in place first and testing it before we decide how and when we're going to enforce it on people.

rix0rrr commented 3 years ago

Oh wait maybe there still is? s3:ResourceAccount seems to exist, aws:SourceAccount seems to exist, ...

tjenkinson commented 3 years ago

What I really wanted to encode is "you can only read S3 buckets in the account you're assuming this role from" (which is the actual rule we need!)

If the pipeline is in account A and deploying to account B then account B will assume the role in A and need to access the bucket in A, which doesn’t sound like the same thing. Am I missing something?

The tag approach sounds pretty nice if it can work because it’s targeting specific buckets. Hoping to try and spike it out this Friday :)

rix0rrr commented 3 years ago

If the pipeline is in account A and deploying to account B then account B will assume the role in A and need to access the bucket in A, which doesn’t sound like the same thing. Am I missing something?

If the pipeline is in account A and it's deploying to account B then the pipeline in A will assume the role in account B and read artifacts from the bucket in A.

                                       │                                     
              Account A                │                Account B            
                                       │                                     
                                       │                                     
┌─────────────┐    ┌─────────────┐     │    ┌─────────────┐   ┌─────────────┐
│             │    │             │     │    │             │   │             │
│  Pipeline   │    │   Bucket    │     │    │    Role     │   │    Stack    │
│             │    │             │     │    │             │   │             │
└─────────────┘    └─────────────┘     │    └─────────────┘   └─────────────┘
       │                  │            │           │                 │       
       │                  │            │           │                 │       
       │                  │            │           │                 │       
       │                AssumeRole     │           │                 │       
       │──────────────────┼────────────┼──────────▶│                 │       
       │                  │            │           │                 │       
       │                  │            │           │                 │       
       │                  │     Read Artifacts     │                 │       
       │                  │◀───────────┼───────────┤                 │       
       │                  │            │           │                 │       
       │                  │            │           │     Deploy      │       
       │                  │            │           │────────────────▶│       
       │                  │            │           │                 │       
       │                  │            │           │                 │       
       │                  │            │           │                 │       
       │                  │            │           │                 │

tjenkinson commented 3 years ago

@rix0rrr thanks for the diagram

So I'm not sure how

What I really wanted to encode is "you can only read S3 buckets in the account you're assuming this role from"

helps here.

We'd be saying account B can read from any bucket in A, given the role was assumed from A.

This is the same as what we have now. B can already read any bucket in A?

We actually want B to only be able to read specific buckets in A, which that tags would do (if they're supported).

tjenkinson commented 3 years ago

If tagging the bucket isn't supported (which I'll check on Friday), and it's not easy/costly to find and tag all the artifact objects during the build, another approach could just be to to tag the encryption keys and restrict the access to the kms keys (which do support "Authorization based on tags"). This is assuming access to a bucket doesn't leak anything if you don't have the key.

This means we'd need to restrict the policy to allow reading from any bucket, but only for buckets/objects with encryption enabled.

I think we could do that with something like

"Condition": {
  "StringEquals": {
    "s3:x-amz-server-side-encryption": "aws:kms"
  }
}

on the s3 actions.

tjenkinson commented 3 years ago

Actually taking a step back. Why do we need a single deploy role? When the pipeline is created, can't a custom role be created at that point automatically which provides access to the specific bucket and keys?

The CloudFormation action is given the role to use, so it doesn't need the well known name.

So we'd still have the well known cdk-${Qualifier}-deploy-role-${AWS::AccountId}-${AWS::Region} role in the bootstrap stack, but this would not be allowed to be assumed from another account.

Then for each pipelines pipeline, a specific role would be created for the purpose of being assumed by 'account b' which can only access the specific pipeline bucket and keys (given we'd know what they were at pipeline construction time).

The same logic might also be applicable to the other roles. Trusted accounts would not need to be a thing if roles were created at pipeline construction time, specifically for the target accounts.

tjenkinson commented 3 years ago

Found https://github.com/aws/aws-cdk/pull/3208 (by @skinny85)

It looks like the CloudFormation action already supports this in a way that means you don't need to --trust account B. It looks like it does what I was describing above and create a scoped down role that B can assume. Tomorrow I'm going to try updating pipelines to not pass through the bootstrap template deploy role, and instead pass through account and have the action create the role.

tjenkinson commented 3 years ago

Ugh I see a catch with that now. The common template deploy role would need to be assumed in B (so --trust is needed) in order to be able to deploy the action deploy role (I guess during the UpdatePipelineAction).

But I think if it happened during the UpdatePipelineAction given it's during a standard cdk deploy, the template would go into the common assets bucket, meaning the common deploy role can be scoped down to just access from the assets bucket.

rix0rrr commented 3 years ago

This is the same as what we have now. B can already read any bucket in A?

Not so. B can only read buckets in A that have explicitly had a resource policy set that allows B to read (this is always true for cross-account resource accesses). There is only one such bucket, which is the CodePipeline artifact bucket.

However, B can read any bucket in B, which means that A (by assuming that role in B) will be able to read any bucket in B.

But at the same time, given that if A can deploy any CloudFormation template into B, it can also deploy any Lambda function that will scan every S3 bucket in that account and email their contents around the world. So while I agree that the * is not nice, I'm not sure there's a privilege escalation here.

But I think if it happened during the UpdatePipelineAction given it's during a standard cdk deploy, the template would go into the common assets bucket, meaning the common deploy role can be scoped down to just access from the assets bucket.

So now you are making it possible to deploy a support stack in B, based off of a bootstrap that allows cloudformation:* but not s3:GetObject, right? (or, only allows reading from the assets bucket in A, which is actually not strictly necessary if the template is <50kB and doesn't use assets, which the bootstrap stack doesn't)

It will work, but to my mind it has more moving pieces than necessary. I liked the tag-based approach better.

tjenkinson commented 3 years ago

Not so. B can only read buckets in A that have explicitly had a resource policy set that allows B to read (this is always true for cross-account resource accesses). There is only one such bucket, which is the CodePipeline artifact bucket.

Ah maybe I'm still not understanding it completely. I thought if B assumes the deploy role in A, it will be able to access all the buckets in A (given the *). Will read up more on this.

But at the same time, given that if A can deploy any CloudFormation template into B, it can also deploy any Lambda function that will scan every S3 bucket in that account and email their contents around the world. So while I agree that the * is not nice, I'm not sure there's a privilege escalation here.

By default yes, but I think the CloudFormation execution role that is used could be restrictive and prevent that. E.g. an approach I think we'll take is enforce a CloudFormation execution role which requires all roles that are created in the stack to have a specific permission boundary, which by default would allow nothing. The permission boundaries would be managed somewhere else. So then in your example a lambda could be created but it wouldn't have permission to read all the buckets (unless the permission boundary allowed it).

tjenkinson commented 3 years ago

Not so. B can only read buckets in A that have explicitly had a resource policy set that allows B to read (this is always true for cross-account resource accesses). There is only one such bucket, which is the CodePipeline artifact bucket.

I didn't notice previously that a resource policy granting B access was created on the bucket automatically, and I had it a bit backwards thinking that B assumed the role in A in order to read the bucket (and therefore B also had to --trust A). I get what you meant in https://github.com/aws/aws-cdk/issues/12985#issuecomment-785278138 now! Only allowing the deploy role to access buckets and keys that do not originate in the same account (and therefore must be in a resource policy in another account) sounds great. Will experiment with that first.

I assume https://github.com/aws/aws-cdk/pull/3208 was necessary to support the legacy bootstrap stack? If we only had the modern stack then the CodePipeline module could just use the well known deploy role the same way pipelines does now?

tjenkinson commented 3 years ago

So the good news is

          - PolicyDocument:
            Statement:
              - Action:
                  - 's3:GetObject*'
                  - ...
                Resource: '*'
                Condition:
                    StringNotEquals:
                        s3:ResourceAccount:
                            Ref: 'AWS::AccountId'
                Effect: Allow

works for the s3 bucket. When account A assumes the role in B it is able to access its own bucket that has the resource policy, but it can't access any buckets that are in B. Also B assuming the role in B doesn't have permission to access any bucket in B.

However I couldn't get something similar working for kms, and it says in the docs

AWS KMS supports all AWS global condition keys except for the following ones:

aws:SourceAccount

aws:SourceArn

and I think aws:SourceAccount is what we'd want 😢 . So we might need to use tags for the keys. That should be supported though unlike s3 where it might only be supported on the objects not the bucket (but using tags is a bit annoying because tags is more of a breaking change than just the new condition).

tjenkinson commented 3 years ago

        "Key": {
          "Type": "AWS::KMS::Key",
          "Properties": {
            "Tags": [
              {
                "Key" : "aws-cdk:deploy-artifacts",
                "Value" : "true"
              }
            ],
            ...
          }
        }

and

             - Action:
                  - 'kms:Decrypt'
                  - ...
                Resource: '*'
                Condition:
                    StringEquals:
                        aws:ResourceTag/aws-cdk:deploy-artifacts: 'true'
                Effect: Allow

works!

I'll start a PR

I think we should also put the cdk qualifier in the key name too

rix0rrr commented 3 years ago

I found another potential KMS condition we could use. How about this:

            Statement:
              - Action:
                  - kms:Decrypt
                  - kms:DescribeKey
                  - kms:Encrypt
                  - kms:ReEncrypt*
                  - kms:GenerateDataKey*
                Resource: *
                Condition:
                  StringEquals:
                    kms:ViaService:
                      - Fn::Sub: s3.${AWS::Region}.amazonaws.com

Combined with the S3 rule you figured out:

S3 access, only to buckets in different accounts (for this to succeed there must be a reverse grant on the bucket as well)
KMS access, only via S3 (which requires the S3 permissions implied by the previous statement)

Those together seem like they should suffice. Agreed?

tjenkinson commented 3 years ago

Neat!

Those together seem like they should suffice. Agreed?

After thinking about this further although this is much better, it's still not perfect as it has the following downside:

If account A assumes the deploy role in B, it can access all the artifact buckets (with same qualifier) in A.

I'm planning on trying out the other approach mentioned further up to not pass the deploy role to the CloudFormation action, but instead pass the account, and then have the UpdatePipeline stage deploy a separate role into account B that has access to the specific bucket in A (which is handled automatically in the CloudFormation action). With this approach the well known deploy role shouldn't need any of the * rules, and it also solves the issue I mentioned above. I also think it would be a bit easier to understand what's going on, and means the pipeline itself doesn't have a dependency on the well known deploy role anymore. This doesn't add any security by default, but if a custom CloudFormation execution role is used, it can do (because the custom role could not allow a role to be created with access to all deployment buckets in A).

tjenkinson commented 3 years ago

Started a PR here: https://github.com/aws/aws-cdk/pull/13422 (it's based off another so just check the recent commits)

Still WIP, but first impression is that it makes the pipelines module simpler, with the caveat that there are is a role for each cf prepare/deploy action, but these roles are nicely scoped. From some initial tests in an account it seems to work, although it wasn't very thorough yet.

rix0rrr commented 3 years ago

I appreciate that this works for you and I'm happy if it does, but I'm not inclined to merge this change.

Support stacks are a thorn in my side that are a necessary evil, but I'd like to keep to a minimum nonetheless. They would prevent two pipelines from targeting the same environment, for example.

If account A assumes the deploy role in B, it can access all the artifact buckets (with same qualifier) in A.

I don't think so. The artifact bucket in A does not have a policy on it that allows B to read from it.

tjenkinson commented 3 years ago

I don't think so. The artifact bucket in A does not have a policy on it that allows B to read from it.

Sorry I should have been more specific. What I meant was:

If account A assumes the deploy role in B, it can access all the artifact buckets (with same qualifier) that are for pipelines in A that are also configured to deploy to B.

E.g you might have Service X and Service Y which both have their pipelines in account A and deploy into account B.

The pipeline for Service X ideally shouldn't be able to read artifacts from the pipeline for service Y.

Support stacks are a thorn in my side that are a necessary evil, but I'd like to keep to a minimum nonetheless. They would prevent two pipelines from targeting the same environment, for example.

Not quite following what you mean by this? In the example above the Service X and Service Y pipelines would still be able to deploy into account B (and there would be 2 support stacks in B, one for each pipeline) if I'm understanding this correctly.

tjenkinson commented 3 years ago

Haven't got time to check this now, and can't find anything in the docs, but it occurred to me that given we are passing an execution role to CloudFormation, won't CloudFormation also use that role for reading the template from S3 (vs acting as the user)? My assumption up to now has been that the deploy role would be used, given that's what the action runs as. But is it the action that downloads the template and provides it to CloudFormation, or CloudFormation that downloads the template (using the execution role)?

If CloudFormation uses the execution role, it means it's working now due to the arn:${AWS::Partition}:iam::aws:policy/AdministratorAccess policy on the execution role, not the statements on the deploy role. That would mean the statements on the deploy role aren't needed, but if the user provides a custom execution role, they would need to handle this.

rix0rrr commented 3 years ago

I believe what's happening is CodePipeline is generating a presigned URL for the template and passing that URL to CloudFormation.

So the template is getting read using CodePipeline's permissions, not CloudFormation's.

It's worth trying out but I don't think it will work.

tjenkinson commented 3 years ago

I believe what's happening is CodePipeline is generating a presigned URL for the template and passing that URL to CloudFormation.

Cool thanks I'll have a play around and see if this is the case. If it is, then I'm also wondering if we're deploying a CloudFormation stack from A to B, why we can't have the action use a role in A, and then pass the execution role from account B to CloudFormation (giving account A the permission to PassRole the execution role in B to CloudFormation).

If it's a publicly accessible presigned url, then B shouldn't need to access A to read it?

rix0rrr commented 3 years ago

action use a role in A, and then pass the execution role from account B to CloudFormation

Hmm. Sounds feasible, but there might be a catch somewhere. I don't know the mechanics of CloudFormation and passed roles deeply enough to predict ahead of time...

tjenkinson commented 3 years ago

I believe what's happening is CodePipeline is generating a presigned URL for the template and passing that URL to CloudFormation.

Yep this seems to be the case. Opened a PR to get docs updated and hopefully confirmation of that.

action use a role in A, and then pass the execution role from account B to CloudFormation

This doesn't work either. Get

Cross-account pass role is not allowed.

so created another PR to document that and get confirmaton.

Spending so much time trying things that could have been avoided if the docs were clearer :|

Back to one of my original questions:

I wonder if we could switch to a solution where we use the CodePipeline CloudFormation action for same account deploys, but for cross account switch to CodeBuild and either using the CDK to deploy or a CloudFormation API call?

I'm wondering what the gain really is of using the built-in actions (even for same account deploys) over having a single CodeBuild step with cdk deploy, that does everything? That action would just need to run as a role that can assume the deploy and file publish roles that are in the manifest (which would be updated in the UpdatePipeline stage if necessary).

It would make the code simpler and remove the need for these * permission on the deploy role, given the template would be uploaded to the artifact bucket of account B (key point) if it needed to be.

Pros:

deploy role no longer has access to deploy keys and all s3 buckets in the account it's part of and other accounts that have the resource policies allowing it, because account B would no longer need to be able to access the pipeline artifact bucket in account A.
- we have discussed an approach to reduce this to only allow access to accounts that are not the account the role is contained in, and this should work, but is still a bit leaky and adds complexity to the bootstrap template that is not needed IMO, and also not needed for anyone not using the pipelines module
It's a single CodeBuild job vs multiple asset upload stages and CloudFormation stages, so will probably be quicker in general, and given it's a standard cdk deploy people will understand the log output and any errors should be clear.
Resulting pipeline stack should be smaller
If a cross account deploy fails due to a CloudFormation error, the CloudFormation errors should be visible in the cdk deploy output.
- The CodeBuild CloudFormation action links to the stack to see the events, which is sometimes hard/impossible to get to if it's a different account.
Code should be simpler because we don't need to split the cdk artifact into assets and templates anymore

Cons:

Don't get the nice CodePipeline CloudFormation UI and it's not split up visually into the seperate parts
- but does it really need to be? One of the gains is it makes it easier to see which part fails (but on the other hand it's actually harder to get the reason why it failed sometimes), but given all this is managed by the CDK, it shouldn't fail anyway. If it did, the cdk deploy output should already be clear.
Cost. Running CodeBuild may be more expensive than CloudFormation actions.

@rix0rrr thoughts? Have I missed some cons? Should this be an RFC?

rix0rrr commented 3 years ago

We explicitly decided against using cdk deploy in a CodeBuild job in the original Pipelines RFC. Major reasons: cost, no unnecessary running of code downlaoded from the internet with credentials to Do Things To Your Account.

tjenkinson commented 3 years ago

cdk deploy is already run in the UpdatePipeline CodeBuild job now though (using the synthesized cloud assembly) with credentials.

I don't understand why there couldn't be a new step that works the same way but deploys the non pipeline- stacks? It would be one of these steps for every stage.

I updated the last comment to add cost to the cons section. The cons still don't outweigh the pros IMO, and as soon as you have some assets you end up with a CodeBuild job for each asset anyway. Surely one CodeBuild job is cheaper than one for each asset? Maybe the duration of the job is really where the cost starts adding up?

tjenkinson commented 3 years ago

@rix0rrr I would really like to get to some sort of conclusion soon this issue is massive now and I'd like to get back to other things or preparing a PR to solve this 😅

Did I answer your concern above RE security? If it cdk deploy runs on the synthesised template it's like the current UpdatePipeline stage, so I don't think we're reducing security there?

If we still have different opinions on https://github.com/aws/aws-cdk/issues/12985#issuecomment-802793593 maybe it's possible to get input from more of the team? (This is why I mentioned an RFC further up).

The current position we're in is that we really want to use the CDK cross account with pipelines but can't (unless we use a custom bootstrap stack and don't use the pipelines module).

Also I'm trying to think of users of the CDK as whole, and I think most users will not customise the default bootstrap stack, and also not all users will use the pipelines module, so extra permissions for just the pipelines module in the bootstrap stack is increasing risk for no gain.

Adding some Condition's like we discussed further up will help, but makes the bootstrap stack more complicated IMO and still doesn't fully solve the problem. We'd still need to think about whether we use the default bootstrap stack and pipelines module with this (although it's definitely better than now).

The idea I proposed in https://github.com/aws/aws-cdk/issues/12985#issuecomment-802793593 IMO makes the bootstrap template simpler (removing the problematic permissions), the code simpler, and user experience better. I think we'd avoid issues like https://github.com/aws/aws-cdk/pull/13417#discussion_r598747393

rix0rrr commented 3 years ago

If we still have different opinions on #12985 (comment) maybe it's possible to get input from more of the team? (This is why I mentioned an RFC further up).

We had an RFC on this before. It's this one: https://github.com/aws/aws-cdk-rfcs/blob/master/text/0049-continuous-delivery.md where we explicitly made that decision already.

I think we'd avoid issues like #13417 (comment)

I'm not sure how we'd avoid that issue.

That issue is about the UpdatePipeline step, which does cdk deploy, not being able to know what roles need to be assumed. That is because they might have been changed from the defaults, but there's no way to know because there is no Cloud Assembly yet, so the StackSynthesizer hasn't had a chance to write the correct role(s) to it yet.

That issue is not about cross-account roles (or at least it wasn't from my side), because the Cloud Assemblies for the applications that need to be deployed cross-account have been written, are known, and the correct role ARNs can be read from them.

Whether or not we deploy CDK stages using cdk deploy or CloudFormation Actions makes no difference to either of those issues, so let's consider them separate.

Here's what I'm proposing we do:

Getting rid of the * permissions in the bootstrap template

BOOTSTRAP TEMPLATE

Remove the KMS *
Guard the S3 * with a condition like:

                Condition:
                    StringLike:
                        aws:ResourceTag/aws-cdk:read-from-account: { Fn::Sub: ',${AWS::AccountId},' }

PIPELINES

Generate the KMS key with a policy that allows all usage, as long as it's kms:ViaService: s3
Attach a tag to the artifact & replication buckets for every account that's allowed to read from it, looking like ,111111111111,222222222222,333333333333, (notice the bracing commas).

Allowing custom role names

Let's focus on properly reading the role names from the Cloud Assemblies that are being added via addApplicationStage() first, and disregard the potential custom role names for the Pipeline stack for a moment.

We will tackle that second issue separately.

tjenkinson commented 3 years ago

I think we'd avoid issues like #13417 (comment)

I'm not sure how we'd avoid that issue.

Ah yep that's a bit different.

We had an RFC on this before. It's this one: https://github.com/aws/aws-cdk-rfcs/blob/master/text/0049-continuous-delivery.md where we explicitly made that decision already.

Yeh I read this. That RFC doesn't describe the additional permissions that are needed for the action to access the artifact bucket. I was wondering if, knowing that now, it might result in a different outcome if there was a followup RFC.

I'd rather we removed the permissions entirely then make them more complicated with conditions and tags, beause even with the more complicated conditions and tags there's still this (allbeit much smaller) issue:

If account A assumes the deploy role in B, it can access all the artifact buckets (with same qualifier) that are for pipelines in A that are also configured to deploy to B.

E.g you might have Service X and Service Y which both have their pipelines in account A and deploy into account B.

The pipeline for Service X ideally shouldn't be able to read/write artifacts from the pipeline for service Y.

However, given it sounds like that's not the way you want to go, I'll start a PR with the changes you proposed on Friday 👍

I'd like to include the qualifier in the tag name/value to keep the isolation between the different CDK instances though, because that would also mitigate what I mentioned above if Service X and Service Y were using different qualifiers.

rix0rrr commented 3 years ago

I'd like to include the qualifier in the tag name/value to keep the isolation between the different CDK instances though, because that would also mitigate what I mentioned above if Service X and Service Y were using different qualifiers.

Sounds good.

Wenzil commented 3 years ago

@rix0rrr For what it's worth, our team independently went through the same thought process as @tjenkinson and reached the same conclusions. We were actually thinking of forking CDK Pipelines to implement the very same idea proposed here: a CodeBuild action running cdk deploy on the target accounts. It is conceptually so much simpler and would benefit from any feature added to the CDK toolchain in the future.

I understand that a decision has already been made following the original RFC, but with the insight we have now I would like to add my voice for reconsideration. I couldn't have done a better job of listing the pros and cons of the proposed alternative 👍

Wenzil commented 3 years ago

To pile on https://github.com/aws/aws-cdk/issues/12985#issuecomment-802793593, here are additional Pros for the proposed change:

Reap benefits from current and future features of CDK tooling
- For example, the CDK CLI heuristically detects security posture changes when cdk diff or cdk deploy is executed. Logging the security report would be a welcome improvement only possible with CDK tooling. Related to https://github.com/aws/aws-cdk-rfcs/issues/282
Less reliance on CodePipeline self-mutation
- Adding a new stack to your application deployment stage? Currently, CodePipeline needs to self-mutate to add the corresponding CloudFormation action. With the change, the pipeline definition does not need to mutate. Self-mutation becomes required only when the pipeline itself needs to be updated
- E.g. when adding pre-build steps to the synth stage
- E.g. when upgrading the CDK version used for (future) self-mutations
- E.g. when changing the target environment of deployments (to update the pipeline role permissions)
- Disabling self-mutation (already supported) becomes actually viable - this aligns with the common separation of responsibilities where an ops team manages CI/CD pipelines and developers only use existing CI/CD pipelines for their deployments
- Self-mutation is so hard to reason about in many cases. The less reliance on it, the better
No reliance on the CodePipeline cross-region support stack for cross-region deployments
- Deploying to a different region? Currently this requires a support stack and setting up CodePipeline for cross-region replication of artifacts. This is taken care of by CDK but adds unnecessary complexity. With the change, bootstrapping the target environment region is sufficient.

tjenkinson commented 3 years ago

No reliance on the CodePipeline cross-region support stack for cross-region deployments

The support stack isn’t needed with the pipelines module which is why the * permissions are required. (It doesn’t set the region prop. Just ensures the role in the destination account can read the artifact bucket of the pipeline).

Completely agree with the other points though.

mrpackethead commented 3 years ago

As a general comment, this is another case, where the higher order constructs and their opininated natures just don't work for everyone in every case. They are by their very nature opinionated and do a particular job in a particular way.

We all have to come to a point were we accept that we are not all going to be able to follow the same 'opinionated' construct to meet all our needs. If you attempt to try to build a construct that is all seeing, all knowing, all doing, it will (a) be very hard to get finished and (b) never meet all teh requirements.

There is no problem with building your own higher order constructs that sit on top of the lower constructs that cdk does so well. That is where CDK exhibits massive value..

At least for me, it is the higher order constructs that cause me the most pain because they don't' fit what i want to them to do. If i could rewind 6 months, i'd have not picked up cdk pipelines. I would have created my own 'pipeline' using codepipeline that does thigns the way that I need to do them. Thats not suggesting that the cdk pipeline is bad, its just saying that its not quite what i want to do. Reading this thread, and others, its not quite doing what others want to do either. It seems that the lower effort path, in many cases would be to create your own, or alternatively fork your own from this one. ( thats largely what i've done, though my changes are minor )..

IF your changes/fork works for you, then thats great. You dont' need to make them work for everyone else?

aws / aws-cdk