cli: allow disabling context lookups (was commit cdk.context.json)

philtay commented 4 years ago

The docs state that cdk.context.json should be checked-in to ensure reproducibility. However, that file may contain information that's usually considered sensitive like AWS account numbers (e.g. aws-samples). Such data shouldn't be commited and utilities like git-secrets actively try to prevent that. Could you please share what's the best practice in these cases?

mjsilva commented 4 years ago

Same questions as OP. It would be good to understand better the pros and cons of commiting cdk.context.json. For example on our environment we have 2 organisation accounts staging and production. On our CI we use different IAM credentials to deploy to each account, in this case is there a benefit of commiting it?

As far as I've understand you'd only commit the file if you want to be able to run synth without having to login to AWS and possible speed things up a bit since is saves some lookups.

Another thing I'd like to understand is the difference between cdk.json and cdk.context.json. I noticed if I put things like "@aws-cdk/core:enableStackNameDuplicates": "true", in cdk.json once I run diff they are moved to cdk.context.json, not sure why.

rix0rrr commented 4 years ago

As far as I've understand you'd only commit the file if you want to be able to run synth without having to login to AWS and possible speed things up a bit since is saves some lookups.

Not really, depending on what's in there. If there are 3 AZs in your region and you deploy a CDK app, a VPC will be created with a specific IP layout.

If you don't commit context.json and on the next deploy AWS happens to have added an AZ to your region, your VPC will now try to span 4 AZs, have a different IP space layout and all your subnets will have to be destroyed and recreated (this will probably fail so you won't actually lose any data, but your deployment will be stuck with no way to move forward).

Similarly, if you've started an instance off of an Amazon Linux AMI ID, and there now happens to be a new version of that AMI available... do you want your instance to be automatically replaced with a new version of the OS? What if you had state on that machine?

cdk.context.json is not just an API call cache, it's a store of nondeterministic decisions taken in the past, which you must commit in order to ensure consistency in the future. It also gives you the opportunity to update the values piecemeal. ("YES I will take a new AMI ID now, NO I will not be spreading to new AZs").

As for account IDs, we never considered them very sensitive, similar to how usernames aren't usually considered sensitive. AWS itself publishes many of its own account IDs for users to put in their IAM policies, for example. The keys to the castle are in the access credentials to the account, not the account ID itself.

In fact, we generally expect people to put account IDs in their source as well (or build their application to read them from a configuration file), especially in the upcoming CI/CD implementation where we're going to force it.

It may be possible that our opinions differ from other departments whose specialization is security. The biggest risk I personally see is bucket sniping.

I see a couple of solutions to this, but they all come with their own downsides:

Account IDs in context file:

Storing context in the account itself. We could store the context for an account INSIDE the account, let's say in SSMPS. There would be no more cdk.context.json, so nothing to commit (yay). Would not get rid of account ids in the source files though.
Make account IDs nonreversible. In the context.json file, we could hash the account IDs to make then nonreversible. Would not get rid of account ids in the source files though.

Account IDs in source:

Symbolic account IDs. Instead of specifying account IDs in source files, we'd have a symbolic identifier such as $prod1, $prod2, $prod3 etc. The next complication then becomes: where do we store the mapping from { "$prod1": "123456789012" }? It will then be the user's responsibility to make sure the mapping is transported and kept in sync between all developer's machines. Also, these values could not be stored in the context.json file directly, because there's no guarantee that $prod1 on one machine is the same account as $prod1 on another machine, so we'd need one of the other solutions in addition to this.

I think the end result of this will be that we may implement one of the solutions for the context, and it will be users responsibility to hide account IDs from their sources if desired, to be implemented in a way that suits them.

mjsilva commented 4 years ago

thanks for taking the time @rix0rrr. That provided a lot of clarity on things I was wondering about.

I'm still not completely clear though, I understand the use case I just think I'm missing the the development workflow.

Since we run cdk diff + cdk deploy as part of our CI/CD pipeline. When am I supposed to update/create cdk.context.json, should I just run cdk synthesise while I'm developing to make sure cdk.context.json has the right data in and then commit?

rix0rrr commented 4 years ago

When am I supposed to update/create cdk.context.json, should I just run cdk synthesise while I'm developing to make sure cdk.context.json has the right data in and then commit?

Yes.

Honestly it's probably also a good idea to add a flag for the CLI to fail unexpected context lookups (--no-lookups), which you can then use in CI/CD to make sure the output is deterministic (@shivlaks I just came up with another feature request :) )

eladb commented 4 years ago

@rix0rrr @shivlaks should we create a new issue for the CLI feature request and close this one out?

pgollucci commented 4 years ago

This one has a lot of good info in it. Definitely interested in this one.

eladb commented 4 years ago

@shivlaks ping

shivlaks commented 4 years ago

@eladb sorry, this one skipped my notifications inbox. That makes sense to me - saw that you already re-titled it. I dropped the guidance label as well. Let's use this issue to track the feature if there are no objections.

schourode commented 1 year ago

@eladb Unless I have misread the thread above entirely, I believe this issue can now be closed as fixed:

$ cdk --help
…
      --lookups            Perform context lookups (synthesis fails if this is
                           disabled and context lookups need to be performed)
                                                       [boolean] [default: true]
…

TheRealAmazonKendra commented 1 year ago

Closing this issue based off the previous comment.

github-actions[bot] commented 1 year ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

aws / aws-cdk

cli: allow disabling context lookups (was commit cdk.context.json) #6929

⚠️COMMENT VISIBILITY WARNING⚠️