aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.55k stars 3.87k forks source link

(aws-secretsmanager): value from Fn import_value seems to break policy evaluation for secrets manager #24142

Closed npvisual closed 1 month ago

npvisual commented 1 year ago

Describe the bug

I am currently using the following to retrieve the ARN of a secret and key both created in a different stack (let's call it StackA) to be used in StackB's ECS deployment :

        # Get the key and secret ARNs, if available in the descriptor, if not try and
        # import from the Pod's Stack output.
        key_arn = str(pod.get("keyArn", Fn.import_value("PodKeyArn")))
        secret_arn = str(pod.get("secretArn", Fn.import_value("PodSecretArn")))

This gives me the option to either provide the ARNs as part of the cdk.json or to pull that information from StackA's output. The key and secret ARNs are then used to grant decrypt access and (secret) read access to the task principal :

        shared_database_key = kms.Key.from_key_arn(
            self,
            "shared-db-key",
            key_arn=key_arn,
        )
        shared_database_secret = secrets.Secret.from_secret_complete_arn(
            self,
            "shared-db-secret",
            secret_complete_arn=secret_arn,
        )

...

        task_principal = fargate_task_definition.task_role.grant_principal
        database_secret.encryption_key.grant_decrypt(task_principal)
        database_secret.grant_read(task_principal)

        # We do the same for the shared database secret and its key...
        shared_database_key.grant_decrypt(task_principal)
        shared_database_secret.grant_read(task_principal)

When the values are pulled from the cdk.json this works great and we currently deploy our self-mutating pipelines that set up our infrastructure using this process 👍.

Note : only the shared DB key and secret are retrieved from the cdk.json or StackA output ; StackB's DB key and secret are passed along as part of the same stack and we do not have any issues with those.

The following is an example of what is provided in the cdk.json (account number and last 4 digits of ARNs replaced by 1234567890 and abdc respectively) :

        "keyArn": "arn:aws:kms:us-east-1:1234567890:key/81b9c20b-6a68-45cb-aac0-20eece76abcd",
        "secretArn": "arn:aws:secretsmanager:us-east-1:1234567890:secret:podpipelinepod1stagepod1pod-95RlBejkckWb-8Sabcd",

The output of Stack A provides the following :

When I provide the values in the cdk.json everything deploys according to plan. No issue.

However when I do not provide values in the cdk.json (remove the lines) and rely on the output of StackA via Fn.import_value then we get the following error in the ECS task process (which uses botocore to retrieve the values in the secret) :

botocore.exceptions.ClientError: An error occurred (AccessDeniedException) when calling the GetSecretValue operation: User: arn:aws:sts::1234567890:assumed-role/pod-1-silo-test-componentstestecstestecstaskdefTas-JMJAJJQTabcd/584419241b70425c906c9595016d67a8 is not authorized to perform: secretsmanager:GetSecretValue on resource: podpipelinepod1stagepod1pod-95RlBejkckWb-8Sabcd because no identity-based policy allows the secretsmanager:GetSecretValue action

So the secret seems to not be readable by the ECS task. However the policy is there and is identical to the policy generated when using the cdk.json values (performed a diff on the policy's JSON output and there's none).

Note : there doesn't seem to be any problems with the key. Just the secret.

When I reintroduce the key and secret ARN, the ECS task deploys successfully after the pipeline ran and the stack has finished updating.

Expected Behavior

I would expect the behavior would be the same whether the values for the secret ARN are passed via the cdk.json or retrieved via the Fn.import_value (Fn::ImportValue).

In addition the identity based policy that is referenced in the error message in the ECS task (access via botocore) doesn't make sense since the policy giving access to the resource is there for the given role (account number and last 4 digits or resource replaced) :

role ARN : arn:aws:iam::060465997059:role/pod-1-silo-test-componentstestecstestecstaskdefTas-JMJAJJQTabcd policy :

...
        {
            "Action": [
                "secretsmanager:GetSecretValue",
                "secretsmanager:DescribeSecret"
            ],
            "Resource": "arn:aws:secretsmanager:us-east-1:1234567890:secret:podpipelinepod1stagepod1pod-95RlBejkckWb-8Sabcd",
            "Effect": "Allow"
        },
...

Current Behavior

We get an error message when the application, through botocore, tries to retrieve the secret :

botocore.exceptions.ClientError: An error occurred (AccessDeniedException) when calling the GetSecretValue operation: User: arn:aws:sts::1234567890:assumed-role/pod-1-silo-test-componentstestecstestecstaskdefTas-JMJAJJQTabcd/584419241b70425c906c9595016d67a8 is not authorized to perform: secretsmanager:GetSecretValue on resource: podpipelinepod1stagepod1pod-95RlBejkckWb-8Sabcd because no identity-based policy allows the secretsmanager:GetSecretValue action

Note that when we change the role and use * for the resource, we get passed that access issue, vut then botocore cannot retrieve the actual secret :

ValueError: could not resolve ${secret:podpipelinepod1stagepod1pod-95RlBejkckWb-8Sabcd:host} in template

Reproduction Steps

Will try and extract a small snippet that can be used to reproduce. But want to put that out there in case this is a known issue already or it's easy enough to diagnose without a sample.

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

CDK toolkit version: 2.64.0 (build fb67c77)

Framework Version

"aws-cdk-lib" = "2.63.2"

Node.js Version

whatever is installed as part of aws/codebuild/standard:6.0

OS

Ubuntu from aws/codebuild/standard:6.0

Language

Python

Language Version

3.10.5 (installed base aws/codebuild/standard:6.0)

Other information

No response

peterwoodworth commented 1 year ago

Thanks for reporting this @npvisual, we received a similar issue regarding the same thing earlier today, but I think based on your description these may be slightly different. It would be great if you could provide a reproduction stack!

npvisual commented 1 year ago

Thanks for reporting this @npvisual, we received a similar issue regarding the same thing earlier today, but I think based on your description these may be slightly different. It would be great if you could provide a reproduction stack!

@peterwoodworth thanks for the quick response. I read through the issue you mentioned above and this looks different. Will try and get you a stack to reproduce with.

npvisual commented 1 year ago

So I am not yet able to provide a "simple" app. I have tried to use boto3 in a Lambda, or a simple ECS deployment and an aws cli command inside the container, but I am not able to reproduce the issue. Need to come up with an image that can run a simple boto3 python app to make a call to secrets manager.

Anyhow, I think it's specific to how references from Fn.import_value are replaced in the CloudFormation template. After a good number of trials and errors, I have managed to land on using

        shared_database_secret = secrets.Secret.from_secret_name_v2(
            self,
            "shared-db-secret",
            secret_name=secret_name,
        )

rather than

        shared_database_secret = secrets.Secret.from_secret_complete_arn(
            self,
            "shared-db-secret",
            secret_complete_arn=secret_arn,
        )

So this is a good workaround and seems to change a few things in the template during the synth phase. For example, the pipeline template now shows something like this for the permission :

...
      {
       "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
       ],
       "Effect": "Allow",
       "Resource": {
        "Fn::Join": [
         "",
         [
          "arn:",
          {
           "Ref": "AWS::Partition"
          },
          ":secretsmanager:us-east-1:060465997059:secret:podpipelinepod1stagepod1pod-95RlBejkabcd-??????"
         ]
        ]
       }
      },
...

whereas before we had this :

...
      {
       "Action": [
        "secretsmanager:GetSecretValue",
        "secretsmanager:DescribeSecret"
       ],
       "Effect": "Allow",
       "Resource": {
        "Fn::ImportValue": "PodSecretArn"
       }
      },
...

Similarly, the environment variables, used by our code to request the secret via boto3, now appear as :

...
       {
        "Name": "TRIPPER_SHARED_SQL_HOST",
        "Value": {
         "Fn::Join": [
          "",
          [
           "${secret:",
           {
            "Fn::ImportValue": "PodSecretName"
           },
           ":host}"
          ]
         ]
        }
       },
...

instead of :

...
       {
        "Name": "TRIPPER_SHARED_SQL_HOST",
        "Value": {
         "Fn::Join": [
          "",
          [
           "${secret:",
           {
            "Fn::Select": [
             6,
             {
              "Fn::Split": [
               ":",
               {
                "Fn::ImportValue": "PodSecretArn"
               }
              ]
             }
            ]
           },
           ":host}"
          ]
         ]
        }
       },
...

In addition, looking at the ECS task definition, those environment variables (which use the secret's name inside our own pattern) now show :

...
                {
                    "name": "TRIPPER_SHARED_SQL_HOST",
                    "value": "${secret:podpipelinepod1stagepod1pod-95RlBejkabcd:host}"
                },
...

instead of the following (which shows the full ARN with the secret manager assigned suffix) :

...
                {
                    "name": "TRIPPER_SHARED_SQL_HOST",
                    "value": "${secret:podpipelinepod1stagepod1pod-95RlBejkabcd-8SkPye:host}"
                },
...

So, I'd say there's definitely something fishy going on when the reference to the secret is build from secrets.Secret.from_secret_complete_arn. Maybe it's something I am doing wrong as well. Unclear to me at this point.

peterwoodworth commented 1 year ago

Thanks for following up again. It's not uncommon when there are issues related to importing secrets and importing values, so I'm sure there's some use case you're running into that is buggy. Interestingly enough, in the issue I linked earlier, they also solved their problem by switching off of fromSecretNameV2

npvisual commented 1 year ago

Ok. Would be good to make imports from other Stack's output rock solid. This is really really helpful when you can rely on a different stack's output to fill in values that your stack is dependent upon instead of having to rely on manually inputting those in the cdk.json.

zhahaoyu commented 11 months ago

I ran into the same issue when usingfromSecretNameV2 with a JSON secret.

const secret = aws_secretsmanager.Secret.fromSecretNameV2(this, 'Secret', 'secret-name')

When the secret is passed to an ecs service.

new ecsPatterns.ApplicationLoadBalancedFargateService(
                this, 'Service', {
                         taskImageOptions: {
                                   secrets: { SECRET: aws_ecs.Secret.fromSecretsManager(secret, 'key')

It appears that the IAM policy generated will have the correct suffix, but in the task definition, it misses the suffix, and instead using the secret-name. This causes AccessDeniedException.

I solved it by manually overriding the taskDefinition.

It should also be solvable if I use several secrets instead of a composite json one.

The weird thing is that when I import a "secret" directly from an rds database construct, things seem to resolve just fine.

zhahaoyu commented 11 months ago

Specifically, the generated policy will have "arn:aws:secretsmanager:us-west-2:[account-number]:secret:[secret-name]-??????" but the container definition will have "ValueFrom": "arn:aws:secretsmanager:us-west-2:[account-number]:secret:[secret-name]:[key]::"

The correct container definition should be "ValueFrom": "arn:aws:secretsmanager:us-west-2:[account-number]:secret:[secret-name]-[6-digit-suffix]:[key]::"

pahud commented 1 month ago

Please note this for fromSecretNameV2:

https://github.com/aws/aws-cdk/blob/4987ef211b6e5daa37e0a392edec930c8a10ab9a/packages/aws-cdk-lib/aws-secretsmanager/lib/secret.ts#L542-L550

If you need ISecret that only requires the SecretName, you can use fromSecretNameV2(), otherwise you should always use fromSecretCompleteArn() to avoid potential AccessDeniedException.

For values from Fn.import_value() - CDK would not know the value in the synthesize time. For values from cdk.json - CDK would know the value in the synthesize time.

Now, what happens and what's the intention when you secret.grant_read(principal)?

The intention is to add a iam policy for the principal to read the value of the secret. To make it possible, we need to know the full secret ARN so we can add this as the resource into that identity-based policy.

Now,

  1. If the secret is ISecret from import value, CDK would not be able to render the principal policies with correct value because CDK won't know that in the synthesize time.
  2. If the secret is retrieved from cdk.json, CDK would be able to do render the policies with correct value.

That explains the difference.

Let me know if this issue is still relevant.

github-actions[bot] commented 1 month ago

This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.