aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.58k stars 3.88k forks source link

(custom-resources): AwsCustomResource ignores the configured role if a prior instance was already instantiated #13601

Closed royby-cyberark closed 2 years ago

royby-cyberark commented 3 years ago

This issue was repurposed following https://github.com/aws/aws-cdk/issues/13601#issuecomment-803574616.

Original report


When you enable logging in software.amazon.awscdk.services.elasticsearch, and you are creating log groups, then internally it instantiate the LogGroupResourcePolicy which extends the AWSCustomResource This in turn, sets the AWSCustomResource, which may overwrite any other AWSCustomResource role deployed in the same stack, or have its role overwritten. The reason it overwrites the role, is that the AWSCustomResource role is a singleton for the entire stack

This role will apply to all AwsCustomResource instances in the stack https://docs.aws.amazon.com/cdk/api/latest/docs/@aws-cdk_custom-resources.AwsCustomResource.html#role

Reproduction Steps

Python code, Irrelevant code removed for brevity. The main point is to turn on logging with log groups and basically get LogGroupResourcePolicy be to instantiated

# Create my custom resource with a singleton role which I use in the entire stack for all other custom resoruces
policy = AwsCustomResourcePolicy.from_sdk_calls(resources=[f'arn:aws:iot:{region}:{account_id}:rolealias/{role_alias}'])
# This is a singleton class that returns the one role used in my stack
lambda_role_singleton = CustomResourcesLambdaRole(scope)
lambda_role_singleton.add_to_policy(actions=["iam:PassRole"], resources=[role_arn])

AwsCustomResource(scope=self, id='CustomResource', policy=policy, log_retention=log_retention,
                                                on_create=on_create, on_update=on_update, on_delete=on_delete,
                                                resource_type='Custom::AWS-IoT-Role-Alias', role=lambda_role_singleton.role,
                                                timeout=timeout)

# Create the ES domain with logging turned on and log groups
aws_cdk.aws_elasticsearch.Domain(
            scope=self,
            id=id,
            version=ElasticsearchVersion.V7_9,
            logging=self._create_logging_options()
           ....)

def _create_logging_options(self) -> LoggingOptions:
        return LoggingOptions(app_log_enabled=True, slow_index_log_enabled=True, slow_search_log_enabled=True,
                              app_log_group=self._create_log_group(log_group_type=LogGroupType.APP),
                              slow_index_log_group=self._create_log_group(log_group_type=LogGroupType.SLOW_INDEX),
                              slow_search_log_group=self._create_log_group(log_group_type=LogGroupType.SLOW_SEARCH))

def _create_log_group(self, log_group_type: str) -> LogGroup:
        return LogGroup(scope=self, id=log_group_type.value.capitalize(), log_group_name=<log-group-name>,
                        removal_policy=RemovalPolicy.DESTROY)

What did you expect to happen?

Both my custom resource and ES domain are deployed

What actually happened?

Deployment failed since I was missing the "iam:PassRole" permissions required to deploy a role alias. Inspecting the cdk.out template, I saw that indeed, this wasn't present on the role policy. and this was only missing when the ES domain was deployed with logging turned on.

Environment

Other

Unless i'm missing something, a possible solution would allow me to pass the role that will be used or something similar with the logging options.


This is :bug: Bug Report

iliapolo commented 3 years ago

Hi @royby-cyberark. Thanks for reporting this. We are looking into it.

iliapolo commented 3 years ago

@royby-cyberark I wasn't able to reproduce the issue.

While it's true that all AwsCustomResource instances share the same role, this role accumulates policies as needed, allowing it to perform all the necessary actions.

I tried to reproduce it with the following code:

const role = new iam.Role(this, 'AwsCustomResourceRole', {
  assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com')
});

role.addToPolicy(new iam.PolicyStatement({
  actions: ['iam:PassRole' ],
  resources: ['*']
}))

new cr.AwsCustomResource(this, 'MyAwsCustomResource', {
  role,
  policy: cr.AwsCustomResourcePolicy.fromSdkCalls({ resources: ['*'] }),
  onCreate: {
    action: 'listBuckets',
    service: 's3',
    physicalResourceId: cr.PhysicalResourceId.of('BucketsList'),
  },
});

new es.Domain(this, 'Domain', {
  version: es.ElasticsearchVersion.V7_9,
  logging: {
    appLogEnabled: true,
  }
});

The resulting CloudFormation template will include:

Would be good if you can provide a complete reproduction snippet and/or the CloudFormation template.

royby-cyberark commented 3 years ago

@iliapolo Thanks for looking into this. i'm working to create a minimal working example and i'll update as soon as I have it.

royby-cyberark commented 3 years ago

@iliapolo I'm adding three code snippets each with its template, they have everything needed for this example - the first two are working fine and the third reproduces the issue. the code is in python, but it shouldn't matter i'm sure.

  1. create only a custom resource.
    
    from aws_cdk import core

import aws_cdk.aws_iam as iam import aws_cdk.custom_resources as cr

class CdkBugTestStack(core.Stack):

def __init__(self, scope: core.Construct, construct_id: str, **kwargs) -> None:
    super().__init__(scope, construct_id, **kwargs)

    role = iam.Role(scope=self, id='AwsCustomResourceRole', assumed_by=iam.ServicePrincipal('lambda.amazonaws.com'));
    role.add_to_policy(iam.PolicyStatement(actions=['iam:PassRole'], resources=['*']))

    my_custom_resource = cr.AwsCustomResource(
        scope=self, id='MyAwsCustomResource', 
        role=role,
        policy=cr.AwsCustomResourcePolicy.from_sdk_calls(resources=['*']),
        on_create=cr.AwsSdkCall(
            action='listBuckets',
            service='s3',
            physical_resource_id=cr.PhysicalResourceId.of('BucketsList'),)
    )
here is the cf template (from cdk synth): https://gist.github.com/royby-cyberark/4695033c2cdb981f010ceb9318844f9a

Note the following:

* resource AwsCustomResourceRole54BBCF34 is AWS::IAM::Role
* resource AwsCustomResourceRoleDefaultPolicy4EC1C81B (AWS::IAM::Policy) references it:
    "Roles": [{"Ref": "AwsCustomResourceRole54BBCF34"}] 

and has this statement:

        "PolicyDocument": {
          "Statement": [
            {
              "Action": "iam:PassRole",
              "Effect": "Allow",
              "Resource": "*"
            }
          ],
          "Version": "2012-10-17"
        },

* resource AWS679f53fac002430cb0da5b7982bd22872D164C4C ("AWS::Lambda::Function") - which is the custom resource lambda, 
has this role as its lambda role, see:

"Role": {
  "Fn::GetAtt": [
    "AwsCustomResourceRole54BBCF34",
    "Arn"
  ]
},

This means that the custom resrouce lambda has the role with the passRole permissions 

2. Now, running with this code that also create an ES domain with logging and log groups
NOTE:  the es domain is created AFTER the custom resource (at least in code...):  
Here is the cf template: 
https://gist.github.com/royby-cyberark/6cfc2e2b40930fd39029febc8d6a0a2d

Note the following:

* resource AwsCustomResourceRole54BBCF34 is AWS::IAM::Role

* resource AwsCustomResourceRoleDefaultPolicy4EC1C81B (AWS::IAM::Policy) references it:
    "Roles": [{"Ref": "AwsCustomResourceRole54BBCF34"}] 

and has this statement:

        "PolicyDocument": {
          "Statement": [
            {
              "Action": "iam:PassRole",
              "Effect": "Allow",
              "Resource": "*"
            }
          ],
          "Version": "2012-10-17"
        },

* resoruce AWS679f53fac002430cb0da5b7982bd22872D164C4C  (the custom resource Function)

       "Role": {
          "Fn::GetAtt": [
            "AwsCustomResourceRole54BBCF34",
            "Arn"
          ]
        },

This means that here also, the custom resource lambda has the passRole permissions via the AwsCustomResourceRole54BBCF34 role

3. And now (this is the issue), we create the domain BEFORE the custom resource:

When using this code:

from aws_cdk import core

import aws_cdk.aws_iam as iam import aws_cdk.custom_resources as cr import aws_cdk.aws_elasticsearch as es from aws_cdk.aws_logs import LogGroup

class CdkBugTestStack(core.Stack):

def __init__(self, scope: core.Construct, construct_id: str, **kwargs) -> None:
    super().__init__(scope, construct_id, **kwargs)

    domain = es.Domain(scope=self, id='Domain', 
            version=es.ElasticsearchVersion.V7_9, 
            logging=es.LoggingOptions(app_log_enabled=True, slow_index_log_enabled=True, slow_search_log_enabled=True,
                                      app_log_group=LogGroup(scope=self, id="app-log-group", 
                                                 log_group_name=f'/aws/aes/domains/esdomain/app-log-group',
                                                 removal_policy=core.RemovalPolicy.DESTROY),
                                      slow_index_log_group=LogGroup(scope=self, id="slow-index-log-group", 
                                                 log_group_name=f'/aws/aes/domains/esdomain/slow-index-log-group',
                                                 removal_policy=core.RemovalPolicy.DESTROY),
                                      slow_search_log_group=LogGroup(scope=self, id="slow-search-log-group", 
                                                 log_group_name=f'/aws/aes/domains/esdomain/slow-search-log-group',
                                                 removal_policy=core.RemovalPolicy.DESTROY)),
             removal_policy=core.RemovalPolicy.DESTROY,
    )

    role = iam.Role(scope=self, id='AwsCustomResourceRole', assumed_by=iam.ServicePrincipal('lambda.amazonaws.com'));
    role.add_to_policy(iam.PolicyStatement(actions=['iam:PassRole'], resources=['*']))

    my_custom_resource = cr.AwsCustomResource(
        scope=self, id='MyAwsCustomResource', 
        role=role,
        policy=cr.AwsCustomResourcePolicy.from_sdk_calls(resources=['*']),
        on_create=cr.AwsSdkCall(
            action='listBuckets',
            service='s3',
            physical_resource_id=cr.PhysicalResourceId.of('BucketsList'),)
    )

This is the cf template: 
https://gist.github.com/royby-cyberark/79afe4c2ac9bb070bd243f2ec524e8f6

* "AwsCustomResourceRole54BBCF34": "Type": "AWS::IAM::Role"

* AwsCustomResourceRoleDefaultPolicy4EC1C81B": "Type": "AWS::IAM::Policy"
"Roles": [
          {
            "Ref": "AwsCustomResourceRole54BBCF34"
          }
        ]

* AWS679f53fac002430cb0da5b7982bd22872D164C4C": "Type": "AWS::Lambda::Function"

        "Role": {
          "Fn::GetAtt": [
            "AWS679f53fac002430cb0da5b7982bd2287ServiceRoleC1EA0FF2",
            "Arn"
          ]
        },

Note the different role for the lambda

* the AWS679f53fac002430cb0da5b7982bd2287ServiceRoleC1EA0FF2 role is referened in the following places:
1. policy DomainESLogGroupPolicyc85bc784c087e05e4c70a6d74261a3050aa449590cCustomResourcePolicy6C96DA14 (for logs permissions)

2. the role resource: 
AWS679f53fac002430cb0da5b7982bd2287ServiceRoleC1EA0FF2": { "Type": "AWS::IAM::Role"

3. in the custom resource lambda - under role
4. in the custom resource lambda - under depends on
5. in policy MyAwsCustomResourceCustomResourcePolicy786A7FDE": "Type": "AWS::IAM::Policy" - list bucket permissions

This means that in this case the lambda does no have the passRole permissions. 
and it makes me this that a workaround would be instantiating the domain after the custom resource

or from another perspective, the custom resource lambda "AWS679f53fac002430cb0da5b7982bd2287ServiceRoleC1EA0FF2", had this role:
    "Role": {
      "Fn::GetAtt": [
        "AWS679f53fac002430cb0da5b7982bd2287ServiceRoleC1EA0FF2",
        "Arn"
      ]
    },

and the policy that has the passRole permissions (AwsCustomResourceRoleDefaultPolicy4EC1C81B) has this role: AwsCustomResourceRole54BBCF34

Thanks for your help.
iliapolo commented 3 years ago

@royby-cyberark Thanks for being so detailed!

Yes you're right, I neglected to consider the order of creation. Basically what happens is that the role of the custom resource is created when the domain is instantiated.

After this, any role passed to AwsCustomResource is effectively ignored since it reuses the already existing lambda function (which has a different role).

To workaround this, apart from re-arranging the order of instantiation, you can also avoid creating a dedicated role, and simply add policies to the existing one.

my_custom_resource = cr.AwsCustomResource(
            scope=self, id='MyAwsCustomResource', 
            # role=role, # dont pass a role
            policy=cr.AwsCustomResourcePolicy.from_sdk_calls(resources=['*']),
            on_create=cr.AwsSdkCall(
                action='listBuckets',
                service='s3',
                physical_resource_id=cr.PhysicalResourceId.of('BucketsList'),)
        )

my_custom_resource.grant_principal.add_to_policy(...)

I do agree this is confusing though. But it's more related to how AwsCustomResource behaves, rather than its usage by the elasticsearch domain, which is actually as intended.

Im going to route this to the custom-resources package for further discussion.

FYI @rix0rrr do you think there is something we can do here? Intuitively I would expect that any property that can be passed by the user, should be encoded in the UUID of the function, otherwise they are all similarly ignored.

rix0rrr commented 3 years ago

It should probably become a synth-time error to try and configure different roles in this way

txsutton commented 3 years ago

Hello, just wondered whether this has been resolved. I have a similar/same issue with awscustomresource where: Lambda 1 - Needs to AssumeRole to createVPCAssociationAuthorization on a different account Lambda 2 - Depends on Lambda 1 and needs to run associateVPCWithHostedZone with the additional action ec2:DescribeVpcs.

Lambda2 ignores the role= command and the AwsCustomResourcePolicy command if I try and run them consecutively (if I comment out Lambda2, deploy, uncomment Lambda2 and then update - everything works as expected)

royby-cyberark commented 3 years ago

@txsutton I wasn't able to make progress with it. it worked for me so far by allowing '*' which was fine for this use case, so I didn't get to invest in this further.

txsutton commented 3 years ago

I ran out of time and just replaced the second awscustomresource with a normal custom resource and Lambda which is a bit messy (in theory I could have just used 1 Lambda for both API calls but I wanted to keep the first one as a reminder on the syntax for next time)

The code I tried was this (first "custom resource")

            record_name = f"{service}.{core.Aws.REGION}.amazonaws.com"
            vpc_association_authorization = AwsCustomResource(self, f"VpcAssociationAuthorization-{service}",
                on_create={
                    "assumed_role_arn": "arn:aws:iam::xxxxxxxxxxxx:role/ProServeApgCentralisedVpcEndpoints-R53Role778AB903-DK5I6NEW48MZ",
                    "service": "Route53",
                    "action": "createVPCAssociationAuthorization",
                    "parameters": {
                        "HostedZoneId": "Z044055532752xxxxxxxx",
                        "VPC": {
                            "VPCId": vpc_id,
                            "VPCRegion": core.Aws.REGION
                        }
                    },
                    "physical_resource_id": PhysicalResourceId.of(f"createVPCAssociationAuthorization-{service}") #Used as Role SessionName so must be less that <64 chars
                },                           
                # Will ignore any resource and use the assumedRoleArn as resource and 'sts:AssumeRole' for service:action
#                policy=AwsCustomResourcePolicy.from_sdk_calls(resources=AwsCustomResourcePolicy.ANY_RESOURCE)
                role=R53_Spoke_role,
                policy = AwsCustomResourcePolicy.from_statements(
                    statements=[
                        iam.PolicyStatement(
                            actions=["route53:*","ec2:*","sts:AssumeRole"], 
                            resources=["*"])
                            ]
                        )

Second Custom Resource

            vpc_AssociateVPCWithHostedZone = AwsCustomResource(self, f"AssociateVPCWithHostedZone-{service}",
                on_create={
                    "service": "Route53",
                    "action": "associateVPCWithHostedZone",
                    "parameters": {
                        "HostedZoneId": "Z044055532752xxxxxxxx",
                        "VPC": {
                            "VPCId": vpc_id,
                            "VPCRegion": core.Aws.REGION
                        }
                    },
                    "physical_resource_id": PhysicalResourceId.of(f"associateVPCWithHostedZone-{service}") #Used as Role SessionName so must be less that <64 chars
                },
                role=R53_Spoke_role,
                policy = AwsCustomResourcePolicy.from_statements(
                    statements=[
                        iam.PolicyStatement(
                            actions=["route53:*","ec2:*"], 
                            resources=["*"])
                            ]
                        )
            )
            vpc_AssociateVPCWithHostedZone.node.add_dependency(vpc_association_authorization)
            vpc_AssociateVPCWithHostedZone.node.add_dependency(R53_Spoke_role)
github-actions[bot] commented 2 years ago

This issue has not received any attention in 1 year. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.

svyotov commented 1 year ago

This is still a bug, in the latest cdk 2.60.0 (build 2d40d77)

xgt001 commented 1 year ago

Still encountering this, my only reliable workaround is to use only 1 CR per stack and have that CR do all the AWS calls in one policy instead of many. This is definitely however not reliable.

In addition the workaround in the comment here helped, but I can't reproduce it consistently.

I am a bit wobbly on sleep, will update the details with my workarounds

tj-mm commented 11 months ago

An (admittedly rather dirty) workaround might also be this:

import { Aspects, IAspect, IConstruct } from '@aws-cdk/core';
import * as iam from '@aws-cdk/aws-iam';

class LambdaInvokePermissionAspect implements IAspect {
  public visit(node: IConstruct): void {
    // Check if the node is an IAM Role
    if (node instanceof iam.Role) {
      // Create the policy statement that allows lambda invocation
      const invokePermission = new iam.PolicyStatement({
        actions: ['lambda:InvokeFunction'],
        resources: ['*'], // Specify actual resource ARNs if possible
      });

      // Add the policy statement to the role
      node.addToPolicy(invokePermission);
    }
  }
}

const app = new cdk.App();
const stack = new cdk.Stack(app, 'MyStack');

// Apply the aspect to the stack
Aspects.of(stack).add(new LambdaInvokePermissionAspect());

This could be adapted in a way where the MyStack has a list of resources as a property and the lambda:InvokeFunction is limited to those ARNs.