aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.38k stars 3.79k forks source link

aws-lambda: Defining a `deadLetterQueue` or `deadLetterTopic` will *always* add a policy to the function's execution role #29879

Open nathandines opened 3 months ago

nathandines commented 3 months ago

Describe the bug

As indicated in the title, if you define a deadLetterQueue or deadLetterTopic on a Function, it will always append an inline policy to the execution role which is associated with the function.

I would suggest that this is expected behaviour if the Function creates the IAM role, but it may be unexpected if the IAM role is defined elsewhere. It feels a lot like a side-effect. This feels like a problem within the same CloudFormation stack, but would probably be even more unexpected if the IAM role being referenced originated outside the stack.

Expected Behavior

I created an IAM role and SQS queue adjacent to a Lambda Function. I would have expected that if my IAM role were lacking a permission, it would:

Current Behavior

IAM policies for SQS permission will indiscriminately be added to IAM roles no matter where that role originates from

Reproduction Steps

Relevant CDK Source Code

https://github.com/aws/aws-cdk/blob/5347369fa11f4f11ab3893b9ac4c8467c5d514c3/packages/aws-cdk-lib/aws-lambda/lib/function.ts#L1596-L1599

https://github.com/aws/aws-cdk/blob/5347369fa11f4f11ab3893b9ac4c8467c5d514c3/packages/aws-cdk-lib/aws-lambda/lib/function-base.ts#L375-L381

Partial Code to Reproduce Behaviour

queue = aws_sqs.Queue(self, "queue")
role = aws_iam.Role(
    self,
    "role",
    managed_policies=[
        aws_iam.ManagedPolicy.from_aws_managed_policy_name(
            "service-role/AWSLambdaVPCAccessExecutionRole"
        ),
    ],
    assumed_by=aws_iam.ServicePrincipal("lambda.amazonaws.com"),
)
function = aws_lambda.Function(
    ...
    role=role,
    dead_letter_queue=queue,
    ...
)

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.114.1

Framework Version

2.114.1

Node.js Version

20.8.1

OS

macOS 14.2.1

Language

Python

Language Version

3.11.6

Other information

No response

khushail commented 2 months ago

Hi @nathandines , thanks for reaching out and sharing the repro code. I am able to repro the issue and can see the policy being added despite importing role. Marking the issue as appropriate.

sidharth15 commented 1 month ago
  1. Isn't this a necessary abstraction an L2 Construct should provide? In what way does providing the necessary permissions on the role create a problem with managing the role resource (even if it's in a separate stack)?
  2. If we accept this as an unexpected behaviour, should this extend to other permissions added to the role by the function Construct? For e.g.,
    1. https://github.com/aws/aws-cdk/blob/5347369fa11f4f11ab3893b9ac4c8467c5d514c3/packages/aws-cdk-lib/aws-lambda/lib/function.ts#L909-L917
    2. https://github.com/aws/aws-cdk/blob/5347369fa11f4f11ab3893b9ac4c8467c5d514c3/packages/aws-cdk-lib/aws-lambda/lib/function.ts#L929
    3. https://github.com/aws/aws-cdk/blob/5347369fa11f4f11ab3893b9ac4c8467c5d514c3/packages/aws-cdk-lib/aws-lambda/lib/function.ts#L1363
nathandines commented 1 month ago
  1. It certainly should be expected behaviour if you haven't defined a role, and are allowing the construct to manage it for you. The scenario which I'm talking about is when you've defined a self-managed role elsewhere, and then the construct manipulates it as a side-effect—I don't believe the construct should mutate resources which aren't under its purview.

    The problem is that a developer has presumably written the role out-of-band from the construct because they want/need some additional control over the management of it. To manipulate that role without the developer's explicit involvement is undermining that requirement.

    It could be for a variety of reasons: compliance, separation of concerns, specific permissions, existing deployment patterns, etc. It's not the tooling's job to judge whether a use-case is valid, but if the role is being managed out-of-band from the construct, the construct mustn't violate that intention.

  2. That's a valid consideration. I haven't explored the other permissions, but I would suggest that whatever behaviour is considered to be correct by the maintainers should be consistent across all permissions being applied.

sidharth15 commented 1 month ago

Yes, your explanation makes sense to me. Looking at the pattern followed in the Lambda L2 Construct, and possibly other L2 constructs as well, it makes me think it was a conscious design choice. It is going to be a major change as it's likely that a lot of projects depend on these implicit permissions that the Construct is providing.

Personally, I rely on the permissions that Lambda adds automatically if I add resources like a source SQS queue.