aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

`@aws-cdk/aws-pipes-sources-alpha`: adding `deadLetterTarget` results in a failed update #31664

Open garysassano opened 3 days ago

garysassano commented 3 days ago

Describe the bug

I successfully deployed the following Stack:

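import { Duration, RemovalPolicy, Stack, StackProps } from "aws-cdk-lib";
import { AttributeType, StreamViewType, TableV2 } from "aws-cdk-lib/aws-dynamodb";
import { Queue } from "aws-cdk-lib/aws-sqs";
import { Pipe } from "@aws-cdk/aws-pipes-alpha";
import { DynamoDBSource, DynamoDBStartingPosition } from "@aws-cdk/aws-pipes-sources-alpha";
import { SqsTarget } from "@aws-cdk/aws-pipes-targets-alpha";
import { Construct } from "constructs";

// Stack wrapper added so the snippet compiles; the class name is illustrative.
export class PipeStack extends Stack {
  constructor(scope: Construct, id: string, props?: StackProps) {
    super(scope, id, props);

    // DynamoDB table whose stream feeds the pipe
    const ddbTable = new TableV2(this, "DDBTable", {
      tableName: "ddb-table",
      partitionKey: {
        name: "Location",
        type: AttributeType.STRING,
      },
      dynamoStream: StreamViewType.NEW_AND_OLD_IMAGES,
      removalPolicy: RemovalPolicy.DESTROY,
    });

    // Pipe target queue
    const queue = new Queue(this, "Queue", {
      retentionPeriod: Duration.days(14),
    });

    // Dead-letter queue, not yet wired into the source
    const dlq = new Queue(this, "DLQ", {
      retentionPeriod: Duration.days(14),
    });

    const pipeSource = new DynamoDBSource(ddbTable, {
      startingPosition: DynamoDBStartingPosition.LATEST,
      batchSize: 1,
      maximumRetryAttempts: 0,
      // Uncommenting this line on the already-deployed pipe triggers the failed update
      // deadLetterTarget: dlq,
    });

    new Pipe(this, "Pipe", {
      source: pipeSource,
      target: new SqsTarget(queue),
    });
  }
}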

Regression Issue

Last Known Working CDK Version

No response

Expected Behavior

Expected adding the deadLetterTarget property to the existing pipe to deploy successfully.

Current Behavior

After uncommenting the deadLetterTarget property and deploying again, I get the following error:

9:12:13 AM | UPDATE_FAILED        | AWS::Pipes::Pipe            | PipeBFF4E827
Resource handler returned message: "Resource of type 'AWS::Pipes::Pipe' with identifier 'PipeBFF4E827-kvcogRY98bn1' did not stabilize. Status Reason is Input parameter is invalid from the request due to : Error occurred while sending message to SQS queue: arn:aws:sqs:eu-central-1:<redacted>:dlq" (RequestToken: 07f6ec0d-8398-4a70-f9db-4c2870efa33d, HandlerErrorCode: NotStabilized)

❌  cdk-aws-webhook-momento-dev failed: The stack named cdk-aws-webhook-momento-dev failed to deploy: UPDATE_ROLLBACK_COMPLETE: Resource handler returned message: "Resource of type 'AWS::Pipes::Pipe' with identifier 'PipeBFF4E827-kvcogRY98bn1' did not stabilize. Status Reason is Input parameter is invalid from the request due to : Error occurred while sending message to SQS queue: arn:aws:sqs:eu-central-1:<redacted>:dlq" (RequestToken: 07f6ec0d-8398-4a70-f9db-4c2870efa33d, HandlerErrorCode: NotStabilized)
👾 Task "deploy" failed when executing "cdk deploy" (cwd: /home/user/github/cdk-aws-webhook-momento)

Reproduction Steps

See above.

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.161.1

Framework Version

No response

Node.js Version

20.18.0

OS

Ubuntu 24.04

Language

TypeScript

Language Version

No response

Other information

No response

msambol commented 3 days ago

I'll take a look here.

msambol commented 3 days ago

I'd like input from the CDK core team on this. The DLQ was added after the pipe was created. Is there precedent in the CDK for updating the IAM role in situations like this?

I noticed when you add a DLQ to a pipe in the Console, it instructs the user to update the IAM role manually:

[Two EventBridge Pipes console screenshots, taken 2024-10-05, not reproduced]
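The manual step the console asks for amounts to giving the pipe's execution role sqs:SendMessage on the DLQ. As a minimal sketch, the TypeScript below builds a policy document with the same shape as the console-generated SqsPipeTargetTemplate-d67b12a4 policy shown later in this thread. The helper name `dlqSendPolicy` and the account ID in the ARN are hypothetical placeholders.

```typescript
// Minimal IAM policy document types (only the fields used here).
interface PolicyStatement {
  Effect: "Allow" | "Deny";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Hypothetical helper: builds an identity policy that lets the pipe role
// send messages to the given dead-letter queue, mirroring the console template.
function dlqSendPolicy(dlqArn: string): PolicyDocument {
  // Accept SQS ARNs in any partition (aws, aws-cn, aws-us-gov, ...).
  if (!/^arn:aws[a-zA-Z-]*:sqs:/.test(dlqArn)) {
    throw new Error(`Not an SQS queue ARN: ${dlqArn}`);
  }
  return {
    Version: "2012-10-17",
    Statement: [
      {
        Effect: "Allow",
        Action: ["sqs:SendMessage"],
        Resource: [dlqArn],
      },
    ],
  };
}

// Placeholder ARN (account ID and queue name are assumptions).
const policy = dlqSendPolicy("arn:aws:sqs:eu-central-1:123456789012:dlq");
console.log(JSON.stringify(policy, null, 2));
```

The resulting document would still have to be attached to the pipe's role by hand, which is exactly the step the console leaves to the user.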
garysassano commented 2 days ago

EventBridge Pipes IAM Role Behavior

Automatic IAM Role Creation

EventBridge Pipes includes an undocumented feature that automatically creates a default IAM role with the necessary permissions based on the services used in your Pipe. This happens only when no explicit IAM role is provided during the initial Pipe creation.

Inconsistent Role Management

There's a significant inconsistency in how EventBridge Pipes handles IAM role management:

  1. Initial Creation: Creates a default IAM role with appropriate permissions for all configured components
  2. Post-Creation: Once created, EventBridge Pipes stops managing the role entirely
    • No automatic updates to permissions when new components are added
    • No modifications when existing components are changed
    • No adjustments when DLQs are added

This behavior creates a confusing user experience.

Design Flaw

This appears to be a design flaw in the EventBridge Pipes service. If a service takes the initiative to create and configure an IAM role automatically, it should logically continue to manage that role throughout the resource's lifecycle. The current behavior of:

  1. Creating the role automatically
  2. Then completely abandoning its management

creates an inconsistent and potentially problematic user experience.

AWS Deployment Comparison: CDK vs Console

CDK Deployment

The CDK deployment creates an IAM role named <stack-name>-PipeRole7D4AFC73-pM14iRXveKlj with a single policy called PipeRoleDefaultPolicy56E6A74D. The policy contains the following statements:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": "dynamodb:ListStreams",
            "Resource": "arn:aws:dynamodb:eu-central-1:<redacted>:table/ddb-table/stream/2024-10-06T02:57:08.158",
            "Effect": "Allow"
        },
        {
            "Action": [
                "dynamodb:DescribeStream",
                "dynamodb:GetRecords",
                "dynamodb:GetShardIterator"
            ],
            "Resource": "arn:aws:dynamodb:eu-central-1:<redacted>:table/ddb-table/stream/2024-10-06T02:57:08.158",
            "Effect": "Allow"
        },
        {
            "Action": [
                "sqs:SendMessage",
                "sqs:GetQueueAttributes",
                "sqs:GetQueueUrl"
            ],
            "Resource": "arn:aws:sqs:eu-central-1:<redacted>:cdk-aws-webhook-momento-dev-Queue4A7E3555-GCeOvP7pyILc",
            "Effect": "Allow"
        }
    ]
}

Notable characteristic: Creates a separate block for the dynamodb:ListStreams action.

AWS Console Deployment

The console deployment creates an IAM role called Amazon_EventBridge_Pipe_Execution_3a86cff7 with two separate policies:

DynamoDbPipeSourceTemplate-50a4ff87

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "dynamodb:DescribeStream",
                "dynamodb:GetRecords",
                "dynamodb:GetShardIterator",
                "dynamodb:ListStreams"
            ],
            "Resource": [
                "arn:aws:dynamodb:eu-central-1:<redacted>:table/ddb-table/stream/2024-10-06T02:54:35.114"
            ]
        }
    ]
}

SqsPipeTargetTemplate-d67b12a4

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "sqs:SendMessage"
            ],
            "Resource": [
                "arn:aws:sqs:eu-central-1:<redacted>:cdk-aws-webhook-momento-dev-DLQ581697C4-WtWzTlrHUaDW"
            ]
        }
    ]
}

Detailed Comparison

| Aspect | CDK Deployment | AWS Console Deployment |
|---|---|---|
| IAM Role Name | <stack-name>-PipeRole7D4AFC73-pM14iRXveKlj | Amazon_EventBridge_Pipe_Execution_3a86cff7 |
| Policies | Single policy (PipeRoleDefaultPolicy56E6A74D) | Two separate policies (DynamoDbPipeSourceTemplate-50a4ff87 and SqsPipeTargetTemplate-d67b12a4) |
| DynamoDB Permissions | Separate block for dynamodb:ListStreams; the other actions (DescribeStream, GetRecords, GetShardIterator) are grouped together | All actions (DescribeStream, GetRecords, GetShardIterator, ListStreams) are in one block |
| SQS Permissions | SendMessage, GetQueueAttributes, and GetQueueUrl actions in the same policy | Only the SendMessage action |
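Although the two roles group their statements differently, an action is allowed whenever any Allow statement matches it (absent an explicit Deny), so splitting dynamodb:ListStreams into its own statement does not change what the role may do. A minimal sketch: the hypothetical helper `allowedPairs` flattens each policy into (action, resource) pairs so the two layouts can be compared directly. A single placeholder stream ARN is assumed for both deployments; the real ARNs differ only in their stream label.

```typescript
// Simplified IAM statement shape: Action and Resource may be a string or a list.
interface Statement {
  Effect: "Allow" | "Deny";
  Action: string | string[];
  Resource: string | string[];
}

const toList = (v: string | string[]): string[] => (Array.isArray(v) ? v : [v]);

// Flatten Allow statements for one service into sorted "action|resource" pairs,
// so differently grouped policies can be compared as plain arrays.
function allowedPairs(statements: Statement[], servicePrefix: string): string[] {
  const pairs = new Set<string>();
  for (const s of statements) {
    if (s.Effect !== "Allow") continue;
    for (const action of toList(s.Action)) {
      if (!action.startsWith(`${servicePrefix}:`)) continue;
      for (const resource of toList(s.Resource)) {
        pairs.add(`${action}|${resource}`);
      }
    }
  }
  return Array.from(pairs).sort();
}

// Placeholder ARN shared by both policies (assumption; real stream labels differ).
const STREAM = "arn:aws:dynamodb:eu-central-1:123456789012:table/ddb-table/stream/LABEL";

// CDK layout: ListStreams in its own statement.
const cdkStatements: Statement[] = [
  { Effect: "Allow", Action: "dynamodb:ListStreams", Resource: STREAM },
  {
    Effect: "Allow",
    Action: ["dynamodb:DescribeStream", "dynamodb:GetRecords", "dynamodb:GetShardIterator"],
    Resource: STREAM,
  },
];

// Console layout: all four actions in one statement.
const consoleStatements: Statement[] = [
  {
    Effect: "Allow",
    Action: [
      "dynamodb:DescribeStream",
      "dynamodb:GetRecords",
      "dynamodb:GetShardIterator",
      "dynamodb:ListStreams",
    ],
    Resource: [STREAM],
  },
];

const same =
  JSON.stringify(allowedPairs(cdkStatements, "dynamodb")) ===
  JSON.stringify(allowedPairs(consoleStatements, "dynamodb"));
console.log(`Equivalent DynamoDB permissions: ${same}`); // Equivalent DynamoDB permissions: true
```

Running the same comparison with the "sqs" prefix would show the CDK grant as a strict superset of the console's, since grantSendMessages also adds GetQueueAttributes and GetQueueUrl.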
garysassano commented 2 days ago

Recommended Solution: Custom IAM Role Management

To avoid the inconsistencies in EventBridge Pipes' automatic role management, the recommended approach is to explicitly create and manage your own IAM role. This gives you full control over permissions and ensures predictable behavior as your Pipe configuration evolves.

Example Implementation with CDK

Here's how to properly set up an EventBridge Pipe with explicit IAM role management:

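import { Duration, RemovalPolicy, Stack, StackProps } from "aws-cdk-lib";
import { AttributeType, StreamViewType, TableV2 } from "aws-cdk-lib/aws-dynamodb";
import { Role, ServicePrincipal } from "aws-cdk-lib/aws-iam";
import { Queue } from "aws-cdk-lib/aws-sqs";
import { Pipe } from "@aws-cdk/aws-pipes-alpha";
import { DynamoDBSource, DynamoDBStartingPosition } from "@aws-cdk/aws-pipes-sources-alpha";
import { SqsTarget } from "@aws-cdk/aws-pipes-targets-alpha";
import { Construct } from "constructs";

// Stack wrapper added so the snippet compiles; the class name is illustrative.
export class PipeStack extends Stack {
    constructor(scope: Construct, id: string, props?: StackProps) {
        super(scope, id, props);

        const ddbTable = new TableV2(this, "DDBTable", {
            tableName: "ddb-table",
            partitionKey: {
                name: "Location",
                type: AttributeType.STRING,
            },
            dynamoStream: StreamViewType.NEW_AND_OLD_IMAGES,
            removalPolicy: RemovalPolicy.DESTROY,
        });

        // Create the main queue and DLQ
        const queue = new Queue(this, "Queue", {
            retentionPeriod: Duration.days(14),
        });
        const dlq = new Queue(this, "DLQ", {
            retentionPeriod: Duration.days(14),
        });

        // Configure the DynamoDB source with DLQ
        const pipeSource = new DynamoDBSource(ddbTable, {
            startingPosition: DynamoDBStartingPosition.LATEST,
            batchSize: 1,
            maximumRetryAttempts: 0,
            deadLetterTarget: dlq,
        });

        // Create a custom role for the Pipe
        // (a fixed roleName must be unique within the AWS account)
        const pipeRole = new Role(this, "PipeRole", {
            roleName: "pipe-role",
            assumedBy: new ServicePrincipal("pipes.amazonaws.com"),
        });

        // Grant necessary permissions
        ddbTable.grantStreamRead(pipeRole);
        ddbTable.grantTableListStreams(pipeRole);
        queue.grantSendMessages(pipeRole);
        dlq.grantSendMessages(pipeRole);

        // Create the Pipe with the custom role
        new Pipe(this, "Pipe", {
            source: pipeSource,
            target: new SqsTarget(queue),
            role: pipeRole,
        });
    }
}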

Key Advantages of This Approach:

pahud commented 2 days ago

Just off the top of my head: when deadLetterTarget is specified, the pipe role should be able to send messages to the DLQ. Does this mean the DLQ should grant send/publish permissions to the pipe principal?

Given that deadLetterTarget can be either an IQueue or an ITopic, we should check which one it is and call the matching grant* method on the pipe principal.

We need more inputs on this though.
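The type check described above can be sketched as follows. This is a minimal sketch, not the construct library's actual implementation: `QueueLike`, `TopicLike` and `Grantee` are simplified stand-ins for `IQueue`, `ITopic` and `IGrantable` from aws-cdk-lib that keep only the members used here, and `grantDlqPush` is a hypothetical function name. It distinguishes the two target kinds structurally (by the presence of `queueArn`), which also covers queues and topics imported with `fromQueueArn`/`fromTopicArn`, where an `instanceof Queue` check would fail.

```typescript
// Simplified stand-ins for aws-cdk-lib's IGrantable, IQueue and ITopic
// (assumption: only the members this sketch needs).
interface Grantee {
  readonly id: string;
}

interface QueueLike {
  readonly queueArn: string;
  grantSendMessages(grantee: Grantee): void;
}

interface TopicLike {
  readonly topicArn: string;
  grantPublish(grantee: Grantee): void;
}

type DeadLetterTarget = QueueLike | TopicLike;

// Structural type guard: a queue exposes queueArn, a topic exposes topicArn.
function isQueue(target: DeadLetterTarget): target is QueueLike {
  return "queueArn" in target;
}

// Hypothetical helper: grant the pipe role permission to push to its DLQ.
function grantDlqPush(role: Grantee, target?: DeadLetterTarget): void {
  if (target === undefined) {
    return; // No dead-letter target configured, nothing to grant.
  }
  if (isQueue(target)) {
    target.grantSendMessages(role); // in aws-cdk-lib: sqs:SendMessage, GetQueueAttributes, GetQueueUrl
  } else {
    target.grantPublish(role); // in aws-cdk-lib: sns:Publish
  }
}

// Usage with recording fakes in place of real constructs.
const grants: string[] = [];
const role: Grantee = { id: "PipeRole" };
const dlq: QueueLike = {
  queueArn: "arn:aws:sqs:eu-central-1:123456789012:dlq",
  grantSendMessages: (g) => grants.push(`sqs:SendMessage -> ${g.id}`),
};
const topic: TopicLike = {
  topicArn: "arn:aws:sns:eu-central-1:123456789012:dlq-topic",
  grantPublish: (g) => grants.push(`sns:Publish -> ${g.id}`),
};

grantDlqPush(role, dlq);
grantDlqPush(role, topic);
grantDlqPush(role); // no target: records nothing
console.log(grants.join("; ")); // sqs:SendMessage -> PipeRole; sns:Publish -> PipeRole
```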

msambol commented 2 days ago

In reply to @pahud's comment above: this already works, and the DLQ send permission is granted, when the DLQ is included at the time the pipe is first created. In this case, the DLQ was added after the pipe had already been created.

msambol commented 2 days ago

https://github.com/aws/aws-cdk/blob/main/packages/%40aws-cdk/aws-pipes-alpha/lib/source.ts#L134-L143

garysassano commented 2 days ago

To be fair, the issue is also present in the AWS Console. For some reason, the EventBridge team chose to create the IAM role during the initial pipe setup and then stop managing it. However, I believe the CDK team can, and should, do better by managing the role throughout the pipe's entire lifecycle.

msambol commented 2 days ago

+1... I'm open to ideas. It seems we do most role setup in constructors; I'm not quite sure how, or whether, we can add permissions after the fact.

msambol commented 2 days ago

To add some data here: when the pipe is created with the DLQ from the start, sqs:SendMessage permissions for the DLQ are properly added to the pipe role:

  PipeRoleDefaultPolicy56E6A74D:
    Type: AWS::IAM::Policy
    Properties:
      PolicyDocument:
        Statement:
          - Action:
              - dynamodb:DescribeStream
              - dynamodb:GetRecords
              - dynamodb:GetShardIterator
              - dynamodb:ListStreams
            Effect: Allow
            Resource:
              Fn::GetAtt:
                - DDBTable2F2A2F95
                - StreamArn
          - Action:
              - sqs:GetQueueAttributes
              - sqs:GetQueueUrl
              - sqs:SendMessage
            Effect: Allow
            Resource:
              - Fn::GetAtt:
                  - DLQ581697C4
                  - Arn
              - Fn::GetAtt:
                  - Queue4A7E3555
                  - Arn
        Version: "2012-10-17"
      PolicyName: PipeRoleDefaultPolicy56E6A74D
      Roles:
        - Ref: PipeRole7D4AFC73

I tried the flow from @garysassano:

  1. Create the pipe without the DLQ
  2. Add the DLQ

When running cdk synth after adding the DLQ, permissions are added to the role, so the CDK is behaving as it should. Here is the output from cdk diff:

Resources
[~] AWS::IAM::Policy Pipe/Role/DefaultPolicy PipeRoleDefaultPolicy56E6A74D
 └─ [~] PolicyDocument
     └─ [~] .Statement:
         └─ @@ -21,11 +21,19 @@
            [ ]       "sqs:SendMessage"
            [ ]     ],
            [ ]     "Effect": "Allow",
            [-]     "Resource": {
            [-]       "Fn::GetAtt": [
            [-]         "Queue4A7E3555",
            [-]         "Arn"
            [-]       ]
            [-]     }
            [+]     "Resource": [
            [+]       {
            [+]         "Fn::GetAtt": [
            [+]           "DLQ581697C4",
            [+]           "Arn"
            [+]         ]
            [+]       },
            [+]       {
            [+]         "Fn::GetAtt": [
            [+]           "Queue4A7E3555",
            [+]           "Arn"
            [+]         ]
            [+]       }
            [+]     ]
            [ ]   }
            [ ] ]
[~] AWS::Pipes::Pipe Pipe Pipe7793F8A1 may be replaced
 └─ [~] SourceParameters (may cause replacement)
     └─ [~] .DynamoDBStreamParameters:
         └─ [+] Added: .DeadLetterConfig

I think this is a case of CloudFormation not waiting long enough for permissions to be added to the role:

Resource handler returned message: "Resource of type 'AWS::Pipes::Pipe' with identifier 'Pipe7793F8A1-<redacted>' did not stabilize. Status Reason is Input parameter is invalid from the request due to : Error occurred while sending message to SQS queue: arn:aws:sqs:us-east-2:<redacted>:CdkTestStack-DLQ581697C4-<redacted>" (RequestToken: <redacted>, HandlerErrorCode: NotStabilized)