aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0

stepfunctions: using `itemProcessor` with `mode = DISTRIBUTED` doesn't work out of the box due to permission error #28820

Open akerra6993 opened 7 months ago

akerra6993 commented 7 months ago

Describe the bug

Running a state machine that contains a Map state in distributed processing mode (with the standard execution type for the child executions) fails with an IAM permissions error, because the parent state machine's role doesn't have permission to start executions of the state machine itself. Trying to grant the permission via stateMachine.grantStartExecution(stateMachine) causes a circular dependency.

Expected Behavior

When distributed processing mode is used, the necessary permissions should be generated by default.

Current Behavior

Start execution permission for the child executions is not granted to the parent state machine.

Reproduction Steps

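import { Map, JsonPath, ProcessorMode, ProcessorType } from 'aws-cdk-lib/aws-stepfunctions';

const mapState = new Map(this, 'Map State', {
  itemsPath: JsonPath.stringAt('$...'),
  maxConcurrency: 100,
  parameters: {
    ...
  },
});
// First state of the per-item workflow is elided; child executions run as STANDARD workflows.
mapState.itemProcessor(..., {
  executionType: ProcessorType.STANDARD,
  mode: ProcessorMode.DISTRIBUTED,
});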

Possible Solution

Automatically add the necessary IAM policy to the parent state machine's default role.

Additional Information/Context

No response

CDK CLI Version

2.122.0 (build 7e77e02)

Framework Version

No response

Node.js Version

v18.16.1

OS

MacOS Sonoma 14.0 (M2 Pro)

Language

TypeScript

Language Version

No response

Other information

Technically, I am using the CDK with vanilla JavaScript, but that's not an option in the Language dropdown.

pahud commented 7 months ago

Thank you. Can you share the full error messages?

akerra6993 commented 7 months ago

Error contacting AWS Service. | Message from Service: User: arn:aws:sts::{my account id}:assumed-role/{the state machine default role} is not authorized to perform: states:StartExecution on resource: arn:aws:states:us-west-2:{my account id}:stateMachine:{state machine id} because no identity-based policy allows the states:StartExecution action (Service: Sfn, Status Code: 400, Request ID: 5891e970-2bf1-4e15-9b06-f6f631b010b5)

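To confirm that a failure is this self-invocation case rather than some other missing permission, the denied action and resource can be pulled out of the message and compared with the parent state machine's own ARN. This is a minimal sketch, assuming the service keeps the error wording shown above; `parseDeniedCall` and `isSelfStartDenied` are hypothetical helpers, not part of any AWS SDK.

```typescript
// Hypothetical helpers, not an AWS SDK API. They assume the error text keeps the
// "assumed-role/<role>/<session> is not authorized to perform: <action> on
// resource: <arn>" wording shown in this thread.
interface DeniedCall {
  roleName: string;
  action: string;
  resource: string;
}

function parseDeniedCall(message: string): DeniedCall | undefined {
  // The role name is the path segment right after "assumed-role/".
  const role = /assumed-role\/([^\/\s]+)/.exec(message);
  // The denied action and resource follow two fixed phrases.
  const denied = /not authorized to perform: (\S+) on resource: (\S+)/.exec(message);
  if (role === null || denied === null) {
    return undefined;
  }
  return { roleName: role[1], action: denied[1], resource: denied[2] };
}

// True when StartExecution was denied on exactly the given state machine ARN.
// Passing the parent machine's own ARN checks for the self-invocation case.
function isSelfStartDenied(message: string, stateMachineArn: string): boolean {
  const call = parseDeniedCall(message);
  return (
    call !== undefined &&
    call.action === "states:StartExecution" &&
    call.resource === stateMachineArn
  );
}
```

With a message of this shape and the parent machine's ARN, `isSelfStartDenied` returns `true` only when the denied resource is that exact ARN; any other action or resource returns `false`.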
rogerchi commented 7 months ago

This should work in the meantime:

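    import { Policy, PolicyDocument, PolicyStatement } from 'aws-cdk-lib/aws-iam';

    // A standalone Policy, rather than a statement added to the role's default
    // policy, avoids the circular dependency with the state machine.
    const policy = new Policy(this, 'sfn-map-policy', {
      document: new PolicyDocument({
        statements: [new PolicyStatement({ resources: [machine.stateMachineArn], actions: ['states:StartExecution'] })],
      }),
    })

    policy.attachToRole(machine.role)

abdelnn commented 6 months ago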

The new Distributed Map construct should also work - #28821

anentropic commented 4 months ago

I have this issue, and I am using a DistributedMap state.

I attempted this:

        self.state_machine.add_to_role_policy(
            iam.PolicyStatement(
                actions=["states:StartExecution"],
                resources=[self.state_machine.state_machine_arn],
            ),
        )

But I get the following error: FAILED, Circular dependency between resources: [StateMachineB23A416F, StateMachineRoleDefaultPolicyD3EF01D8]
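Why does `add_to_role_policy` fail while @rogerchi's standalone `Policy` works? One way to see it is as a dependency graph. The sketch below is a toy TypeScript model, not CDK code: the node names and edges are assumptions inferred from the two resources named in the error (the state machine waiting on its role's default policy, and that policy referencing the state machine's ARN), and a depth-first search reports whether the graph contains a cycle.

```typescript
// Toy model of the CloudFormation resources involved; not CDK code.
// An entry "A: [B]" means A must be created after B (a DependsOn or a Ref).
// The edges are assumptions inferred from the resource names in the error.
type Graph = { [resource: string]: string[] };

// Depth-first search: reaching a node that is still "visiting" means a cycle.
function hasCycle(graph: Graph): boolean {
  const status: { [resource: string]: "visiting" | "done" } = {};
  const visit = (node: string): boolean => {
    if (status[node] === "visiting") return true;
    if (status[node] === "done") return false;
    status[node] = "visiting";
    for (const dep of graph[node] || []) {
      if (visit(dep)) return true;
    }
    status[node] = "done";
    return false;
  };
  return Object.keys(graph).some((node) => visit(node));
}

// add_to_role_policy: the ARN lands in the role's default policy, yet the
// state machine itself waits for that default policy to exist.
const withDefaultPolicy: Graph = {
  Role: [],
  RoleDefaultPolicy: ["Role", "StateMachine"],
  StateMachine: ["Role", "RoleDefaultPolicy"],
};

// Standalone Policy: it depends on the role and the state machine, but
// nothing the state machine waits for depends on it.
const withStandalonePolicy: Graph = {
  Role: [],
  RoleDefaultPolicy: ["Role"],
  StateMachine: ["Role", "RoleDefaultPolicy"],
  SfnMapPolicy: ["Role", "StateMachine"],
};

console.log(hasCycle(withDefaultPolicy)); // true
console.log(hasCycle(withStandalonePolicy)); // false
```

The only difference between the two graphs is which policy resource carries the reference to the state machine ARN, which matches the observation in this thread that the separate `Policy` deploys while `add_to_role_policy` does not.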

anentropic commented 4 months ago

...but the form given by @rogerchi does work:

        # Module-level imports assumed: from aws_cdk import Aws, aws_iam as iam
        policy = iam.Policy(
            self,
            "sfn-map-policy",
            document=iam.PolicyDocument(
                statements=[
                    # Let the parent start child executions of this same state machine.
                    iam.PolicyStatement(
                        resources=[self.state_machine.state_machine_arn],
                        actions=["states:StartExecution"],
                    ),
                    # Allow redriving failed Distributed Map runs.
                    iam.PolicyStatement(
                        resources=[
                            f"arn:aws:states:*:{Aws.ACCOUNT_ID}:execution:{self.state_machine.state_machine_name}/*"
                        ],
                        actions=["states:RedriveExecution"],
                    ),
                ],
            ),
        )
        policy.attach_to_role(self.state_machine.role)

I had to add another missing permission to allow redriving a failed Distributed Map run. There may be other missing permissions that I haven't run into yet.

Anyway, the point is that the DistributedMap state does not set up the permissions it ought to.

bilalq commented 4 months ago

Relevant docs: https://docs.aws.amazon.com/step-functions/latest/dg/iam-policies-eg-dist-map.html

Seems like at a minimum, you want:

Also, if you have a resultWriter S3 bucket, you'll need all the various permissions mentioned in the doc above for the bucket.
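As a hedged sketch of what the state-machine part of that policy might look like, the helper below assembles the statements mentioned in this thread into a plain IAM policy document. `distributedMapParentPolicy` is a hypothetical name. The ARN patterns are copied from the comments here (the `StateGraph.bind` excerpt quoted further down and @anentropic's redrive workaround) rather than verified against the documentation, and the S3 permissions needed for a ResultWriter bucket are deliberately left out.

```typescript
// Hypothetical helper, not a CDK or AWS SDK API. ARN patterns come from this
// thread and should be checked against the Step Functions IAM documentation.
interface Statement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

interface PolicyDocumentJson {
  Version: "2012-10-17";
  Statement: Statement[];
}

function distributedMapParentPolicy(
  partition: string,
  region: string,
  account: string,
  stateMachineName: string,
): PolicyDocumentJson {
  const stateMachineArn =
    `arn:${partition}:states:${region}:${account}:stateMachine:${stateMachineName}`;
  return {
    Version: "2012-10-17",
    Statement: [
      // Start child workflow executions of this same state machine.
      {
        Effect: "Allow",
        Action: ["states:StartExecution"],
        Resource: [stateMachineArn],
      },
      // Describe/stop child executions, using the pattern from StateGraph.bind.
      {
        Effect: "Allow",
        Action: ["states:DescribeExecution", "states:StopExecution"],
        Resource: [`${stateMachineArn}:*`],
      },
      // Redrive failed Distributed Map runs, using @anentropic's pattern
      // (scoped to a single region here instead of "*").
      {
        Effect: "Allow",
        Action: ["states:RedriveExecution"],
        Resource: [`arn:${partition}:states:${region}:${account}:execution:${stateMachineName}/*`],
      },
    ],
  };
}

console.log(JSON.stringify(distributedMapParentPolicy("aws", "us-east-2", "123456789012", "Example"), null, 2));
```

Generating the document from one function keeps the three ARN patterns in a single place, so correcting one after checking the linked documentation is a one-line change.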

bilalq commented 4 months ago

I see that the PR that added the DistributedMap construct does seem to set the permissions (other than RedriveExecution) in the bind method of the state graph, in packages/aws-cdk-lib/aws-stepfunctions/lib/state-graph.ts:

  /**
   * Binds this StateGraph to the StateMachine it defines and updates state machine permissions
   */
  public bind(stateMachine: StateMachine) {
    for (const state of this.allStates) {
      if (DistributedMap.isDistributedMap(state)) {
        stateMachine.role.attachInlinePolicy(new iam.Policy(stateMachine, 'DistributedMapPolicy', {
          document: new iam.PolicyDocument({
            statements: [
              new iam.PolicyStatement({
                actions: ['states:StartExecution'],
                resources: [stateMachine.stateMachineArn],
              }),
              new iam.PolicyStatement({
                actions: ['states:DescribeExecution', 'states:StopExecution'],
                resources: [`${stateMachine.stateMachineArn}:*`],
              }),
            ],
          }),
        }));

        break;
      }
    }
  }
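The loop above only matches states built with the `DistributedMap` class. A plain `Map` configured through `itemProcessor(..., { mode: ProcessorMode.DISTRIBUTED })`, as in the original reproduction steps, is presumably not matched by `DistributedMap.isDistributedMap`, so it would not receive this policy; the thread does not confirm whether that explains the error below. As a sketch of an alternative check, the function here inspects the rendered Amazon States Language definition instead of construct classes, assuming the standard `ItemProcessor.ProcessorConfig.Mode` field. `AslState` and `usesDistributedMode` are hypothetical names, and the type models only the fields the check reads.

```typescript
// Hypothetical, heavily trimmed model of an ASL state: only the fields this
// check reads are declared.
interface AslState {
  Type: string;
  ItemProcessor?: {
    ProcessorConfig?: {
      Mode?: "INLINE" | "DISTRIBUTED";
      ExecutionType?: "STANDARD" | "EXPRESS";
    };
    States: { [name: string]: AslState };
  };
  Branches?: { States: { [name: string]: AslState } }[];
}

// True if any Map state, at any nesting depth, runs in DISTRIBUTED mode.
function usesDistributedMode(states: { [name: string]: AslState }): boolean {
  for (const name of Object.keys(states)) {
    const state = states[name];
    const processor = state.ItemProcessor;
    if (state.Type === "Map" && processor && processor.ProcessorConfig &&
        processor.ProcessorConfig.Mode === "DISTRIBUTED") {
      return true;
    }
    // Recurse into a Map's item processor.
    if (processor && usesDistributedMode(processor.States)) {
      return true;
    }
    // Recurse into each branch of a Parallel state.
    for (const branch of state.Branches || []) {
      if (usesDistributedMode(branch.States)) {
        return true;
      }
    }
  }
  return false;
}

// Trimmed, hypothetical definition shaped like the reproduction steps;
// "ProcessItem" is an invented placeholder state.
const definition: { [name: string]: AslState } = {
  "Map State": {
    Type: "Map",
    ItemProcessor: {
      ProcessorConfig: { Mode: "DISTRIBUTED", ExecutionType: "STANDARD" },
      States: { ProcessItem: { Type: "Pass" } },
    },
  },
};

console.log(usesDistributedMode(definition)); // true
```

Checking the rendered definition rather than the construct class means a state is treated the same way regardless of which CDK API produced it; the trade-off is that the check has to walk nested processors and branches itself.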

But I'm still hitting errors like the following at runtime:

Error contacting AWS Service. | Message from Service: User: arn:aws:sts::123456789012:assumed-role/ExampleStateMachineRole-w9L0WPmFgXQU/KFFycMGpPUoVXQJNEKPZfzjTqKAbOZlA is not authorized to perform: states:StartExecution on resource: arn:aws:states:us-east-2:123456789012:stateMachine:Example because no identity-based policy allows the states:StartExecution action (Service: Sfn, Status Code: 400, Request ID: efa7d0da-0d5f-4359-bd3c-844ede092da5)

I have no resultWriter or itemReader in my state here. Could that affect things?

cc @abdelnn