Open sabarnac opened 4 years ago
I would consider this an edge-case to some extent, since I wouldn't think a lot of people would have such a use-case.
There is a hacky workaround for those who want it:
const UNSAFE__removeBadDefaultPolicyDocumentStatement = (stateMachine: StateMachine) => {
const stateMachineDefaultPolicyDocument = (stateMachine.role as any).defaultPolicy.document;
stateMachineDefaultPolicyDocument.statements = stateMachineDefaultPolicyDocument.statements.filter(
(statement: any) => statement.action.indexOf("states:StartExecution") === -1,
);
};
You can pass in your own role when creating the state machine with a proper policy document statement, and use this function to remove the bad default one.
I've committed the bug workaround code to the reproduction repo under branch bug-workaround
@sabarnac - Thanks for filing the issue.
What would you expect the Resource
property of the IAM policy to contain in such a case?
Looking at your sample code once again, I think the issue is that the API, specifically StateMachine.fromStateMachineArn()
, was never designed for dynamic state machine ARNs. All of CDK's import APIs (fromXXXArn()
) are designed for ARNs being static at the time of CloudFormation deployment.
For your use case, I would suggest to use an InvokeFunction
step function task to invoke a lambda function that calls StepFunction's StartExecution API using the AWS SDK. With this you will have control over what permissions need to be attached to the Lambda function.
Alternatively, you can extend your own IStepFunctionsTask
similar to the model of StartExecution
and implement your logic there.
I would like to turn this into a feature request for the CDK to allow the StartExecution task to support dynamic ARNs.
The fact that when providing a dynamic value to StateMachine.fromStateMachineArn()
resulted in the ExecutionName
field being appended with .$
made me assume that dynamic ARN support was intended (to a certain extent). If this was unintended, then it makes a lot more sense why the policy statement bug exists.
The issue with having a Lambda function manage calling nested workflows is that the nested workflow needs to be executed synchronously, and can take an arbitrary amount of time to execute. Having a lambda start a new workflow, check its status, and then return a response to continue the workflow execution feels unnecessary to implement since Step Functions can do this for you instead, and can be provided the same permissions as I would have to provide to the Lambda anyways.
In my use-case, the nested workflows cannot notify the parent workflow about their progress and completion and cannot be modified in any way either. They are written to be a completely independent, and should have no idea about any orchestration that goes around it (if any workflows have started it, whether it should notify/start any workflows on completion, etc).
We've already figured out the permissions we would need to provide to let Step Functions call the nested workflows, so we can define the policies separately and attach it to the Step Functions state machine.
The suggestion for extending IStepFunctionsTask
and adding functionality there is a good point, and will look into it. I'll add it to my reproduction repo if I get around to implementing it. Are there any guides or references I could use to help with this?
I think dynamic ARN support in CDK would be a pretty useful feature (especially since Step Functions supports it, or at least supports the use-case I have shown. Tested it through the AWS console), so I agree with turning this into a feature request.
As for what should be set for Resource
, I think in the cases of dynamic ARNs, the states:StartExecution
policy statement should not be created at all. Since the ARN is dynamic, the only possible value you can set for Resource
is *
, or at the very least some resource value/policy statement that restricts the ability to start executions to within the same AWS account + region as the workflow itself. I wouldn't really consider this the safest option.
The safest option is to expect the user to provide the correct policy statement instead. The user who is creating such a workflow would have the knowledge of the state machines that this workflow would end up executing, and so should provide the policy statement that provides the ability to call them.
However, if you need to have some sort of default policy statement kept in case the user doesn't provide their own policy statement, then having the resource to something like arn:aws:states:${AWS::Region}:${AWS::Account}:stateMachine:*
so that permissions are limited to within the same account + region should be ok? However, it should be removed if the user does provide their own states:StartExecution
policy statement. I would still prefer that this not be done to be honest.
The suggestion for extending IStepFunctionsTask and adding functionality there is a good point, and will look into it. I'll add it to my reproduction repo if I get around to implementing it. Are there any guides or references I could use to help with this?
The aws-stepfunctions-tasks
module should be a good one.
Thanks for your thoughts on the policy and your use case. I'm switching this issue from being a bug to a feature request!
I also experience this, but the workaround doesn't appear to work for Python, namely nothing involving default policy attribute on the state machine's role.
Furthermore, when trying to circumvent this by providing the role directly, the documentation for the role parameter says default behavior is to create a role. However, even when providing the role directly, it still creates the default policy, with the wrong resource, and attaches it to the role provided:
MyStateMachineRoleDefaultPolicy1A388074:
Type: AWS::IAM::Policy
Properties:
PolicyDocument:
Statement:
- Action: states:StartExecution
Effect: Allow
Resource: $.stateMachineArn # <- Fail.
Version: "2012-10-17"
PolicyName: MyStateMachineRoleDefaultPolicy1A388074
Any updates? Any workarounds about creating state machine which calls another one? How generate "StateMachineArn.$" ? Like that:
{
"Type":"Task",
"Resource":"arn:aws:states:::states:startExecution.sync",
"Parameters":{
"Input":{
"Comment": "Hello world!",
"AWS_STEP_FUNCTIONS_STARTED_BY_EXECUTION_ID.$": "$$.Execution.Id"
},
"StateMachineArn.$":"$.arn",
},
"End":true
}
I used the following code to remove the default policy statements that the Step Function CDK generates for the state machine when providing a dynamic ARN.
private readonly policyActionsToFilter = [
"states:StartExecution",
"states:DescribeExecution",
"states:StopExecution",
]
private UNSAFE__hackStateMachineDefaultPolicyDocument = (stateMachine: StateMachine) => {
const stateMachineDefaultPolicyDocument = (stateMachine.role as any).defaultPolicy.document;
stateMachineDefaultPolicyDocument.statements = stateMachineDefaultPolicyDocument.statements.filter(
(statement: any) => this.policyActionsToFilter.every((action) => !statement.action.includes(action)),
);
};
After this, then you can add in your own policy statement for what other state machine ARNs the current one is allowed to call. Make sure to provide policy statements for all the actions that were removed.
@sabarnac I don't believe that works in Python world.
Changes are needed to the workaround from @sabarnac to work with CDK 2.32.0 (action is now _action, and is of type OrderedSet so we need to access the internal array with _action.direct()
to do the .includes()
check).
private readonly policyActionsToFilter = [
"states:StartExecution",
"states:DescribeExecution",
"states:StopExecution",
]
private UNSAFE__hackStateMachineDefaultPolicyDocument = (stateMachine: StateMachine) => {
const stateMachineDefaultPolicyDocument = (stateMachine.role as any).defaultPolicy.document;
stateMachineDefaultPolicyDocument.statements = stateMachineDefaultPolicyDocument.statements.filter(
(statement: any) => this.policyActionsToFilter.every((action) => !statement._action.direct().includes(action)),
);
};
Description
Note: Bug occurs with the experimental library
@aws-cdk/aws-stepfunctions
When creating a state machine with a nested workflow task, whose StateMachineArn is a dynamic value set using a JSONPath query, the stepfunctions CDK library recognizes the nested workflow task ARN as dynamic and suffixes
StateMachineArn
with.$
correctly.However, the library doesn't realize that the ARN provided is a dynamic value when generating default policies for the state machine. As a result, it generates a policy statement for the state machine with the action "states:StartExecution" so that the state machine can execute nested workflows (which is a good thing), but sets the resource field to the dynamic value (which is a bad thing).
Reproduction Steps
Reproduction Repo: https://github.com/sabarnac/cdk-stepfunctions-bug-repo
Steps:
npm run cdk synth
/cdk synth
./cdk.out/CdkStepfunctionsBugRepoStack.template.json
SampleStateMachineRoleDefaultPolicy
Stack Code:
Error Log
None.
Environment
@aws-cdk/aws-stepfunctions-tasks
: 1.22.0@aws-cdk/aws-stepfunctions
: 1.22.0@aws-cdk/core
: 1.22.0Other
No other details
This is :bug: Bug Report