aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.65k stars 3.91k forks

step function: Fail state cause_path generates error using States.Format #30063

Closed kimyx closed 5 months ago

kimyx commented 5 months ago

Describe the bug

The AWS documentation for the Fail state says that CausePath accepts intrinsic functions as well as reference paths. The CDK Fail construct works with reference paths but not with intrinsic functions, in particular States.Format(). It generates this error:

RuntimeError: Expected JSON path to start with '$', got: States.Format('LogStreamName: {}', $.Cause.Container.LogStreamName)
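
To make the failure mode concrete, here is a minimal pure-Python sketch of the kind of strict check that would produce this message. It is an assumption about the behaviour, not the CDK source: the name `render_json_path` only mirrors the `renderJsonPath` frame in the stack trace, and the body is inferred from the error text.

```python
from typing import Optional


def render_json_path(json_path: Optional[str]) -> Optional[str]:
    """Illustrative stand-in (not the CDK implementation) for a strict path check.

    Accepts only reference paths, i.e. strings that start with '$'.
    """
    if json_path is None:
        return None
    if not json_path.startswith("$"):
        # Same wording as the error reported in this issue.
        raise ValueError(f"Expected JSON path to start with '$', got: {json_path}")
    return json_path


# A reference path passes through unchanged.
render_json_path("$.Cause.Attempts[0].StatusReason")

# An intrinsic function call does not start with '$', so it is rejected.
try:
    render_json_path("States.Format('LogStreamName: {}', $.Cause.Container.LogStreamName)")
except ValueError as err:
    print(err)
```

Under this assumption, any value that begins with `States.` is rejected before Step Functions ever sees it, which matches the behaviour reported here.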

Expected Behavior

I expected to be able to use States.Format when building a Fail state.
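
For reference, this is roughly the Fail state definition I would expect CDK to emit, assuming it passed both values through unchanged. The sketch builds the Amazon States Language fragment as a plain Python dict; the state name "Fail" is taken from the repro below, and nothing here calls CDK.

```python
import json

# Expected Amazon States Language for the Fail state (assumed output, not
# something CDK currently produces for an intrinsic function).
fail_state = {
    "Type": "Fail",
    "ErrorPath": "$.Cause.Attempts[0].StatusReason",
    "CausePath": "States.Format('LogStreamName: {}', $.Cause.Container.LogStreamName)",
}

print(json.dumps({"Fail": fail_state}, indent=2))
```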

Current Behavior

$ cdk deploy  --exclusively JobPollerStack
...
jsii.errors.JavaScriptError:
  Error: Expected JSON path to start with '$', got: States.Format('LogStreamName: {}', $.Cause.Container.LogStreamName)
      at renderJsonPath (/var/folders/_b/2b2x6djs2q77mc8d8pt49r0m007nfk/T/jsii-kernel-3Ir8Er/node_modules/aws-cdk-lib/aws-stepfunctions/lib/states/state.js:1:9755)
      at Fail.toStateJson (/var/folders/_b/2b2x6djs2q77mc8d8pt49r0m007nfk/T/jsii-kernel-3Ir8Er/node_modules/aws-cdk-lib/aws-stepfunctions/lib/states/fail.js:1:1050)
      at StateGraph.toGraphJson (/var/folders/_b/2b2x6djs2q77mc8d8pt49r0m007nfk/T/jsii-kernel-3Ir8Er/node_modules/aws-cdk-lib/aws-stepfunctions/lib/state-graph.js:1:2159)
      at ChainDefinitionBody.bind (/var/folders/_b/2b2x6djs2q77mc8d8pt49r0m007nfk/T/jsii-kernel-3Ir8Er/node_modules/aws-cdk-lib/aws-stepfunctions/lib/state-machine.js:1:12008)
      at new StateMachine (/var/folders/_b/2b2x6djs2q77mc8d8pt49r0m007nfk/T/jsii-kernel-3Ir8Er/node_modules/aws-cdk-lib/aws-stepfunctions/lib/state-machine.js:1:6368)
      at Kernel._Kernel_create (/private/var/folders/_b/2b2x6djs2q77mc8d8pt49r0m007nfk/T/tmpv90l1_x_/lib/program.js:10108:25)
      at Kernel.create (/private/var/folders/_b/2b2x6djs2q77mc8d8pt49r0m007nfk/T/tmpv90l1_x_/lib/program.js:9779:93)
      at KernelHost.processRequest (/private/var/folders/_b/2b2x6djs2q77mc8d8pt49r0m007nfk/T/tmpv90l1_x_/lib/program.js:11696:36)
      at KernelHost.run (/private/var/folders/_b/2b2x6djs2q77mc8d8pt49r0m007nfk/T/tmpv90l1_x_/lib/program.js:11656:22)
      at Immediate._onImmediate (/private/var/folders/_b/2b2x6djs2q77mc8d8pt49r0m007nfk/T/tmpv90l1_x_/lib/program.js:11657:46)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/kiko1739/code/csds/src/csds_app.py", line 682, in <module>
    job_poller_stack = JobPollerStack(app, 'BadStack')
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kiko1739/code/csds/venv_311/lib/python3.11/site-packages/jsii/_runtime.py", line 118, in __call__
    inst = super(JSIIMeta, cast(JSIIMeta, cls)).__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kiko1739/code/csds/src/stacks/step_bad_stack.py", line 85, in __init__
    sm = _aws_stepfunctions.StateMachine(
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kiko1739/code/csds/venv_311/lib/python3.11/site-packages/jsii/_runtime.py", line 118, in __call__
    inst = super(JSIIMeta, cast(JSIIMeta, cls)).__call__(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kiko1739/code/csds/venv_311/lib/python3.11/site-packages/aws_cdk/aws_stepfunctions/__init__.py", line 10097, in __init__
    jsii.create(self.__class__, self, [scope, id, props])
  File "/Users/kiko1739/code/csds/venv_311/lib/python3.11/site-packages/jsii/_kernel/__init__.py", line 334, in create
    response = self.provider.create(
               ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kiko1739/code/csds/venv_311/lib/python3.11/site-packages/jsii/_kernel/providers/process.py", line 365, in create
    return self._process.send(request, CreateResponse)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/kiko1739/code/csds/venv_311/lib/python3.11/site-packages/jsii/_kernel/providers/process.py", line 342, in send
    raise RuntimeError(resp.error) from JavaScriptError(resp.stack)
RuntimeError: Expected JSON path to start with '$', got: States.Format('LogStreamName: {}', $.Cause.Container.LogStreamName)

Subprocess exited with error 1

Reproduction Steps

# adapted from https://github.com/aws-samples/aws-cdk-examples/blob/main/python/stepfunctions/stepfunctions/stepfunctions_stack.py

from aws_cdk import (
    aws_stepfunctions as _aws_stepfunctions,
    App, Duration, Stack
)

class JobPollerStack(Stack):
    def __init__(self, app: App, id: str, **kwargs) -> None:
        super().__init__(app, id, **kwargs)

        # Step functions Definition

        submit_job = _aws_stepfunctions.Wait(
            self, "Submit Job",
            time=_aws_stepfunctions.WaitTime.duration(
                Duration.seconds(30))
        )

        wait_job = _aws_stepfunctions.Wait(
            self, "Wait 30 Seconds",
            time=_aws_stepfunctions.WaitTime.duration(
                Duration.seconds(30))
        )

        status_job = _aws_stepfunctions.Wait(
            self, "Get Status",
            time=_aws_stepfunctions.WaitTime.duration(
                Duration.seconds(30))
        )

        error_path = "$.Cause.Attempts[0].StatusReason"
        cause_path = "States.Format('LogStreamName: {}', $.Cause.Container.LogStreamName)"
        fail_job = _aws_stepfunctions.Fail(
            self, "Fail",
            error_path=error_path,  # works
            cause_path=cause_path  # fails
        )

        succeed_job = _aws_stepfunctions.Succeed(
            self, "Succeeded",
            comment='AWS Batch Job succeeded'
        )

        # Create Chain

        chain = submit_job.next(wait_job) \
            .next(status_job) \
            .next(_aws_stepfunctions.Choice(self, 'Job Complete?')
                  .when(_aws_stepfunctions.Condition.string_equals('$.status', 'FAILED'), fail_job)
                  .when(_aws_stepfunctions.Condition.string_equals('$.status', 'SUCCEEDED'), succeed_job)
                  .otherwise(wait_job))

        # Create state machine
        sm = _aws_stepfunctions.StateMachine(
            self, "StateMachine",
            definition_body=_aws_stepfunctions.DefinitionBody.from_chainable(chain),
            timeout=Duration.minutes(5),
        )

# in app.py:
# job_poller_stack = JobPollerStack(app, 'BadStack')

error_path works as given; cause_path does not.

Possible Solution

I'm guessing that CDK simply doesn't implement this AWS feature yet, since CausePath and ErrorPath were only implemented around September 2023. If so, please consider this a vote for supporting it.

Additional Information/Context

No response

CDK CLI Version

CDK 2.140.0 (build 46168aa)

Framework Version

No response

Node.js Version

v20.12.1

OS

ProductName: MacOS ProductVersion: 14.3.1 BuildVersion: 23D60

Language

Python

Language Version

Python (3.11.9)

Other information

No response

kimyx commented 5 months ago

We can probably work around this issue by inserting a Pass state before the Fail state. The Pass state selects fields and formats error/cause strings to the output that becomes input to the Fail state, which can then use a simple selection to get what it needs. Will try it soon.
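
A rough sketch of the state machine fragment this workaround aims for, written as a plain Python dict of Amazon States Language. The state name `FormatErrorCause` and the field names `ErrorMsg` and `CauseMsg` are placeholders chosen for illustration.

```python
import json

# The Pass state evaluates the intrinsic function; the Fail state then only
# needs simple reference paths, which CDK already accepts.
states = {
    "FormatErrorCause": {
        "Type": "Pass",
        "Parameters": {
            "ErrorMsg.$": "$.Cause.Attempts[0].StatusReason",
            "CauseMsg.$": "States.Format('LogStreamName: {}', $.Cause.Container.LogStreamName)",
        },
        "Next": "Fail",
    },
    "Fail": {
        "Type": "Fail",
        "ErrorPath": "$.ErrorMsg",
        "CausePath": "$.CauseMsg",
    },
}

print(json.dumps(states, indent=2))
```

The Pass state's output replaces its input, so the Fail state sees only `ErrorMsg` and `CauseMsg`.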

kimyx commented 5 months ago

I should add that it doesn't work using the cause keyword either. With this change:

        fail_job = _aws_stepfunctions.Fail(
            self, "Fail",
            error_path=error_path,  # works
            cause=cause_path  # fails
        )

The cdk deploy command oddly fails like this:

No stacks match the name(s) JobPollerStack

In my full code, it seems to deploy, but it fails when actually running the step function.

kimyx commented 5 months ago

This now works in my real app:

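        # Assumed imports (not shown in the original snippet):
        #   from aws_cdk import aws_stepfunctions as sfn, aws_stepfunctions_tasks as tasks
        # sns_topic, map_state, and error_task_identifier are defined elsewhere in the app.
        error_parameters = {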
            "Cause.$": "States.StringToJson($.Cause)",
        }
        # this state converts escaped json into a json object suitable for selections in the next state
        error_state = sfn.Pass(self, f'ConvertErrorCause_{error_task_identifier}',
                               parameters=error_parameters)

        format_parameters = {
            "ErrorMsg.$": "States.Format('Step Function Fail: {}', $.Cause.Attempts[0].StatusReason)",
            "CauseMsg.$": "States.Format('LogStreamName: {}', $.Cause.Container.LogStreamName)",
        }
        # this state formats error and cause strings for the following states
        format_state = sfn.Pass(self, f'FormatErrorCause_{error_task_identifier}',
                                parameters=format_parameters)

        subject = sfn.JsonPath.string_at("$.ErrorMsg")
        message = sfn.TaskInput.from_text(sfn.JsonPath.string_at("$.CauseMsg"))

        # SNS publish error message
        failed_job_sns_topic = tasks.SnsPublish(self,
                                                f'FailedJobMessage_{error_task_identifier}',
                                                topic=sns_topic,
                                                subject=subject,
                                                message=message,
                                                result_path="$.result"
                                                )

        error_path = "$.ErrorMsg"
        cause_path = "$.CauseMsg"
        fail_state = sfn.Fail(self, f"FailStepFunction_{error_task_identifier}",
                              error_path=error_path,
                              cause_path=cause_path
                              )

        map_state.add_catch(error_state.next(format_state).next(failed_job_sns_topic).next(fail_state))

ashishdhingra commented 5 months ago

@kimyx Good afternoon. I wasn't sure whether this issue was specific to CDK with Python, so I tried reproducing it in both TypeScript and Python; both produce the same error.

TypeScript

## CDK stack

```TypeScript
import * as cdk from 'aws-cdk-lib';
import { Construct } from 'constructs';
import * as stepfunctions from 'aws-cdk-lib/aws-stepfunctions';

export class TypescriptStack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const submit_job = new stepfunctions.Wait(this, "Submit Job", {
      time: stepfunctions.WaitTime.duration(cdk.Duration.seconds(30))
    });

    const wait_job = new stepfunctions.Wait(this, "Wait 30 Seconds", {
      time: stepfunctions.WaitTime.duration(cdk.Duration.seconds(30))
    });

    const status_job = new stepfunctions.Wait(this, "Get Status", {
      time: stepfunctions.WaitTime.duration(cdk.Duration.seconds(30))
    });

    const error_path = "$.Cause.Attempts[0].StatusReason"
    const cause_path = "States.Format('LogStreamName: {}', $.Cause.Container.LogStreamName)"
    const fail_job = new stepfunctions.Fail(this, "Fail", {
      errorPath: error_path,
      causePath: cause_path
    });

    const succeed_job = new stepfunctions.Succeed(this, "Succeeded", {
      comment: "AWS Batch Job succeeded"
    });

    // Create Chain
    const chain = submit_job.next(wait_job)
      .next(status_job)
      .next(new stepfunctions.Choice(this, 'Job Complete?')
        .when(stepfunctions.Condition.stringEquals('$.status', 'FAILED'), fail_job)
        .when(stepfunctions.Condition.stringEquals('$.status', 'SUCCEEDED'), succeed_job)
        .otherwise(wait_job));

    // Create state machine
    const sm = new stepfunctions.StateMachine(this, "StateMachine", {
      definitionBody: stepfunctions.DefinitionBody.fromChainable(chain),
      timeout: cdk.Duration.minutes(5)
    });
  }
}
```

## `cdk synth` error

```
Error: Expected JSON path to start with '$', got: States.Format('LogStreamName: {}', $.Cause.Container.LogStreamName)
    at renderJsonPath (/Users/ashdhin/dev/repros/cdk/issue30063_stepfunction/typescript/node_modules/aws-cdk-lib/aws-stepfunctions/lib/states/state.js:1:9755)
    at Fail.toStateJson (/Users/ashdhin/dev/repros/cdk/issue30063_stepfunction/typescript/node_modules/aws-cdk-lib/aws-stepfunctions/lib/states/fail.js:1:1050)
    at StateGraph.toGraphJson (/Users/ashdhin/dev/repros/cdk/issue30063_stepfunction/typescript/node_modules/aws-cdk-lib/aws-stepfunctions/lib/state-graph.js:1:2159)
    at ChainDefinitionBody.bind (/Users/ashdhin/dev/repros/cdk/issue30063_stepfunction/typescript/node_modules/aws-cdk-lib/aws-stepfunctions/lib/state-machine.js:1:12008)
    at new StateMachine (/Users/ashdhin/dev/repros/cdk/issue30063_stepfunction/typescript/node_modules/aws-cdk-lib/aws-stepfunctions/lib/state-machine.js:1:6368)
    at new TypescriptStack (/Users/ashdhin/dev/repros/cdk/issue30063_stepfunction/typescript/lib/typescript-stack.ts:41:16)
    at Object.<anonymous> (/Users/ashdhin/dev/repros/cdk/issue30063_stepfunction/typescript/bin/typescript.ts:7:1)
    at Module._compile (node:internal/modules/cjs/loader:1376:14)
    at Module.m._compile (/Users/ashdhin/dev/repros/cdk/issue30063_stepfunction/typescript/node_modules/ts-node/src/index.ts:1618:23)
    at Module._extensions..js (node:internal/modules/cjs/loader:1435:10)
```

Python

## CDK stack

```Python
from aws_cdk import (
    Duration,
    Stack,
    aws_stepfunctions as stepfunctions,
)
from constructs import Construct

class PythonStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        submit_job = stepfunctions.Wait(
            self, "Submit Job",
            time=stepfunctions.WaitTime.duration(
                Duration.seconds(30))
        )

        wait_job = stepfunctions.Wait(
            self, "Wait 30 Seconds",
            time=stepfunctions.WaitTime.duration(
                Duration.seconds(30))
        )

        status_job = stepfunctions.Wait(
            self, "Get Status",
            time=stepfunctions.WaitTime.duration(
                Duration.seconds(30))
        )

        error_path = "$.Cause.Attempts[0].StatusReason"
        cause_path = "States.Format('LogStreamName: {}', $.Cause.Container.LogStreamName)"
        fail_job = stepfunctions.Fail(
            self, "Fail",
            error_path=error_path,  # works
            cause_path=cause_path  # fails
        )

        succeed_job = stepfunctions.Succeed(
            self, "Succeeded",
            comment='AWS Batch Job succeeded'
        )

        # Create Chain
        chain = submit_job.next(wait_job) \
            .next(status_job) \
            .next(stepfunctions.Choice(self, 'Job Complete?')
                  .when(stepfunctions.Condition.string_equals('$.status', 'FAILED'), fail_job)
                  .when(stepfunctions.Condition.string_equals('$.status', 'SUCCEEDED'), succeed_job)
                  .otherwise(wait_job))

        # Create state machine
        sm = stepfunctions.StateMachine(
            self, "StateMachine",
            definition_body=stepfunctions.DefinitionBody.from_chainable(chain),
            timeout=Duration.minutes(5),
        )
```

## `cdk synth` error

```
jsii.errors.JavaScriptError:
  Error: Expected JSON path to start with '$', got: States.Format('LogStreamName: {}', $.Cause.Container.LogStreamName)
      at renderJsonPath (/var/folders/r5/964t6ckn7jl87krdykn_3hrm0000gr/T/jsii-kernel-fWOjiH/node_modules/aws-cdk-lib/aws-stepfunctions/lib/states/state.js:1:9755)
      at Fail.toStateJson (/var/folders/r5/964t6ckn7jl87krdykn_3hrm0000gr/T/jsii-kernel-fWOjiH/node_modules/aws-cdk-lib/aws-stepfunctions/lib/states/fail.js:1:1050)
      at StateGraph.toGraphJson (/var/folders/r5/964t6ckn7jl87krdykn_3hrm0000gr/T/jsii-kernel-fWOjiH/node_modules/aws-cdk-lib/aws-stepfunctions/lib/state-graph.js:1:2159)
      at ChainDefinitionBody.bind (/var/folders/r5/964t6ckn7jl87krdykn_3hrm0000gr/T/jsii-kernel-fWOjiH/node_modules/aws-cdk-lib/aws-stepfunctions/lib/state-machine.js:1:12008)
      at new StateMachine (/var/folders/r5/964t6ckn7jl87krdykn_3hrm0000gr/T/jsii-kernel-fWOjiH/node_modules/aws-cdk-lib/aws-stepfunctions/lib/state-machine.js:1:6368)
      at Kernel._Kernel_create (/private/var/folders/r5/964t6ckn7jl87krdykn_3hrm0000gr/T/tmpadel18q6/lib/program.js:10119:25)
      at Kernel.create (/private/var/folders/r5/964t6ckn7jl87krdykn_3hrm0000gr/T/tmpadel18q6/lib/program.js:9790:93)
      at KernelHost.processRequest (/private/var/folders/r5/964t6ckn7jl87krdykn_3hrm0000gr/T/tmpadel18q6/lib/program.js:11707:36)
      at KernelHost.run (/private/var/folders/r5/964t6ckn7jl87krdykn_3hrm0000gr/T/tmpadel18q6/lib/program.js:11667:22)
```

@kimyx Please advise on how you came up with the expression $.Cause.Container.LogStreamName.

Thanks, Ashish

kimyx commented 5 months ago

Thanks for confirming, Ashish.

The LogStreamName expression comes from my real app, which uses a step function to start Batch jobs, each of which produces a log stream. I didn't know a good expression for the sample app, but it failed with the same error my real app was getting. Once the initial error is fixed, let me know if you want help finding an applicable expression to test.

ashishdhingra commented 5 months ago

As reported in the issue description, and per the Step Functions Fail state documentation, CausePath should accept an intrinsic function that returns a string. However, CDK raises the error mentioned above, perhaps here.
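
One possible direction, sketched in plain Python rather than in the CDK codebase: relax the validation so that a value is accepted if it is either a reference path or looks like an intrinsic function call. The function name, the regular expression, and the error wording are all assumptions made for illustration; this is not the actual fix.

```python
import re
from typing import Optional

# Assumption: intrinsic functions look like States.<Name>(...), for example
# States.Format(...) or States.StringToJson(...).
_INTRINSIC_CALL = re.compile(r"^States\.[A-Za-z0-9]+\(.*\)$", re.DOTALL)


def render_path_or_intrinsic(value: Optional[str]) -> Optional[str]:
    """Hypothetical relaxed check: allow reference paths and intrinsic calls."""
    if value is None:
        return None
    if value.startswith("$") or _INTRINSIC_CALL.match(value):
        return value
    raise ValueError(
        "Expected a JSON path starting with '$' or an intrinsic function call, "
        f"got: {value}"
    )


# Both forms from this issue are accepted by the relaxed check.
render_path_or_intrinsic("$.Cause.Attempts[0].StatusReason")
render_path_or_intrinsic(
    "States.Format('LogStreamName: {}', $.Cause.Container.LogStreamName)"
)
```

This only checks the outer shape of the call; validating the arguments is left to Step Functions at deploy time.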

sakurai-ryo commented 5 months ago

I am working on this issue.
