aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.35k stars 3.77k forks

StepFunctions: Integration pattern RUN_JOB has no effect on GlueStartJobRun #30735

Closed lauragalera closed 2 days ago

lauragalera commented 3 days ago

Describe the bug

The console has a feature that runs two jobs sequentially when "Wait for task to complete - optional" is enabled on the GlueStartJobRun task. I found in a closed issue that the same behavior can be achieved by setting the construct property integrationPattern to IntegrationPattern.RUN_JOB. However, CDK seems to ignore the property: after deploying, the task's Resource appears without the .sync suffix.

Expected Behavior

The deployment should result in a state machine with the following code (notice the .sync):

{
  "StartAt": "StartJobIngestionMixpanel",
  "States": {
    "StartJobIngestionMixpanel": {
      "Next": "StartJobRedshiftMixpanel",
      "Retry": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "IntervalSeconds": 300,
          "MaxAttempts": 2,
          "BackoffRate": 1
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "JobFailed"
        }
      ],
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "JOB-00348-DEV-mixpanel-events-to-s3"
      }
    },
    "StartJobRedshiftMixpanel": {
      "End": true,
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun",
      "Parameters": {
        "JobName": "JOB-00351-DEV-mixpanel-events-to-redshift"
      }
    },
    "JobFailed": {
      "Type": "Fail",
      "Error": "Glue job failed after retries"
    }
  }
}

Current Behavior

The resulting code:

{
  "StartAt": "StartJobIngestionMixpanel",
  "States": {
    "StartJobIngestionMixpanel": {
      "Next": "StartJobRedshiftMixpanel",
      "Retry": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "IntervalSeconds": 300,
          "MaxAttempts": 2,
          "BackoffRate": 1
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "JobFailed"
        }
      ],
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun",
      "Parameters": {
        "JobName": "JOB-00348-DEV-mixpanel-events-to-s3"
      }
    },
    "StartJobRedshiftMixpanel": {
      "End": true,
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun",
      "Parameters": {
        "JobName": "JOB-00351-DEV-mixpanel-events-to-redshift"
      }
    },
    "JobFailed": {
      "Type": "Fail",
      "Error": "Glue job failed after retries"
    }
  }
}

Reproduction Steps

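Code to reproduce the issue:

        import { Duration } from 'aws-cdk-lib';
        import { DefinitionBody, Fail, IntegrationPattern, StateMachine } from 'aws-cdk-lib/aws-stepfunctions';
        import { GlueStartJobRun } from 'aws-cdk-lib/aws-stepfunctions-tasks';

        // Inside the stack constructor; roleStepFunction is an IAM role defined elsewhere.

        // First job: RUN_JOB should make Step Functions wait for the Glue job to finish.
        const jobIngestionMixpanel = new GlueStartJobRun(this, 'StartJobIngestionMixpanel', {
            glueJobName: 'JOB-00348-DEV-mixpanel-events-to-s3',
            integrationPattern: IntegrationPattern.RUN_JOB
        })

        jobIngestionMixpanel.addRetry({
            errors: ['States.TaskFailed'],
            backoffRate: 1,
            maxAttempts: 2,
            interval: Duration.minutes(5)
        })

        jobIngestionMixpanel.addCatch(new Fail(this, 'JobFailed', {
            error: 'Glue job failed after retries'
        }))

        // Second job: default integration pattern (does not wait).
        const jobRedshiftMixpanel = new GlueStartJobRun(this, 'StartJobRedshiftMixpanel', {
            glueJobName: 'JOB-00351-DEV-mixpanel-events-to-redshift'
        })

        const definition = jobIngestionMixpanel.next(jobRedshiftMixpanel)

        new StateMachine(this, 'StateMachineJobs', {
            definitionBody: DefinitionBody.fromChainable(definition),
            role: roleStepFunction
        })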

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.126.0

Framework Version

No response

Node.js Version

v20.11.0

OS

MacOS 14.2.1

Language

TypeScript

Language Version

No response

Other information

No response

ashishdhingra commented 3 days ago

The "Run a Job (.sync)" documentation specifies that for integrated services such as AWS Batch and Amazon ECS, Step Functions can wait for a request to complete before progressing to the next state. To have Step Functions wait, specify the "Resource" field in your task state definition with the .sync suffix appended after the resource URI.
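To make the suffix rule concrete, here is a rough sketch of how an integration pattern maps to the Resource string seen in this issue. The helper taskResourceArn and the Pattern type are hypothetical names for illustration only and are not part of aws-cdk-lib; the ARN shape simply mirrors the definitions posted in this thread.

```typescript
// Hypothetical helper (not a CDK API): shows which suffix each
// Step Functions integration pattern adds to a task's Resource ARN.
type Pattern = 'REQUEST_RESPONSE' | 'RUN_JOB' | 'WAIT_FOR_TASK_TOKEN';

const RESOURCE_SUFFIX: Record<Pattern, string> = {
  REQUEST_RESPONSE: '',                      // fire and continue
  RUN_JOB: '.sync',                          // wait for the job to complete
  WAIT_FOR_TASK_TOKEN: '.waitForTaskToken',  // wait for a callback token
};

function taskResourceArn(service: string, api: string, pattern: Pattern): string {
  // ARN shape assumed from the state machine definitions in this issue.
  return `arn:aws:states:::${service}:${api}${RESOURCE_SUFFIX[pattern]}`;
}

console.log(taskResourceArn('glue', 'startJobRun', 'RUN_JOB'));
// prints "arn:aws:states:::glue:startJobRun.sync"
```

With the default pattern (REQUEST_RESPONSE) the same call yields the plain "arn:aws:states:::glue:startJobRun", which is the value reported under Current Behavior.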

@lauragalera Good morning. I'm unable to reproduce the issue on my end using CDK version 2.147.2. Deploying the CDK stack below:

import * as cdk from 'aws-cdk-lib';
import { DefinitionBody, Fail, IntegrationPattern, StateMachine } from 'aws-cdk-lib/aws-stepfunctions';
import { GlueStartJobRun } from 'aws-cdk-lib/aws-stepfunctions-tasks';
import { Construct } from 'constructs';

export class Issue30735Stack extends cdk.Stack {
  constructor(scope: Construct, id: string, props?: cdk.StackProps) {
    super(scope, id, props);

    const jobIngestionMixpanel = new GlueStartJobRun(this, 'StartJobIngestionMixpanel', {
      glueJobName: 'JOB-00348-DEV-mixpanel-events-to-s3',
      integrationPattern: IntegrationPattern.RUN_JOB
    })

    jobIngestionMixpanel.addRetry({
      errors: ['States.TaskFailed'],
      backoffRate: 1,
      maxAttempts: 2,
      interval: cdk.Duration.minutes(5)
    })

    jobIngestionMixpanel.addCatch(new Fail(this, 'JobFailed', {
      error: 'Glue job failed after retries'
    }))

    const jobRedshiftMixpanel = new GlueStartJobRun(this, 'StartJobRedshiftMixpanel', {
      glueJobName: 'JOB-00351-DEV-mixpanel-events-to-redshift'
    })

    const definition = jobIngestionMixpanel.next(jobRedshiftMixpanel)

    new StateMachine(this, 'StateMachineJobs', {
      definitionBody: DefinitionBody.fromChainable(definition)
    })
  }
}

generated a state machine with the definition below (notice that the first Task has the .sync suffix in its Resource):

{
  "StartAt": "StartJobIngestionMixpanel",
  "States": {
    "StartJobIngestionMixpanel": {
      "Next": "StartJobRedshiftMixpanel",
      "Retry": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "IntervalSeconds": 300,
          "MaxAttempts": 2,
          "BackoffRate": 1
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "JobFailed"
        }
      ],
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "JOB-00348-DEV-mixpanel-events-to-s3"
      }
    },
    "StartJobRedshiftMixpanel": {
      "End": true,
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun",
      "Parameters": {
        "JobName": "JOB-00351-DEV-mixpanel-events-to-redshift"
      }
    },
    "JobFailed": {
      "Type": "Fail",
      "Error": "Glue job failed after retries"
    }
  }
}
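If it helps when comparing versions, the definition can also be checked programmatically instead of by eye. Below is a minimal sketch, assuming the definition has already been obtained as a plain JSON object (for example, copied from the console). The function syncTaskStates and the trimmed sampleDefinition are hypothetical and only illustrate the check; they are not CDK APIs.

```typescript
// Minimal shapes for the parts of the definition this check needs.
interface AslState {
  Type: string;
  Resource?: string;
}

interface AslDefinition {
  StartAt: string;
  States: Record<string, AslState>;
}

// Hypothetical helper: names of Task states whose Resource ends in ".sync".
function syncTaskStates(definition: AslDefinition): string[] {
  return Object.keys(definition.States).filter((name) => {
    const state = definition.States[name];
    return state.Type === 'Task' && /\.sync$/.test(state.Resource ?? '');
  });
}

// Trimmed copy of the definition above (Retry, Catch and Parameters omitted).
const sampleDefinition: AslDefinition = {
  StartAt: 'StartJobIngestionMixpanel',
  States: {
    StartJobIngestionMixpanel: {
      Type: 'Task',
      Resource: 'arn:aws:states:::glue:startJobRun.sync',
    },
    StartJobRedshiftMixpanel: {
      Type: 'Task',
      Resource: 'arn:aws:states:::glue:startJobRun',
    },
    JobFailed: { Type: 'Fail' },
  },
};

console.log(syncTaskStates(sampleDefinition));
// prints [ 'StartJobIngestionMixpanel' ]
```

An empty result for a definition where RUN_JOB was requested would correspond to the behavior originally reported.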

Please try using the latest aws-cdk-lib package (and preferably the latest CDK CLI version) and confirm whether the issue is resolved.

Thanks, Ashish

lauragalera commented 2 days ago

Hello @ashishdhingra,

Indeed, changing to version 2.147.3 solved it.

Thank you
