Closed psolaimani closed 4 years ago
@psolaimani - With the Request Response integration pattern, the Resource should not have any suffix, and that appears to be what your template shows. Keep in mind that Request Response means that your state machine will call the API and then move on when it gets a response (not necessarily when the job completes running).
If you want your state machine to wait until the Glue job itself finishes running, then you would want to use RUN_JOB. In this case, the resource would have a suffix of .sync.
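To make the suffix rule concrete, here is a minimal sketch of how each integration pattern maps to the Task state's Resource. The type and the `taskResourceArn` helper are hypothetical names for illustration only, not the CDK source.

```typescript
// Hypothetical stand-in for the CDK IntegrationPattern values; not the library code.
type IntegrationPatternName = 'REQUEST_RESPONSE' | 'RUN_JOB' | 'WAIT_FOR_TASK_TOKEN';

// Suffix that Step Functions expects on the Task "Resource" for each pattern.
const RESOURCE_SUFFIX: Record<IntegrationPatternName, string> = {
  REQUEST_RESPONSE: '',
  RUN_JOB: '.sync',
  WAIT_FOR_TASK_TOKEN: '.waitForTaskToken',
};

// Illustrative helper (assumed name): builds the Resource ARN for a service integration.
function taskResourceArn(
  service: string,
  api: string,
  pattern: IntegrationPatternName,
  partition: string = 'aws',
): string {
  return `arn:${partition}:states:::${service}:${api}${RESOURCE_SUFFIX[pattern]}`;
}

console.log(taskResourceArn('glue', 'startJobRun', 'REQUEST_RESPONSE'));
// arn:aws:states:::glue:startJobRun
console.log(taskResourceArn('glue', 'startJobRun', 'RUN_JOB'));
// arn:aws:states:::glue:startJobRun.sync
```

If the synthesized Resource has no suffix, the state is using Request Response, whatever the code intended.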
It sounds like the integration pattern you want to use is RUN_JOB. Have you tried that and found that it has no effect?
A minimal repro that captures the expected behaviour as well as the observed behaviour would help.
Marking this as a guidance issue for now - can add the bug label back when we have repro steps.
This issue has not received a response in a while. If you want to keep this issue open, please leave a comment below and auto-close will be canceled.
Greetings, I deployed this solution in my dev environment and was wondering how I may validate that the task is synchronous?
As you may know, Step Functions is an orchestration service. If a use case involves orchestrating multiple tasks in sequence or in parallel, Step Functions can help with that.
In this scenario, we will start a Glue job and will periodically check on the job status by polling Glue. After the job is done we will continue on to the next step as specified in the state machine definition. It will wait for Glue to finish the job run before proceeding to the next state in the state machine.
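The start-then-poll behaviour described here can be sketched roughly as follows. This is only an illustration of the idea, not how Step Functions implements the .sync integration: `GlueLike`, `FakeGlue`, and `runJobAndWait` are hypothetical names, and the list of states is a subset of Glue's job run states chosen for the example.

```typescript
// Subset of Glue job run states; the terminal set is an assumption for this sketch.
type JobRunState = 'STARTING' | 'RUNNING' | 'STOPPING' | 'SUCCEEDED' | 'FAILED' | 'STOPPED' | 'TIMEOUT';
const TERMINAL_STATES: ReadonlySet<JobRunState> = new Set<JobRunState>([
  'SUCCEEDED', 'FAILED', 'STOPPED', 'TIMEOUT',
]);

// Hypothetical minimal client interface; this is not the AWS SDK.
interface GlueLike {
  startJobRun(jobName: string): string; // returns a run id
  getJobRunState(jobName: string, runId: string): JobRunState;
}

// Starts the job, then polls until it reaches a terminal state or maxPolls is used up.
function runJobAndWait(
  glue: GlueLike,
  jobName: string,
  maxPolls: number,
  sleep: (ms: number) => void = () => {},
  intervalMs: number = 1000,
): JobRunState {
  const runId = glue.startJobRun(jobName);
  for (let poll = 0; poll < maxPolls; poll++) {
    const state = glue.getJobRunState(jobName, runId);
    if (TERMINAL_STATES.has(state)) return state;
    sleep(intervalMs);
  }
  throw new Error(`Job ${jobName} did not finish within ${maxPolls} polls`);
}

// Fake client that replays a fixed sequence of states, repeating the last one.
class FakeGlue implements GlueLike {
  polls = 0;
  private readonly states: JobRunState[];
  constructor(states: JobRunState[]) {
    this.states = states;
  }
  startJobRun(_jobName: string): string {
    return 'run-1';
  }
  getJobRunState(_jobName: string, _runId: string): JobRunState {
    const state = this.states[Math.min(this.polls, this.states.length - 1)];
    this.polls += 1;
    return state;
  }
}

const demoClient = new FakeGlue(['STARTING', 'RUNNING', 'SUCCEEDED']);
console.log(runJobAndWait(demoClient, 'job1', 10)); // SUCCEEDED
```

With the .sync resource, the state machine does this waiting for you, so the next state only starts once the job run has ended.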
After delving through the documentation, I read that this is possible in this use case through the .sync integration. Using CDK, it can be specified by setting "IntegrationPattern" to "RUN_JOB", which makes that state synchronous.
I've added the integration pattern RUN_JOB, but it doesn't appear with the suffix .sync when I check it on the console.
Here is how I set up my state machine:
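import { Duration } from 'aws-cdk-lib';
import { DefinitionBody, Fail, IntegrationPattern, StateMachine } from 'aws-cdk-lib/aws-stepfunctions';
import { GlueStartJobRun } from 'aws-cdk-lib/aws-stepfunctions-tasks';

// The following runs inside the stack's constructor, where `this` is the Stack.
// First Glue job: RUN_JOB asks the state machine to wait for the job to finish.
const jobIngestionMixpanel = new GlueStartJobRun(this, 'StartJobIngestionMixpanel', {
  glueJobName: 'job1',
  integrationPattern: IntegrationPattern.RUN_JOB,
});

// Retry up to two times, five minutes apart.
jobIngestionMixpanel.addRetry({
  errors: ['States.TaskFailed'],
  backoffRate: 1,
  maxAttempts: 2,
  interval: Duration.minutes(5),
});

// Any error left after the retries goes to a Fail state.
jobIngestionMixpanel.addCatch(new Fail(this, 'JobFailed', {
  error: 'Glue job failed after retries',
}));

// Second Glue job, started once the first one is done.
const jobDictionaryMixpanel = new GlueStartJobRun(this, 'StartJobDictMixpanel', {
  glueJobName: 'job2',
  integrationPattern: IntegrationPattern.RUN_JOB,
});

const definition = jobIngestionMixpanel.next(jobDictionaryMixpanel);

new StateMachine(this, 'StateMachineJobs', {
  definitionBody: DefinitionBody.fromChainable(definition),
});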
And here is the CloudFormation template:
"StateMachineJobsB3D6E122": {
  "Type": "AWS::StepFunctions::StateMachine",
  "Properties": {
    "DefinitionString": {
      "Fn::Join": [
        "",
        [
          "{\"StartAt\":\"StartJobIngestionMixpanel\",\"States\":{\"StartJobIngestionMixpanel\":{\"Next\":\"StartJobDictMixpanel\",\"Retry\":[{\"ErrorEquals\":[\"States.TaskFailed\"],\"IntervalSeconds\":300,\"MaxAttempts\":2,\"BackoffRate\":1}],\"Catch\":[{\"ErrorEquals\":[\"States.ALL\"],\"Next\":\"JobFailed\"}],\"Type\":\"Task\",\"Resource\":\"arn:",
          {
            "Ref": "AWS::Partition"
          },
          ":states:::glue:startJobRun\",\"Parameters\":{\"JobName\":\"job2\"}},\"StartJobDictMixpanel\":{\"End\":true,\"Type\":\"Task\",\"Resource\":\"arn:",
          {
            "Ref": "AWS::Partition"
          },
          ":states:::glue:startJobRun\",\"Parameters\":{\"JobName\":\"job1\"}},\"JobFailed\":{\"Type\":\"Fail\",\"Error\":\"Glue job failed after retries\"}}}"
        ]
      ]
    }
  }
}
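The DefinitionString in the template is split into an Fn::Join around the AWS::Partition reference, which makes it hard to read. A small sketch like the one below can stitch it back together for inspection. `resolveJoin` is a hypothetical helper, and the parts shown are a shortened, single-state version of the template rather than a copy of it.

```typescript
// A CloudFormation Fn::Join part: a literal string or a { Ref } to a pseudo parameter.
type JoinPart = string | { Ref: string };

// Illustrative helper (assumed name): concatenates the parts, replacing each Ref
// with the value supplied in `refs`.
function resolveJoin(delimiter: string, parts: JoinPart[], refs: Record<string, string>): string {
  return parts
    .map((part) => {
      if (typeof part === 'string') return part;
      const value = refs[part.Ref];
      if (value === undefined) throw new Error(`No value supplied for Ref ${part.Ref}`);
      return value;
    })
    .join(delimiter);
}

// Shortened stand-in for the template: one Task state instead of two.
const joinParts: JoinPart[] = [
  '{"StartAt":"StartJobIngestionMixpanel","States":{"StartJobIngestionMixpanel":{"End":true,"Type":"Task","Resource":"arn:',
  { Ref: 'AWS::Partition' },
  ':states:::glue:startJobRun","Parameters":{"JobName":"job1"}}}}',
];

const resolvedDefinition = JSON.parse(resolveJoin('', joinParts, { 'AWS::Partition': 'aws' }));
console.log(resolvedDefinition.States.StartJobIngestionMixpanel.Resource);
// arn:aws:states:::glue:startJobRun
```

Once resolved, the Resource of each Task state can be read directly to see whether the .sync suffix made it into the template.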
Before using CDK, I tested this by creating the state machine directly in the console, and the jobs ran synchronously once I added the .sync suffix. With RUN_JOB in CDK, however, it doesn't work.
My desired state machine:
{
  "Comment": "A state machine that starts a Glue job, retries upon failure, and proceeds differently upon success",
  "StartAt": "GlueStartJobRun",
  "States": {
    "GlueStartJobRun": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "job1"
      },
      "Retry": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "IntervalSeconds": 300,
          "MaxAttempts": 2,
          "BackoffRate": 1
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "JobFailed"
        }
      ],
      "Next": "Glue StartJobRun"
    },
    "Glue StartJobRun": {
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun.sync",
      "Parameters": {
        "JobName": "job2"
      },
      "End": true
    },
    "JobFailed": {
      "Type": "Fail",
      "Error": "Glue job failed after retries."
    }
  }
}
What the above CDK code produced:
{
  "StartAt": "StartJobIngestionMixpanel",
  "States": {
    "StartJobIngestionMixpanel": {
      "Next": "StartJobDictMixpanel",
      "Retry": [
        {
          "ErrorEquals": [
            "States.TaskFailed"
          ],
          "IntervalSeconds": 300,
          "MaxAttempts": 2,
          "BackoffRate": 1
        }
      ],
      "Catch": [
        {
          "ErrorEquals": [
            "States.ALL"
          ],
          "Next": "JobFailed"
        }
      ],
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun",
      "Parameters": {
        "JobName": "job1"
      }
    },
    "StartJobDictMixpanel": {
      "End": true,
      "Type": "Task",
      "Resource": "arn:aws:states:::glue:startJobRun",
      "Parameters": {
        "JobName": "job2"
      }
    },
    "JobFailed": {
      "Type": "Fail",
      "Error": "Glue job failed after retries"
    }
  }
}
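One way to check a definition like this without reading it by eye is to list the Task states whose Resource lacks the .sync suffix. The sketch below is only an illustration: `tasksWithoutSync` is a hypothetical helper, the interfaces cover just the fields it needs, and the two objects are trimmed-down versions of the generated and desired definitions.

```typescript
// Only the fields of the state machine definition that this check needs.
interface AslState {
  Type: string;
  Resource?: string;
}

interface AslDefinition {
  StartAt: string;
  States: Record<string, AslState>;
}

// Returns the names of Task states whose Resource does not end in ".sync".
function tasksWithoutSync(def: AslDefinition): string[] {
  return Object.entries(def.States)
    .filter(([, state]) => state.Type === 'Task' && !(state.Resource ?? '').endsWith('.sync'))
    .map(([stateName]) => stateName);
}

// Trimmed version of the definition CDK generated.
const generatedDefinition: AslDefinition = {
  StartAt: 'StartJobIngestionMixpanel',
  States: {
    StartJobIngestionMixpanel: { Type: 'Task', Resource: 'arn:aws:states:::glue:startJobRun' },
    StartJobDictMixpanel: { Type: 'Task', Resource: 'arn:aws:states:::glue:startJobRun' },
    JobFailed: { Type: 'Fail' },
  },
};

// Trimmed version of the desired definition.
const desiredDefinition: AslDefinition = {
  StartAt: 'GlueStartJobRun',
  States: {
    GlueStartJobRun: { Type: 'Task', Resource: 'arn:aws:states:::glue:startJobRun.sync' },
    'Glue StartJobRun': { Type: 'Task', Resource: 'arn:aws:states:::glue:startJobRun.sync' },
    JobFailed: { Type: 'Fail' },
  },
};

console.log(tasksWithoutSync(generatedDefinition)); // both Task states are listed
console.log(tasksWithoutSync(desiredDefinition)); // empty list
```

An empty result means every Task state waits for its job. Note that this check treats any Task without .sync as a finding, including ones that use .waitForTaskToken on purpose.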
When creating a Step Functions state machine that needs to run Glue jobs sequentially, the integration_pattern argument is ignored. This was the case with multiple CDK versions, including 1.45.0 (the current latest).
Reproduction Steps
Error Log
relevant part of CloudFormation yaml
Environment
Other
Same happens in CodeBuild (amazonlinux2-x86_64-standard:3.0, Python 3.7, CDK 1.45.0)
This is :bug: Bug Report