Open zexuan-zhou opened 3 years ago
Hi @zexuan-zhou,
Unfortunately, AWS Step Functions doesn't provide any native capability to restart a failed execution. However, theoretically, it's feasible to create a parameterized flow (which takes in run_id
and step
) and copies over the state from run_id
for all steps prior to step
and execute the remainder of the flow as usual.
Hi @savingoyal Thank you. Would you mind explaining a bit more? Are you saying that I should create another Metaflow flow and deploy it to AWS step that simply just copies a failed flow and restart it? I think I can understand the logic but I couldn't image how that flow will look like.
@zexuan-zhou Apologies for the delayed response. It depends on your exact use case, by you can access values from previous executions and assign those to your step using the client API. It will be a bit clunky tbh.
Hi @zexuan-zhou , how to resume a job (that is run by AWS step function) from a failed step using CLI command. Could you share me the command?
Start -> Activity1 -> Activity2 -> Activity3 -> Activity4 -> Stop
When an execution fails during some activity, let's say Activity2, the execution is marked as failure.
Now, is there anyway to resume this failed execution from the activity(Activity2) during which it failed earlier?
you can look at this https://aws.amazon.com/blogs/compute/resume-aws-step-functions-from-any-state/
it will resume from the last successful state. Note that if you have a activity 2 was a parallel job and 1 failed out of eg 100, then all 100 will still be redone.
here's more discussion re: resuming production runs https://outerbounds-community.slack.com/archives/C02116BBNTU/p1675706066639959?thread_ts=1673332947.917959&cid=C02116BBNTU
I know we can resume a job (that is run by AWS step function) from a failed step by cli command on local machines. But is there a ways to directly resume a failed step on AWS step functions without going through local machine cli command?