aws / aws-step-functions-data-science-sdk-python

Step Functions Data Science SDK for building machine learning (ML) workflows and pipelines on AWS
Apache License 2.0
285 stars 87 forks

Retry policies with error codes #191

Closed rodrick10 closed 1 year ago

rodrick10 commented 1 year ago

Currently, Step Functions retry policies match only on the error name. If the error name is vague and the retry decision depends on the error's cause, there is no way to express that.

Use Case

Below you can see an example with a SageMaker training job. [image]

In this example, I want to retry only if the failure is a "ThrottlingException". I cannot, because the retry policy looks only at the error name, which in this case is "SageMaker.AmazonSageMakerException".
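To make the limitation concrete, here is a minimal sketch of an Amazon States Language retrier written as a plain Python dict, with a deliberately simplified model of how it is matched. The `retrier_matches` helper, the sample `Cause` strings, and the retry intervals are illustrative assumptions, not part of the SDK or of the original report; the real matching rules live in the Step Functions service.

```python
import json

# A retrier as Amazon States Language defines it: it lists error names only.
# The interval, attempt, and backoff values here are arbitrary examples.
retrier = {
    "ErrorEquals": ["SageMaker.AmazonSageMakerException"],
    "IntervalSeconds": 5,
    "MaxAttempts": 3,
    "BackoffRate": 2.0,
}

# Task state for a training job; job parameters are omitted for brevity.
train_state = {
    "Type": "Task",
    "Resource": "arn:aws:states:::sagemaker:createTrainingJob.sync",
    "Retry": [retrier],
    "End": True,
}


def retrier_matches(retrier, error_name):
    """Hypothetical, simplified model: only the error name is compared."""
    names = retrier["ErrorEquals"]
    return "States.ALL" in names or error_name in names


# Two failures sharing one error name but with different causes.
# The Cause strings are made up for illustration.
throttled = {
    "Error": "SageMaker.AmazonSageMakerException",
    "Cause": "ThrottlingException: Rate exceeded",
}
invalid = {
    "Error": "SageMaker.AmazonSageMakerException",
    "Cause": "ValidationException: invalid input",
}

# Both failures match, because the Cause field is never consulted.
both_retried = (
    retrier_matches(retrier, throttled["Error"])
    and retrier_matches(retrier, invalid["Error"])
)

print(json.dumps(train_state, indent=2))
```

Under this model both failures are retried, since the retrier has no field that refers to `Cause`. That is the gap this request describes.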

Proposed Solution

Improve retry and catch policies to also work with the error cause.

This is a :rocket: Feature Request

wong-a commented 1 year ago

Closing, as this is outside the scope of the Step Functions Data Science SDK. Retrier semantics are part of the Amazon States Language.