aws / aws-step-functions-data-science-sdk-python

Step Functions Data Science SDK for building machine learning (ML) workflows and pipelines on AWS
Apache License 2.0

Support "Cycles" with Choice states #187

Closed cshumaker-irb closed 1 year ago

cshumaker-irb commented 2 years ago

I'm trying to iterate over a set of partitions using Glue's GetPartitions call. The first response may or may not have a token to continue pagination. My workflow looks like this:

  1. Call GetPartitions
  2. Map State
  3. Do mapped actions in parallel
  4. Check if the token is in state input
  5. If so, call GetPartitions with NextToken parameter
  6. If not, end the state machine
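For reference, the loop described above can be sketched as an Amazon States Language definition built as a plain Python dict, without the SDK. This is only an illustration under assumptions: the state names, the `HandlePartition` placeholder, the database and table names, and the use of the `aws-sdk:glue:getPartitions` integration are all hypothetical and not taken from the actual workflow.

```python
import json


def build_definition(database: str, table: str) -> dict:
    """Build a cyclic state machine definition (hypothetical names throughout)."""
    base_params = {"DatabaseName": database, "TableName": table}
    resource = "arn:aws:states:::aws-sdk:glue:getPartitions"  # assumed integration
    return {
        "StartAt": "GetPartitions",
        "States": {
            # Step 1: first page of partitions.
            "GetPartitions": {
                "Type": "Task",
                "Resource": resource,
                "Parameters": base_params,
                "Next": "ProcessPartitions",
            },
            # Steps 2-3: fan out over the page; ResultPath None (JSON null)
            # passes the input through so NextToken is still in the state.
            "ProcessPartitions": {
                "Type": "Map",
                "ItemsPath": "$.Partitions",
                "ResultPath": None,
                "Iterator": {
                    "StartAt": "HandlePartition",
                    "States": {"HandlePartition": {"Type": "Pass", "End": True}},
                },
                "Next": "HasNextToken",
            },
            # Step 4: branch on whether a pagination token is present.
            "HasNextToken": {
                "Type": "Choice",
                "Choices": [
                    {"Variable": "$.NextToken", "IsPresent": True,
                     "Next": "GetNextPartitions"}
                ],
                "Default": "Done",
            },
            # Step 5: next page, then back to the Map state (the cycle).
            "GetNextPartitions": {
                "Type": "Task",
                "Resource": resource,
                "Parameters": {**base_params, "NextToken.$": "$.NextToken"},
                "Next": "ProcessPartitions",
            },
            # Step 6: no token left, so finish.
            "Done": {"Type": "Succeed"},
        },
    }


def successors(state: dict) -> list:
    """Return the names of the states a top-level state can transition to."""
    names = [state["Next"]] if "Next" in state else []
    names += [choice["Next"] for choice in state.get("Choices", [])]
    if "Default" in state:
        names.append(state["Default"])
    return names


def has_cycle(definition: dict) -> bool:
    """Depth-first search for a back edge among the top-level states."""
    states = definition["States"]
    in_progress, finished = set(), set()

    def visit(name: str) -> bool:
        if name in in_progress:
            return True  # reached a state that is still on the current path
        if name in finished:
            return False
        in_progress.add(name)
        found = any(visit(nxt) for nxt in successors(states[name]))
        in_progress.discard(name)
        finished.add(name)
        return found

    return visit(definition["StartAt"])


definition = build_definition("my_database", "my_table")

if __name__ == "__main__":
    print(json.dumps(definition, indent=2))
```

The only thing that makes the graph cyclic is `GetNextPartitions` pointing back at `ProcessPartitions`; `has_cycle` is just a small helper to confirm that back edge exists.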

This is entirely possible in CloudFormation, the Amazon States Language, and Workflow Studio. However, the SDK seems to make it impossible. I cannot put the same State into the workflow twice, since the SDK validates that each state ID is unique. I also cannot set the next step on the subsequent GetPartitions call, since the Task class doesn't allow it. I tried to override it manually, but that seemed to break the validator: it threw a "missing state ID" type error.

Use Case

This feature is necessary for a lot of Step Functions-native development. One of the best parts of Step Functions is its service integrations, and many of them (paginated APIs in particular) are useless without this. Essentially, without it the SDK is limited to acyclic graphs, which is a major loss.

Proposed Solution

There are multiple solutions.

  1. Task class could support a next step
  2. Chain class could interpret duplicate states as "next states"
  3. Drop-in Amazon States Language definitions could be better supported in general
  4. Overriding fields like next_step on Task instances could just work instead of raising an inaccurate exception

This is a :rocket: Feature Request

cshumaker-irb commented 2 years ago

My apologies, I've found the "next()" method on the State class, which should do what I am describing.