aws-samples / aws-stepfunctions-examples

AWS Step Functions is an orchestration service for reliably executing multi-step processes using visual workflows. This repository includes detailed examples that will help you unlock the power of serverless workflow.
MIT No Attribution

Initial checkin - Step Function cross-execution concurrency control pattern #49

Open junguo opened 1 year ago

junguo commented 1 year ago

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

junguo commented 1 year ago

Description

This example demonstrates cross-execution concurrency control for AWS Step Functions workflows, using the ListExecutions API (https://docs.aws.amazon.com/step-functions/latest/apireference/API_ListExecutions.html).

Within a single execution, one can use a Map or Distributed Map state to control how many concurrent iterations are launched. However, there are use cases where one may want to limit the number of concurrent executions of the same workflow, for example because of downstream API limits or tasks that require human intervention.

This is Issue #52

Implementation

Concurrency Controller function:

The concurrency controller Lambda function checks, for a given state machine ARN, the current number of running executions using the ListExecutions API. It then compares that count against a preset concurrency threshold, a static value stored in SSM Parameter Store (for simplicity), and returns a “proceed” or “wait” flag.
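A minimal sketch of such a controller, not the code in this pull request. The event field names (`stateMachineArn`, `limitParameterName`), the helper names, and the strict `<` comparison are assumptions made for illustration; whether the calling execution should count against the threshold depends on how the workflow invokes the function.

```python
def count_running(sfn_client, state_machine_arn):
    """Count RUNNING executions of one state machine, following pagination."""
    kwargs = {
        "stateMachineArn": state_machine_arn,
        "statusFilter": "RUNNING",
        "maxResults": 1000,
    }
    count = 0
    while True:
        page = sfn_client.list_executions(**kwargs)
        count += len(page["executions"])
        token = page.get("nextToken")
        if not token:
            return count
        kwargs["nextToken"] = token


def read_limit(ssm_client, parameter_name):
    """Read the static concurrency threshold from SSM Parameter Store."""
    response = ssm_client.get_parameter(Name=parameter_name)
    return int(response["Parameter"]["Value"])


def decide(sfn_client, ssm_client, state_machine_arn, parameter_name):
    """Return "proceed" if the running count is below the threshold, else "wait"."""
    running = count_running(sfn_client, state_machine_arn)
    limit = read_limit(ssm_client, parameter_name)
    # Assumption: strict comparison; adjust if the caller itself is counted.
    return "proceed" if running < limit else "wait"


def lambda_handler(event, context):
    # boto3 is imported lazily so the helpers above can be tested with fakes.
    import boto3

    decision = decide(
        boto3.client("stepfunctions"),
        boto3.client("ssm"),
        event["stateMachineArn"],        # hypothetical event field
        event["limitParameterName"],     # hypothetical event field
    )
    return {"decision": decision}
```

The check and the subsequent start are two separate steps, so the count can change between them; the ListExecutions documentation also notes that the operation is eventually consistent.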

Other considerations

kitsunde commented 8 months ago

This is a check-then-act race condition: multiple executions can check the current execution count at the same time, each believe it is below the threshold, and all proceed, pushing the count above the threshold value.

Jitter doesn't solve race conditions; it just makes them harder to find. With 4 concurrent executions, as in this case, the race will still trigger some of the time, depending on how the dice fall. Jitter is really for traffic shaping.

You're better off using DynamoDB to control the concurrency: it supports conditional writes, which will solve the issue.
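To make the interleaving concrete, here is a small deterministic simulation. It does not call any AWS API; the function name and the numbers are purely illustrative. Every caller performs its check before any caller acts, which is the worst case for a check-then-act scheme.

```python
def simulate_check_then_act(running, limit, callers):
    """Simulate callers that all check the count before any of them starts."""
    # Phase 1 (check): every caller reads the same stale count.
    decisions = [running < limit for _ in range(callers)]
    # Phase 2 (act): every caller that saw room starts an execution.
    return running + sum(decisions)


# 3 executions running, threshold of 4, 4 callers checking at once:
# all 4 see 3 < 4 and proceed.
print(simulate_check_then_act(running=3, limit=4, callers=4))  # prints 7
```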
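One way the DynamoDB suggestion could look, as a sketch under assumptions: a table keyed by a string attribute `lock_name` holding a numeric `in_flight` counter. The table layout, attribute names, and function names are hypothetical. The check and the increment happen in a single conditional `UpdateItem` call, so two callers cannot both take the last slot.

```python
def try_acquire(ddb_client, table_name, lock_name, limit):
    """Atomically take one slot; return False if the limit is already reached."""
    try:
        ddb_client.update_item(
            TableName=table_name,
            Key={"lock_name": {"S": lock_name}},
            UpdateExpression="ADD in_flight :one",
            # The write only succeeds while the counter is below the limit.
            ConditionExpression="attribute_not_exists(in_flight) OR in_flight < :limit",
            ExpressionAttributeValues={
                ":one": {"N": "1"},
                ":limit": {"N": str(limit)},
            },
        )
        return True
    except ddb_client.exceptions.ConditionalCheckFailedException:
        return False


def release(ddb_client, table_name, lock_name):
    """Give one slot back; return False if the counter is already zero."""
    try:
        ddb_client.update_item(
            TableName=table_name,
            Key={"lock_name": {"S": lock_name}},
            UpdateExpression="ADD in_flight :minus_one",
            ConditionExpression="in_flight > :zero",
            ExpressionAttributeValues={
                ":minus_one": {"N": "-1"},
                ":zero": {"N": "0"},
            },
        )
        return True
    except ddb_client.exceptions.ConditionalCheckFailedException:
        return False
```

With a real client this would be called as `try_acquire(boto3.client("dynamodb"), "my-table", "my-workflow", 4)`, where the table and lock names are placeholders. A bare counter leaks a slot if an execution fails before calling `release`, so a production design would also need some cleanup path.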