aws-samples / aws-stepfunctions-examples

AWS Step Functions is an orchestration service for reliably executing multi-step processes using visual workflows. This repository includes detailed examples that will help you unlock the power of serverless workflow.
MIT No Attribution

Why does AWS set the maximum degree of parallelism at 40? #4

Closed hjb417 closed 2 years ago

hjb417 commented 3 years ago

Hi, I was experimenting with Step Functions for an ETL process that relies heavily on parallelization, and I couldn't break 37 concurrent executions. I see in your example (and thank you for providing it!!!) that you work around this by recursively calling state machines to ensure the Map state never receives more than 40 items. I don't want to go down that route because I was hoping to encapsulate all the steps in a single workflow/state machine and to keep the state machine flow logic simple (e.g., no choices, and no recursion to avoid throttling by AWS, which would add conditional/branching logic to my code as well as to the state machine).

My apologies if this is not the appropriate venue to ask this.

JustinCallison commented 2 years ago

@hjb417 sorry for the late response. The reasons we cap concurrency for running branches in a Map state are complicated and relate to the internals of the service that I can't really go into here. Reliable and scalable at-most-once execution in a distributed system turns out to be a pretty complex problem. We try to remove that complexity or help you avoid as much of it as possible. That said, we hear you and, while I can't promise anything, it is something we are thinking deeply about.

Given this, I put together the example and the associated blog post (linked below) to help folks accomplish pretty much any scale they can imagine with the service as it is. One thing to keep in mind is that, while decomposing into multiple workflows adds some complexity, it has other benefits as well, such as staying within the limit on history events per execution and making the work easier to observe and analyze by breaking it into smaller chunks. Depending on what you are doing in those parallel branches created by your Map state, that can be helpful on its own.

https://aws.amazon.com/blogs/compute/accelerating-workloads-using-parallelism-in-aws-step-functions/
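
For readers who want a feel for the decomposition without opening the example, here is a minimal sketch of the batching idea in plain Python. It is not the code from the repository or the blog post. The names `chunk`, `plan_fan_out`, and `MAX_BRANCHES` are hypothetical, and the value 40 is simply the figure discussed in this thread. The sketch only shapes the input: it groups items into nested batches so that no single level hands more than 40 entries to a Map state, with each batch meant to be passed to a child workflow.

```python
from typing import Any, Iterator, List, Sequence

# Assumption: 40 is the per-Map-state figure discussed in this thread.
MAX_BRANCHES = 40


def chunk(items: Sequence[Any], size: int = MAX_BRANCHES) -> Iterator[List[Any]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def plan_fan_out(items: Sequence[Any], size: int = MAX_BRANCHES) -> List[Any]:
    """Nest items into batches until the top level has at most `size` entries.

    Each entry at a given level is either an original item or a batch
    that a child workflow would iterate over in its own Map state.
    """
    level: List[Any] = list(items)
    while len(level) > size:
        level = list(chunk(level, size))
    return level


if __name__ == "__main__":
    plan = plan_fan_out(list(range(2000)))
    # Top level: number of batches the parent workflow would map over.
    print(len(plan))
    # Second level: number of batches each child workflow would map over.
    print([len(batch) for batch in plan])
```

The nesting depth grows with the input size, so a parent workflow would still need some way to tell a batch from a leaf item; that is the branching logic mentioned in the question, and this sketch does not remove it.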