aws-samples / step-functions-workflows-collection

Step Functions Workflows. Learn more at the website: https://serverlessland.com/workflows.
MIT No Attribution

New step-functions-workflows-collection - s3-bucket-nested-ditributed-map #339

Closed XaaXaaX closed 10 months ago

XaaXaaX commented 10 months ago

Issue: #340

Description of changes:

This pattern helps to increase the parallelism level of Distributed Map when reading large S3 buckets where processing speed is a requirement.

Today, to achieve this goal, we mostly use Lambda to reach a higher level of parallelism, which puts a new constraint on the table: Lambda concurrency has to be aligned with the distribution. This often makes things more complicated, including long-running functions, function concurrency levels, S3 connection handling, and so on.

This pattern removes the need for batching and parallelism at the function level, and achieves that parallelism without a single line of code.

benjasl-stripe commented 10 months ago

Hello, can you please explain how this workflow helps to increase the parallelism level of the Distributed Map state?

XaaXaaX commented 10 months ago

@bls20AWS It uses Distributed Map to increase the parallelism level by nesting one Distributed Map inside another.

The outer Distributed Map reads the whole S3 bucket and dispatches batches of items (300, for example) into a nested Distributed Map, which re-batches each batch of 300 into 30 batches of 10 items and processes them in parallel.
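To make the re-batching concrete, here is a minimal Python sketch, not the actual definition from this PR. It builds a state machine definition as a dict with one Distributed Map nested inside another, plus a small helper that mimics the 300-into-30-of-10 split. The state names (`OuterMap`, `InnerMap`, `ProcessBatch`), the bucket name, the `Pass` placeholder state, and the choice of `STANDARD` for the outer child executions and `EXPRESS` for the inner ones are all assumptions made for illustration.

```python
import json


def rebatch(items, size):
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_definition(bucket, outer_batch=300, inner_batch=10):
    """Return a state machine definition (as a dict) with nested Distributed Maps.

    State names and execution types are hypothetical, chosen for illustration.
    """
    inner_map = {
        "Type": "Map",
        # With ItemBatcher on the outer map, each child execution
        # receives its batch under the "Items" key.
        "ItemsPath": "$.Items",
        "ItemBatcher": {"MaxItemsPerBatch": inner_batch},
        "ItemProcessor": {
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "EXPRESS"},
            "StartAt": "ProcessBatch",
            "States": {
                # Placeholder: replace with the real per-batch work.
                "ProcessBatch": {"Type": "Pass", "End": True}
            },
        },
        "End": True,
    }
    outer_map = {
        "Type": "Map",
        # List every object in the bucket and feed the keys to the map.
        "ItemReader": {
            "Resource": "arn:aws:states:::s3:listObjectsV2",
            "Parameters": {"Bucket": bucket},
        },
        "ItemBatcher": {"MaxItemsPerBatch": outer_batch},
        "ItemProcessor": {
            # Assumed STANDARD, since the child workflow itself
            # contains a Distributed Map.
            "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
            "StartAt": "InnerMap",
            "States": {"InnerMap": inner_map},
        },
        "End": True,
    }
    return {"StartAt": "OuterMap", "States": {"OuterMap": outer_map}}


if __name__ == "__main__":
    batch = [f"object-{n}" for n in range(300)]  # one outer batch of 300 keys
    print(len(rebatch(batch, 10)))  # number of inner batches
    print(json.dumps(build_definition("my-example-bucket"), indent=2))
```

The `rebatch` helper only simulates what the inner `ItemBatcher` does; in the workflow itself the split happens inside Step Functions, so no function code is involved.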

The pattern offers no advantage for a low volume of files, but gives a significant improvement for a higher volume of objects in an S3 bucket (in tests, 10K objects were processed in 10 seconds with nesting versus 40 seconds without).

Why this pattern? In practical use cases with Distributed Map and an S3 bucket, we tend to use Lambda to increase parallelism at the code level by taking advantage of a higher memory allocation at the function level. This pattern helps to avoid unnecessary compute usage and relies on the Step Functions integration as much as possible.

If you feel it is necessary, I can provide some additional screenshots and benchmarks.