Closed XaaXaaX closed 10 months ago
Hello, could you please explain how this workflow helps increase the parallelism level of the Distributed Map state?
@bls20AWS It actually uses Distributed Map to increase the parallelism level, by means of a nested Distributed Map.
The outer Distributed Map reads the whole S3 bucket and dispatches batches of items (300, for example) to a nested Distributed Map, which re-batches each batch of 300 into 30 batches of 10 items and processes them in parallel.
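To make the batching arithmetic concrete, here is a minimal Python sketch that only simulates the two batching levels; it does not call Step Functions or S3. The object keys and the `batch` helper are hypothetical, and the sizes (300 and 10) are the example values from the comment above.

```python
from typing import Iterator, List, Sequence


def batch(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical object keys standing in for an S3 bucket listing.
keys = [f"object-{i:05d}.json" for i in range(10_000)]

# Outer level: the bucket listing is split into batches of up to 300 keys.
outer_batches = list(batch(keys, 300))

# Inner level: each outer batch is re-batched into groups of up to 10 keys.
inner_batches = [list(batch(outer, 10)) for outer in outer_batches]

print(len(outer_batches))                  # number of outer batches
print(len(inner_batches[0]))               # inner batches in a full outer batch
print(sum(len(b) for b in inner_batches))  # total inner batches overall
```

With 10,000 keys, a full outer batch of 300 fans out into 30 inner batches, so the total number of units of work that can run in parallel is much larger than the number of outer batches alone.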
The pattern offers no advantage for a low volume of files, but gives a significant improvement for a higher volume of objects in the S3 bucket (based on tests, 10K objects were processed in 10 seconds with nesting vs. 40 seconds without nesting).
Why this pattern? In practical use cases with Distributed Map and an S3 bucket, we usually turn to Lambda to increase parallelism at the code level, taking advantage of a higher memory allocation at the function level. This pattern avoids that unnecessary compute usage and relies on Step Functions integrations as much as possible.
If you feel it is necessary, I can provide some additional screenshots and benchmarks.
Issue: #340

Description of changes:
This pattern helps increase the parallelism level of Distributed Map when reading large S3 buckets and processing speed is a priority.
Today, to achieve this goal, we mostly use Lambda to reach a higher parallelism level, which puts a new Lambda concurrency constraint on the table that has to be aligned with the distribution. This often makes things more complicated: long-running functions, function concurrency limits, S3 connection handling, and so on.
This pattern removes the need for batching and parallelism at the function level, and achieves that parallelism without a single line of code.
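As a rough illustration of what such a definition could look like, the sketch below builds an Amazon States Language document as a Python dictionary and prints it as JSON. It is not the definition from this pull request: the state names, the bucket name `example-bucket`, the batch sizes, the choice of `STANDARD` for the outer child executions and `EXPRESS` for the inner ones, and the `Pass` state standing in for the real per-batch processing are all assumptions made for the example.

```python
import json

# Inner Distributed Map: receives one outer batch as {"Items": [...]}
# and re-batches it into groups of 10 (example value).
inner_map = {
    "Type": "Map",
    "ItemsPath": "$.Items",
    "ItemBatcher": {"MaxItemsPerBatch": 10},
    "ItemProcessor": {
        "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "EXPRESS"},
        "StartAt": "ProcessBatch",
        # Placeholder: a Pass state stands in for the real processing step.
        "States": {"ProcessBatch": {"Type": "Pass", "End": True}},
    },
    "End": True,
}

# Outer Distributed Map: lists the bucket and dispatches batches of 300.
outer_map = {
    "Type": "Map",
    "ItemReader": {
        "Resource": "arn:aws:states:::s3:listObjectsV2",
        "Parameters": {"Bucket": "example-bucket"},  # hypothetical bucket
    },
    "ItemBatcher": {"MaxItemsPerBatch": 300},
    "ItemProcessor": {
        "ProcessorConfig": {"Mode": "DISTRIBUTED", "ExecutionType": "STANDARD"},
        "StartAt": "RebatchAndProcess",
        "States": {"RebatchAndProcess": inner_map},
    },
    "End": True,
}

definition = {
    "StartAt": "ListAndBatchObjects",
    "States": {"ListAndBatchObjects": outer_map},
}

print(json.dumps(definition, indent=2))
```

The point of the sketch is that both levels of fan-out are expressed declaratively through `ItemBatcher` and `ItemProcessor`, so no function code is needed to split or parallelize the work.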