PrefectHQ / prefect

Prefect is a workflow orchestration framework for building resilient data pipelines in Python.
https://prefect.io
Apache License 2.0
17.55k stars 1.65k forks

Too Many Concurrent Attempts to Register Task Definition in ECS Work Pool #15865

Open torbiczuk opened 3 weeks ago

torbiczuk commented 3 weeks ago

Bug summary

When multiple Prefect flows start simultaneously using an ECS work pool, they attempt to register a new AWS task definition at the same time. This leads to a ClientException with the message: "Too many concurrent attempts to create a new revision of the specified family."

I have 5 scheduled flows that start simultaneously each day. Each flow attempts to register a new task definition, likely because a new deployment version is created each night.

I tried setting ENV AWS_RETRY_MODE=adaptive and ENV AWS_MAX_ATTEMPTS=100 in my worker Dockerfile; however, this didn't resolve the issue.
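For illustration, here is a minimal sketch of an application-level retry around the registration call, assuming the SDK-level retry settings above do not cover this particular error. The helper name `register_with_backoff` and its parameters are hypothetical and not part of Prefect or prefect-aws; the only detail taken from this report is the error message text.

```python
import random
import time

# Substring of the error message quoted in this issue.
CONCURRENT_REVISION_MSG = "Too many concurrent attempts to create a new revision"


def register_with_backoff(register, max_attempts=5, base_delay=1.0,
                          max_delay=30.0, sleep=time.sleep):
    """Call `register()` and retry with jittered exponential backoff.

    Only errors whose message contains CONCURRENT_REVISION_MSG are retried;
    any other exception, or the final failed attempt, is re-raised.
    `register` is any zero-argument callable (hypothetical wrapper).
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return register()
        except Exception as exc:
            retryable = CONCURRENT_REVISION_MSG in str(exc)
            if not retryable or attempt == max_attempts:
                raise
            # Full jitter: sleep a random time up to the capped exponential
            # delay so that simultaneous callers spread out their retries.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))
```

With a boto3 ECS client, the callable could wrap `register_task_definition`, for example `register_with_backoff(lambda: ecs_client.register_task_definition(**task_definition))`, where `ecs_client` and `task_definition` are placeholders. The random jitter matters here because all five flows start at the same moment and would otherwise retry in lockstep.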

I'm using the same Docker image for each flow; only the input parameters differ. (I have enabled Match Latest Revision In Family (Optional), but it's not working.)

Version info

Version:             2.20.2
API version:         0.8.4
Python version:      3.11.9
Git commit:          51c3f290
Built:               Wed, Aug 14, 2024 11:27 AM
OS/Arch:             darwin/arm64
Profile:             default
Server type:         server

Additional context

I'm using prefect-aws: 0.4.2

Task Definitions Comparison:

Attempts to Mitigate:

Related Issue: PrefectHQ/prefect#10102

Environment Variables Set (Dockerfile): ENV AWS_RETRY_MODE=adaptive, ENV AWS_MAX_ATTEMPTS=100

Task definitions created for the same deployment (different flow runs): task_definition_1.json task_definition_2.json
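To narrow down why two task definitions for the same deployment are not treated as matching, it may help to diff them after dropping the fields that ECS fills in on registration. The sketch below is only a diagnostic aid: the function names are hypothetical, the set of ignored keys is an assumption, and this is not how prefect-aws itself performs the comparison.

```python
# Keys that ECS populates on the registered task definition
# (assumed list; adjust it to whatever appears in the exported files).
SERVER_POPULATED_KEYS = {
    "taskDefinitionArn",
    "revision",
    "status",
    "requiresAttributes",
    "compatibilities",
    "registeredAt",
    "registeredBy",
    "deregisteredAt",
}


def comparable(task_definition: dict) -> dict:
    """Return a copy of the task definition without server-populated keys."""
    return {
        key: value
        for key, value in task_definition.items()
        if key not in SERVER_POPULATED_KEYS
    }


def differing_keys(first: dict, second: dict) -> list:
    """List the top-level keys whose values differ between two definitions."""
    left, right = comparable(first), comparable(second)
    return sorted(
        key
        for key in left.keys() | right.keys()
        if left.get(key) != right.get(key)
    )
```

Loading task_definition_1.json and task_definition_2.json with `json.load` and passing both dictionaries to `differing_keys` would show which top-level fields change between flow runs. An empty list would suggest the difference lies elsewhere, for example in how the worker looks up the latest revision.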

sys-git commented 3 weeks ago

+1

I'm also facing a similar issue.