Open kaaloo opened 1 year ago
Hi @kaaloo thanks for filing the issue. You're right that right now ECS Work pool templates don't support capacityProviderStrategy
. I'm happy to work with you if you'd like to contribute the feature!
Right now we ensure that there is a default launch type no matter what: https://github.com/PrefectHQ/prefect-aws/blob/main/prefect_aws/workers/ecs_worker.py#L1233
We also validate other fields based on the launch type: https://github.com/PrefectHQ/prefect-aws/blob/main/prefect_aws/workers/ecs_worker.py#L854C1-L854C1
It seems like we'll want to make sure launchType
is not specified if capacityProviderStrategy
and maybe some other relevant validation.
Push work pool are a closed source implementation, although they mimic a lot of the logic from the ECSWorker
in this repo. If you're willing to contribute a fix I'm happy to port the implementation to push work pool version.
Expectation / Proposal
I would like to be able to use a Push Work Pool on AWS that can scale to zero because of the cost of idle GPU instances. I have created the following template in the Advanced Tab of the Work Pool.
https://gist.github.com/kaaloo/de723421fb6fda6965fda3d3af5b6dc2
It attempts to replace the
launchType
section in the task run request with acapacityProviderStrategy
section by setting up acapacity_provider
variable which is then used as follows. Thelaunch_type
variable andlaunchType
section in the template have been removed.Unfortunately, the current code base considers the
launchType
section to be mandatory and provides a default value ofFARGATE
which is added back in. This behavior is not compatible with what I'm trying to achieve.https://repost.aws/questions/QUnKdakxvQROuUEjq2UWpa9g/should-ecs-ec2-asgprovider-capacity-provider-be-able-to-scale-up-from-zero-0-1
Traceback / Example
It seems to me that the issue stems somewhere around these lines of code:
https://github.com/PrefectHQ/prefect-aws/blob/main/prefect_aws/workers/ecs_worker.py#L854C1-L854C1
However, I'm not familiar enough with the workings of the new Push Work Pools to get much further in resolving the issue myself. However, with some guidance I could give it a try. Especially if that can help get the fix out faster.