aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.62k stars 3.91k forks

(stepfunctions-tasks): Allow specifying CapacityProviderStrategy in EcsRunTask #20013

Open philiptromans opened 2 years ago

philiptromans commented 2 years ago

Describe the feature

The ECS RunTask API allows you to specify a capacityProviderStrategy to use to run the task. This field can also be specified in an ecs:runTask.sync Step Function state. It'd be good if the aws_stepfunctions_tasks.EcsRunTask construct allowed this to be specified as well.

Use Case

I have a Step Function where some ECS tasks are too large to run in Fargate. Instead, I run them in an EC2-backed ECS cluster. I'd like this to happen in a cluster that scales up from zero instances, on demand. Without specifying the CapacityProvider in the Step Function state, the task will not start because no container instances are available (the cluster has a capacity of zero at this point). I am not able to do this idiomatically in CDK.

Proposed Solution

This situation can be worked around by creating a CustomState, although this can be tricky because suitable IAM policies, etc., need to be created manually, and it generally feels clunky.

Alternatively, either #7967 or #15230 would provide a suitable workaround.
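The CustomState workaround mentioned above could look roughly like the following. This is only a minimal sketch, not part of the CDK API: `buildRunTaskState` and its option names are hypothetical helpers introduced for illustration, and the `aws` partition in the resource ARN is an assumption. The function only builds the Amazon States Language JSON for an ecs:runTask.sync state. LaunchType is left out on purpose, since the ECS RunTask API requires it to be omitted when a capacity provider strategy is given.

```typescript
// Hypothetical helper (not a CDK API): builds the state JSON for an
// ecs:runTask.sync task that passes a capacity provider strategy.
interface StrategyItem {
  capacityProvider: string; // name of a capacity provider attached to the cluster
  weight?: number;
  base?: number;
}

interface RunTaskStateOptions {
  clusterArn: string;
  taskDefinitionArn: string;
  strategy: StrategyItem[];
}

interface AslStrategyItem {
  CapacityProvider: string;
  Weight?: number;
  Base?: number;
}

interface RunTaskState {
  Type: "Task";
  Resource: string;
  Parameters: {
    Cluster: string;
    TaskDefinition: string;
    CapacityProviderStrategy: AslStrategyItem[];
  };
}

function buildRunTaskState(opts: RunTaskStateOptions): RunTaskState {
  if (opts.strategy.length === 0) {
    throw new Error("At least one capacity provider strategy item is required");
  }
  return {
    Type: "Task",
    // Assumes the standard "aws" partition.
    Resource: "arn:aws:states:::ecs:runTask.sync",
    Parameters: {
      Cluster: opts.clusterArn,
      TaskDefinition: opts.taskDefinitionArn,
      // LaunchType is intentionally omitted: it must not be combined
      // with a capacity provider strategy.
      CapacityProviderStrategy: opts.strategy.map((item) => {
        const out: AslStrategyItem = { CapacityProvider: item.capacityProvider };
        // Only emit optional fields that were actually set.
        if (item.weight !== undefined) out.Weight = item.weight;
        if (item.base !== undefined) out.Base = item.base;
        return out;
      }),
    },
  };
}
```

The result can be passed as the `stateJson` property of an `aws_stepfunctions.CustomState`. The state machine role still needs its permissions granted by hand (for example to run, stop, and describe the task and to pass the task roles), which is the clunky part described above.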

Other Information

Related issues: #7967 and #15230

Acknowledgements

CDK version used

2.20.0

Environment details (OS name and version, etc.)

macOS 12.3

kaizencc commented 2 years ago

Leaving this up as a p2 to see community interest on it first. Sounds like a simple add to the existing task, but usually nothing is simple in stepfunctions-tasks.

trobert2 commented 2 years ago

@kaizencc, at the moment it's not possible to run tasks in clusters without default capacity providers. Is there any chance to move this forward?

olivier-schmitt-sonarsource commented 2 years ago

It seems that without this feature it's not possible to use the FARGATE_SPOT capacity provider. That has huge implications from a cost perspective and is not customer friendly.

I'm actually trying to use SPOT to run a batch solution and this feature would be a must-have.

kevinbader commented 1 year ago

#15230 is closed, but it doesn't seem to be respected when using EcsRunTask. For me, autoscaling works when the task is scheduled using the Console but not when it is scheduled as part of a Step Functions state machine. Comparing the CloudTrail events, I see that the Console passes the capacityProviderStrategy to runTask, whereas the Step Function's EcsRunTask does not. I wonder if that's the culprit, but I can't be sure. Has anyone found a workaround for this?