Open coultn opened 4 years ago
Related to this, in order to support capacity providers with managedTerminationProtection, we also need to be able to set the new-instances-protected-from-scale-in property when creating the ASG via CloudFormation. This latter property was added 4 years ago to the AWS SDK / AWS CLI, but is still not supported in CF -- hopefully full support for CP in CF is added a bit faster.
Has there been any progress made on this?
Add support for Capacity providers #1
We are working on it and will provide updates as soon as more information is available.
Additionally, when the new-instances-protected-from-scale-in property is set on the ASG, a scheduled action to scale in instances cannot be executed. A feature like force-scale-in for scheduled actions would be useful if, for example, we have a dev environment and would like to turn instances off for the night and back on in the morning.
+1
When this is implemented, will it be possible to do a rolling update to the launch template under autoscaling and a change to a service in ecs, such that the new tasks run on instances from the new launch template while the old ones stay on the old instances as they roll over?
I'm struggling to achieve this with custom resources at the moment, partly as the dependencies are all in funny directions. Would be great to have it all defined declaratively in cfn.
Cross-linking the resp. request in https://github.com/aws-cloudformation/aws-cloudformation-coverage-roadmap/issues/301
Any ETA on this?
Does this depend on #632?
I think no.
Sadly, that's the reason why using CloudFormation is becoming more and more frustrating.
FWIW, Terraform has supported this since shortly after the API was released: https://github.com/terraform-providers/terraform-provider-aws/pull/11151
Of course, it can't delete capacity providers since there's no API: https://www.terraform.io/docs/providers/aws/r/ecs_capacity_provider.html
I don't want to use, rely on and support third-party software if I have a chance to use the official product.
any update?
same here, any updates?
any update?
the lack of Cfn support for this 6 months in is really disappointing. This puts the burden on anyone building CI/CD using Cfn to add additional and silly custom cli/sdk pieces to actually tie in capacity providers, which then have to be ripped out once the support that should be part of a point release is in place. You can do better. Communicating timeframes would help as well.
Have you had a deeper look into Capacity Providers and Cluster Auto Scaling? Does not match with my requirements at all. Does not scale down properly. Does not work with CloudFormation rolling updates for the ASG. So missing CloudFormation support is not the only problem here. :)
Thanks for the feedback - can you explain more what you mean by "does not scale down properly"?
coultn: Here's what I think is a common use case: A CI/CD pipeline where services are spun up on an ASG backed EC2 cluster.
Services do not pre-exist; the CI/CD pipeline creates them.
Currently, you cannot use cfn to create a capacity provider enabled service.
If the underlying cluster doesn't have the memory or CPU, I would expect that deploying a new service would add another EC2 instance and place the new service on it, but there's no way to do that currently. I suppose what might work right now is: deploy the service with no capacity provider, perhaps with a quantity of 0 so it stabilizes, then update the service via the CLI to use a capacity provider, then make another CLI call to increase the quantity to 1. But that seems like jumping through hoops.
With regards to down scaling, in reading the documentation, it seems a bit unclear on exactly how this is meant to work: If the goal is to optimize resources, I would actually want the cp to be intelligent enough to a) determine that the cluster is currently overprovisioned and b) if so, drain EC2 accordingly and have the ASG terminate the drained instance...all with standard, appropriate cooldown periods, etc.
Thanks for the feedback! We are working on full support for capacity providers in CloudFormation, and we definitely understand the need for that. However, I do want to point out that you can actually create a capacity-provider enabled service in CloudFormation today. You can accomplish this by first configuring a default capacity provider strategy for the cluster. This default capacity provider strategy will be used by any service you create that does not specify a launch type. Next, when you create your service in CloudFormation, do not include the LaunchType parameter. The service will use the capacity provider strategy defined by the cluster, and will auto-scale from zero instances if necessary.
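For anyone scripting the first step outside CloudFormation, here is a minimal Python sketch of setting a cluster's default capacity provider strategy. It assumes a boto3 ECS client is passed in; the function name and the weight of 1 are illustrative choices, not something stated in this thread.

```python
def set_default_capacity_provider(ecs, cluster, provider, weight=1):
    """Make `provider` the cluster's default capacity provider strategy.

    `ecs` is expected to be a boto3 ECS client (boto3.client("ecs")).
    Note: put_cluster_capacity_providers replaces the cluster's whole list
    of capacity providers, so include any providers that must stay attached.
    """
    return ecs.put_cluster_capacity_providers(
        cluster=cluster,
        capacityProviders=[provider],
        defaultCapacityProviderStrategy=[
            {"capacityProvider": provider, "weight": weight},
        ],
    )

# Hypothetical usage (requires boto3 and AWS credentials):
#   import boto3
#   set_default_capacity_provider(boto3.client("ecs"), "CLUSTER_NAME", "FARGATE")
```

A service created afterwards without a launch type should then pick up this default strategy, as described above.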
Understood. In the first version of ECS cluster auto scaling, we took a more conservative route where instances would not scale in unless no tasks are running on them. We are looking at the idea of automating an "instance drainer" that will automatically find underutilized instances and set them to draining. With ECS cluster auto scaling, those instances would automatically shut down once no tasks are running on them. It's possible to do this already today, but you would need to implement your own Lambda function (or similar) to do the evaluation of the instance and call the ECS API to set the instance to the DRAINING state.
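A rough sketch of what such a do-it-yourself "instance drainer" could look like in Python, for example as the body of a scheduled Lambda function. This is not the ECS team's implementation: the function name, the `max_tasks` threshold, and the definition of "underutilized" as a low task count are all assumptions made for illustration. The three client methods are the boto3 ECS calls of the same names.

```python
def drain_underutilized(ecs, cluster, max_tasks=1, batch_size=10):
    """Set lightly loaded ACTIVE container instances to DRAINING.

    `ecs` is expected to be a boto3 ECS client. An instance counts as
    underutilized here when running + pending tasks <= max_tasks
    (an assumed heuristic). Returns the ARNs that were set to DRAINING.
    """
    # Collect all ACTIVE container instance ARNs, following pagination.
    arns = []
    kwargs = {"cluster": cluster, "status": "ACTIVE"}
    while True:
        page = ecs.list_container_instances(**kwargs)
        arns.extend(page.get("containerInstanceArns", []))
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token

    # Describe instances in chunks of 100 and pick the lightly loaded ones.
    to_drain = []
    for i in range(0, len(arns), 100):
        resp = ecs.describe_container_instances(
            cluster=cluster, containerInstances=arns[i:i + 100]
        )
        for inst in resp["containerInstances"]:
            load = inst.get("runningTasksCount", 0) + inst.get("pendingTasksCount", 0)
            if load <= max_tasks:
                to_drain.append(inst["containerInstanceArn"])

    # Update state in small batches (10 per call is assumed to be the limit).
    for i in range(0, len(to_drain), batch_size):
        ecs.update_container_instances_state(
            cluster=cluster,
            containerInstances=to_drain[i:i + batch_size],
            status="DRAINING",
        )
    return to_drain
```

A real drainer would also need a guard so that it never drains every instance at once, and would probably look at reserved CPU and memory rather than task counts alone.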
Really awesome feedback, thank you. As far as the workaround for setting it at Cluster creation, I'll take a look at that..easy enough to implement for QA/Dev..a little trickier for existing prod environments.
Trying to avoid custom tooling since...this seems sooo close to being a solid solution.
Any timing on better cfn support? I know that's a different, probably very overwhelmed team, but would be nice to see some improvements here. ECS rocks, and once this is dialed in, it's going to really round out the offering.
Will keep checking for ECS updates!
Dear colleagues, please provide a way in CF to fine-tune some of the Capacity Provider's auto-generated parameters. Currently, in addition to the existing parameters, we need to adjust the Cooldown in the Auto Scaling plan manually, as well as the alarms' datapoints, all after the Capacity Provider has been created. It would be great to put all of this together in the CF script. This is a must for us. Thank you very much!
Regarding timeline - we can't share specific timelines but we will share updates here as soon as they are available.
coultn:
Because this is such a useful feature for so many of my clients, I decided to re-tool things today.
Unfortunately, capacity providers still don't seem to work.
The cluster default cp is in place.
I re-created services without the LaunchType parameter, and it clearly shows the services are using the capacity provider strategy.
However, when I deploy services and exhaust the memory, it throws the usual message saying it can't find a container with the resources.
Interestingly, and probably to the point: The cloudwatch metric for the cp that is assigned to this cluster (CapacityProviderReservation) isn't reporting any metrics at all.
I have seen this metric chart more appropriately in previous tests a few weeks ago with another client...no idea why it's not reporting anything. I spun up about 5-8 services today on this cluster using the cp strategy....
I'll just keep checking back for updates...hopefully some good changes coming soon.
+1
This is definitely a showstopper for our CDK-powered automation workflows. Setting Capacity Provider on a cluster level is something CloudFormation team is looking into. https://github.com/aws-cloudformation/aws-cloudformation-coverage-roadmap/issues/301
In the meantime, our workaround is to run the following aws-cli command in our CI/CD workflow:

    aws ecs put-cluster-capacity-providers \
      --cluster CLUSTER_NAME \
      --capacity-providers FARGATE \
      --default-capacity-provider-strategy capacityProvider=FARGATE

I really hope this ships soon. 🤞
+2
Deletion is now supported by the API. Will this accelerate the implementation of this feature addition?
+1
Saw this earlier today, but the resources don’t seem to have been updated yet: https://twitter.com/aws_doc/status/1273943424849383424?s=21
I have implemented the new CloudFormation resources in one of my stacks and can confirm it works 👍
there's still a missing link though which might be (part of) the reason why it was not announced yet:
AWS::ECS::CapacityProvider AutoScalingGroupProvider requires the parameter AutoScalingGroupArn, which accepts only an ARN (which contains a UUID part so you cannot "guess" it).
Unfortunately AWS::AutoScaling::AutoScalingGroup does not expose its ARN so there's no way to reference this in the AutoScalingGroupProvider for now.
Either hardcoding an existing ARN or, once more, hacking around with a Custom Resource to get the ARN works.
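The core of such a Custom Resource is a lookup of the ARN by group name. A minimal sketch in Python, assuming a boto3 Auto Scaling client is passed in; the function name is made up, and the CloudFormation response handling a real Custom Resource needs is left out.

```python
def get_asg_arn(autoscaling, asg_name):
    """Return the ARN of the Auto Scaling group named `asg_name`.

    `autoscaling` is expected to be a boto3 client created with
    boto3.client("autoscaling").
    """
    resp = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )
    groups = resp.get("AutoScalingGroups", [])
    if not groups:
        raise LookupError(f"Auto Scaling group not found: {asg_name}")
    # The response key is spelled with an upper-case "ARN".
    return groups[0]["AutoScalingGroupARN"]
```

In a Lambda-backed Custom Resource, the returned value would be sent back as a response attribute so that the capacity provider resource can reference it with !GetAtt.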
ah AWS, where just the C is an acceptable MVP for CRUD. oh well glad it's finally getting released.
What about the Termination protection on Autoscaling and managed termination on CapacityProvider? I believe Autoscaling resource needs to be updated to support that.
A typical scenario, a template with an ASG and a capacity provider defined together (which rhlarora84 alluded to), is not possible because the AWS::AutoScaling::AutoScalingGroup resource only returns the name, but the capacity provider requires an Arn. That's kind of a miss on the ASG resource as well (why does it not have an Arn attribute?).
At the least, it would be nice if the capacity provider can specify the name or arn as an option. A number of other resources support that.
@coultn Hello, is there a way to do an ASG rolling update (for an AMI refresh or something of that sort) while using a capacity provider with managed termination? At present the combination (ECS cluster, capacity provider, ASG, CloudFormation) does not support rolling updates, because ASG termination protection must be on for the capacity provider's managed termination to work, so for now we are sacrificing managed termination in favor of rolling updates. It would be great if it could accommodate both.
@manokaran3529 be careful, on scale down we saw container instances being terminated with managed termination protection off, when there was a better choice available (instance not running any container). You raise a good point regarding the rolling update though, I'm intending on using that and haven't tested yet...
How do we manage the circular dependency?
The ECS cluster needs the capacity provider, and the capacity provider needs the ASG (because of the Arn).
On delete, the ECS cluster gets deleted first and fails because its ASG is still alive.
Error occurred during operation 'DeleteClusters SDK Error: The Cluster cannot be deleted while Container Instances are active or draining. (Service: Ecs, Status Code: 400, Request ID: 5751e46b-d3d4-4f0c-ad2f-ca7e072184c7, Extended Request ID: null)'.
Hello! We are actively working on a few things to provide more comprehensive capacity provider support in CloudFormation.
1. Ability to reference the ASG name in the AWS::ECS::CapacityProvider resource
2. Ability to specify a custom capacity provider strategy in the AWS::ECS::Service resource
3. Ability to enable scale-in protection in the AWS::AutoScaling::AutoScalingGroup resource
ETA?
Yes, it terminated an instance which had most of the tasks. As a hack, we changed the termination policy of the ASG to NewestInstance, so that on scale-in it picked the newest instance, where we had only the scaled-up tasks.
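For reference, a small Python sketch of that hack, assuming the policy meant here is the Auto Scaling NewestInstance termination policy and that a boto3 Auto Scaling client is passed in. The function name is illustrative.

```python
def prefer_newest_for_scale_in(autoscaling, asg_name):
    """Make the ASG terminate its newest instance first on scale-in.

    `autoscaling` is expected to be a boto3 client created with
    boto3.client("autoscaling"). This replaces the group's existing
    list of termination policies.
    """
    return autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        TerminationPolicies=["NewestInstance"],
    )
```

This only biases which instance the ASG picks; it does not check whether that instance is running tasks.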
Capacity Provider for Cloudformation is now available: https://d201a2mn26r7lk.cloudfront.net/latest/gzip/CloudFormationResourceSpecification.json
(or more friendly changelog: https://github.com/aws/aws-cdk/commit/4ce27f4195c70bd9e365ec0e0df5c0ede863bc8a)
What does this mean? This is old news; it looks like the same thing to me. I still check here every day to see whether the fixes are done.
Sorry, I missed that this was released 12 days ago. Will wait for the fixes above.
I was doing some testing today, and I noticed that I could pass the AutoScalingGroup name as the autoScalingGroupArn in the CreateCapacityProvider API call, when previously it would error out.
Armed with this knowledge I tried this:

    AutoScalingGroup:
      Type: AWS::AutoScaling::AutoScalingGroup
      Properties:
        DesiredCapacity: 0
        LaunchTemplate:
          LaunchTemplateId: !Ref LaunchTemplate
          Version: !GetAtt LaunchTemplate.LatestVersionNumber
        MaxSize: 2
        MinSize: 0
        VPCZoneIdentifier:
          - !Ref SubnetId
    CapacityProvider:
      Type: AWS::ECS::CapacityProvider
      Properties:
        AutoScalingGroupProvider:
          AutoScalingGroupArn: !Ref AutoScalingGroup
          ManagedScaling:
            Status: DISABLED
          ManagedTerminationProtection: DISABLED

And it worked! I only tested this in the ap-southeast-2 region. So I assume the reason this change wasn't announced is because it isn't live everywhere yet?
Good news for everyone tracking this issue through. I'll wait for this to be confirmed here before I use this in production, but it saves me from using a rather ugly custom resource to extract the ARN like I was planning to do.
Indeed, documentation has been updated to "The Amazon Resource Name (ARN) or short name that identifies the Auto Scaling group."
Hi All, confirming that
@anoopkapoor any eta on 3? Scale in protection on Autoscaling.
@anoopkapoor any eta on 2)Ability to specify a custom capacity provider strategy in the AWS::ECS::Service resource?
How do you manage the circular dependency, though?
The ECS cluster needs the capacity provider, and the capacity provider needs the ASG (because of the Ref).
On delete, the ECS cluster gets deleted first and fails because its ASG is still alive.
Error occurred during operation 'DeleteClusters SDK Error: The Cluster cannot be deleted while Container Instances are active or draining. (Service: Ecs, Status Code: 400, Request ID: 5751e46b-d3d4-4f0c-ad2f-ca7e072184c7, Extended Request ID: null)'.
CloudFormation does not currently have support for capacity providers in any of the ECS resource types. We will be adding this support in the near future.