aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[ECS] Full support for Capacity Providers in CloudFormation. #631

Open coultn opened 4 years ago

coultn commented 4 years ago

CloudFormation does not currently have support for capacity providers in any of the ECS resource types. We will be adding this support in the near future.

lawrencepit commented 4 years ago

Related to this, in order to support capacity providers with managedTerminationProtection, we also need to be able to set the new-instances-protected-from-scale-in property when creating the ASG via CloudFormation. This latter property was added 4 years ago to the AWS SDK / AWS CLI, but is still not supported in CF -- hopefully full support for CP in CF is added a bit faster.
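
For reference, the SDK-side setting mentioned above can be sketched in Python. This is a minimal sketch, assuming `autoscaling` is a boto3 Auto Scaling client created by the caller; the function name `enable_scale_in_protection` is hypothetical. It turns on protection for newly launched instances and then protects the instances already in the group, since the group-level flag only applies to instances launched afterwards.

```python
def enable_scale_in_protection(autoscaling, asg_name):
    # `autoscaling` is assumed to be boto3.client("autoscaling") or a stand-in.
    # Protect instances launched from now on.
    autoscaling.update_auto_scaling_group(
        AutoScalingGroupName=asg_name,
        NewInstancesProtectedFromScaleIn=True,
    )
    # Existing instances are not affected by the flag above, so protect them too.
    groups = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"]
    instance_ids = [i["InstanceId"] for g in groups for i in g["Instances"]]
    # Send the instance IDs in batches of 50 per call.
    for start in range(0, len(instance_ids), 50):
        autoscaling.set_instance_protection(
            AutoScalingGroupName=asg_name,
            InstanceIds=instance_ids[start:start + 50],
            ProtectedFromScaleIn=True,
        )
    return instance_ids
```

A call such as `enable_scale_in_protection(boto3.client("autoscaling"), "my-asg")` would be the typical usage; the group name is a placeholder.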

geof2001 commented 4 years ago

Has there been any progress made on this?

Add support for Capacity providers #1

coultn commented 4 years ago

We are working on it and will provide updates as soon as more information is available.

psuj commented 4 years ago

> Related to this, in order to support capacity providers with managedTerminationProtection, we also need to be able to set the new-instances-protected-from-scale-in property when creating the ASG via CloudFormation. This latter property was added 4 years ago to the AWS SDK / AWS CLI, but is still not supported in CF -- hopefully full support for CP in CF is added a bit faster.

Additionally, when the new-instances-protected-from-scale-in property is set on the ASG, a scheduled action to scale in instances cannot be executed. A feature like force-scale-in for scheduled actions would be useful if, for example, we have a dev environment and want to turn instances off at night and back on in the morning.

pparth commented 4 years ago

+1

tobymiller commented 4 years ago

When this is implemented, will it be possible to do a rolling update to the launch template under autoscaling and a change to a service in ecs, such that the new tasks run on instances from the new launch template while the old ones stay on the old instances as they roll over?

I'm struggling to achieve this with custom resources at the moment, partly as the dependencies are all in funny directions. Would be great to have it all defined declaratively in cfn.

sopel commented 4 years ago

Cross-linking the resp. request in https://github.com/aws-cloudformation/aws-cloudformation-coverage-roadmap/issues/301

RomanCRS commented 4 years ago

Any ETA on this?

pauldraper commented 4 years ago

Does this depend on #632?

RomanCRS commented 4 years ago

> Does this depend on #632?

I don't think so.

andreaswittig commented 4 years ago

Sadly, that's the reason why using CloudFormation is becoming more and more frustrating.

gabegorelick commented 4 years ago

FWIW, Terraform has supported this since shortly after the API was released: https://github.com/terraform-providers/terraform-provider-aws/pull/11151

Of course, it can't delete capacity providers since there's no API: https://www.terraform.io/docs/providers/aws/r/ecs_capacity_provider.html

RomanCRS commented 4 years ago

I don't want to use, rely on and support third-party software if I have a chance to use the official product.

Vince-Cercury commented 4 years ago

any update?

XBeg9 commented 4 years ago

same here, any updates?

ronan-cunningham commented 4 years ago

any update?

darrenweiner commented 4 years ago

The lack of Cfn support for this 6 months in is really disappointing. This puts the burden on anyone building CI/CD using Cfn to add additional and silly custom cli/sdk pieces to actually tie in capacity providers, which then have to be ripped out once the support that should be part of a point release is in place. You can do better. Communicating timeframes would help as well.

andreaswittig commented 4 years ago

Have you had a deeper look into Capacity Providers and Cluster Auto Scaling? Does not match with my requirements at all. Does not scale down properly. Does not work with CloudFormation rolling updates for the ASG. So missing CloudFormation support is not the only problem here. :)

coultn commented 4 years ago

> Have you had a deeper look into Capacity Providers and Cluster Auto Scaling? Does not match with my requirements at all. Does not scale down properly. Does not work with CloudFormation rolling updates for the ASG. So missing CloudFormation support is not the only problem here. :)

Thanks for the feedback - can you explain more what you mean by "does not scale down properly"?

darrenweiner commented 4 years ago

coultn: Here's what I think is a common use case: a CI/CD pipeline where services are spun up on an ASG-backed EC2 cluster. Services do not pre-exist; the CI/CD creates them. Currently, you cannot use cfn to create a capacity provider enabled service. If the underlying cluster doesn't have the memory or CPU, I would expect that when a new service is deployed, it would add another EC2 instance and deploy the new service, but there's no way to do that currently. I suppose what might work right now is: deploy the service with no capacity provider, perhaps with a quantity of 0 so it stabilizes, then update the service via the cli to use a capacity provider, then make another cli call to increase the quantity to 1. But that seems like jumping through hoops.

With regards to down scaling, in reading the documentation, it seems a bit unclear exactly how this is meant to work. If the goal is to optimize resources, I would actually want the cp to be intelligent enough to a) determine that the cluster is currently overprovisioned and b) if so, drain EC2 instances accordingly and have the ASG terminate the drained instance, all with standard, appropriate cooldown periods, etc.

coultn commented 4 years ago

> Currently, you can not use cfn to create a capacity provider enabled service.

Thanks for the feedback! We are working on full support for capacity providers in CloudFormation, and we definitely understand the need for that. However, I do want to point out that you can actually create a capacity-provider enabled service in CloudFormation today. You can accomplish this by first configuring a default capacity provider strategy for the cluster. This default capacity provider strategy will be used by any service you create that does not specify a launch type. Next, when you create your service in CloudFormation, do not include the LaunchType parameter. The service will use the capacity provider strategy defined by the cluster, and will auto-scale from zero instances if necessary.
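
The two steps described above can be sketched against the ECS API in Python. This is only an illustration, assuming `ecs` is a boto3 ECS client supplied by the caller; the helper names (`default_strategy_request`, `service_request`, `configure`) and the `weight`/`base` values are hypothetical choices, not part of the original comment. In a CloudFormation template the second step simply means leaving `LaunchType` out of the `AWS::ECS::Service` resource.

```python
def default_strategy_request(cluster, capacity_provider):
    # Arguments for ecs.put_cluster_capacity_providers: attach the provider
    # to the cluster and make it the cluster's default strategy.
    return {
        "cluster": cluster,
        "capacityProviders": [capacity_provider],
        "defaultCapacityProviderStrategy": [
            {"capacityProvider": capacity_provider, "weight": 1, "base": 0}
        ],
    }


def service_request(cluster, service_name, task_definition, desired_count=1):
    # Arguments for ecs.create_service. launchType is deliberately omitted so
    # the service falls back to the cluster's default capacity provider strategy.
    return {
        "cluster": cluster,
        "serviceName": service_name,
        "taskDefinition": task_definition,
        "desiredCount": desired_count,
    }


def configure(ecs, cluster, capacity_provider, service_name, task_definition):
    # `ecs` is assumed to be boto3.client("ecs") or a compatible stand-in.
    ecs.put_cluster_capacity_providers(
        **default_strategy_request(cluster, capacity_provider)
    )
    return ecs.create_service(
        **service_request(cluster, service_name, task_definition)
    )
```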

> With regards to down scaling, in reading the documentation, it seems a bit unclear on exactly how this is meant to work: If the goal is to optimize resources, I would actually want the cp to be intelligent enough to a) determine that the cluster is currently overprovisioned and b) if so, drain EC2 accordingly and have the ASG terminate the drained instance...all with standard, appropriate cooldown periods, etc.

Understood. In the first version of ECS cluster auto scaling, we took a more conservative route where instances would not scale in unless no tasks are running on them. We are looking at the idea of automating an "instance drainer" that will automatically find underutilized instances and set them to draining. With ECS cluster auto scaling, those instances would automatically shut down once no tasks are running on them. It's possible to do this already today, but you would need to implement your own Lambda function (or similar) to do the evaluation of the instance and call the ECS API to set the instance to the DRAINING state.
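
A do-it-yourself drainer along these lines might look like the following Python sketch. It is not an AWS-provided implementation: the 20% reservation threshold, the function names, and the "drain at most one instance per run" limit are all assumptions made for illustration, and `ecs` is assumed to be a boto3 ECS client passed in by the caller (for example from a scheduled Lambda function). A real drainer would also need to check that the remaining instances can absorb the displaced tasks.

```python
def resource_value(resources, name):
    # ECS reports resources as a list of {"name": ..., "integerValue": ...} dicts.
    for resource in resources:
        if resource["name"] == name:
            return resource.get("integerValue", 0)
    return 0


def reserved_fraction(instance, name):
    # Fraction of a registered resource that is currently reserved by tasks.
    registered = resource_value(instance["registeredResources"], name)
    remaining = resource_value(instance["remainingResources"], name)
    if registered == 0:
        return 0.0
    return 1 - remaining / registered


def underutilized(instances, threshold=0.2):
    # An instance qualifies when both CPU and memory reservation are below the threshold.
    return [
        instance["containerInstanceArn"]
        for instance in instances
        if instance["status"] == "ACTIVE"
        and reserved_fraction(instance, "CPU") < threshold
        and reserved_fraction(instance, "MEMORY") < threshold
    ]


def drain_underutilized(ecs, cluster, threshold=0.2, max_to_drain=1):
    # Only the first page of container instances is inspected in this sketch.
    arns = ecs.list_container_instances(cluster=cluster, status="ACTIVE")[
        "containerInstanceArns"
    ]
    if not arns:
        return []
    described = ecs.describe_container_instances(
        cluster=cluster, containerInstances=arns
    )
    targets = underutilized(described["containerInstances"], threshold)[:max_to_drain]
    # Send the state change in small batches of at most 10 instances per call.
    for start in range(0, len(targets), 10):
        ecs.update_container_instances_state(
            cluster=cluster,
            containerInstances=targets[start:start + 10],
            status="DRAINING",
        )
    return targets
```

Capping `max_to_drain` keeps a single run from draining most of the cluster at once; with ECS cluster auto scaling enabled, the drained instance would then shut down once no tasks remain on it, as described above.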

darrenweiner commented 4 years ago

Really awesome feedback, thank you. As far as the workaround for setting it at Cluster creation, I'll take a look at that..easy enough to implement for QA/Dev..a little trickier for existing prod environments.

Trying to avoid custom tooling since...this seems sooo close to being a solid solution.

Any timing on better cfn support? I know that's a different, probably very overwhelmed team, but would be nice to see some improvements here. ECS rocks, and once this is dialed in, it's going to really round out the offering.

Will keep checking for ECS updates!

marcelmunarolo commented 4 years ago

Dear colleagues, please provide a way in CF to fine-tune some of the auto-generated Capacity Provider parameters. Currently, in addition to the existing parameters, we need to adjust the Cooldown in the Auto Scaling Plan manually, as well as the alarms' datapoints, all after the Capacity Provider is created. It would be great to put all of this together in the CF script. This is a must for us. Thank you very much!

coultn commented 4 years ago

Regarding timeline - we can't share specific timelines but we will share updates here as soon as they are available.

darrenweiner commented 4 years ago

coultn: Because this is such a useful feature for so many of my clients, I decided to re-tool things today. Unfortunately, capacity providers still don't seem to work. The cluster default cp is in place. I re-created services without the LaunchType parameter, and it clearly shows the services are using the capacity provider strategy. However, when I deploy services and exhaust the memory, it throws the usual message saying it can't find a container instance with the resources. Interestingly, and probably to the point: the CloudWatch metric for the cp that is assigned to this cluster (CapacityProviderReservation) isn't reporting any data at all. I have seen this metric chart properly in previous tests a few weeks ago with another client, so I have no idea why it's not reporting anything. I spun up about 5-8 services today on this cluster using the cp strategy. I'll just keep checking back for updates; hopefully some good changes are coming soon.

rcrelia commented 4 years ago

+1

robertd commented 4 years ago

This is definitely a showstopper for our CDK-powered automation workflows. Setting Capacity Provider on a cluster level is something CloudFormation team is looking into. https://github.com/aws-cloudformation/aws-cloudformation-coverage-roadmap/issues/301

In the meantime our workaround is to run following aws-cli command in our ci/cd workflow:

aws ecs put-cluster-capacity-providers \
    --cluster CLUSTER_NAME \
    --capacity-providers FARGATE \
    --default-capacity-provider-strategy capacityProvider=FARGATE

I really hope this ships soon. 🤞

jakebanks commented 4 years ago

+2

hatappo commented 4 years ago

Deletion is now supported by the API. Will this accelerate the implementation of this feature addition?

https://aws.amazon.com/jp/about-aws/whats-new/2020/06/amazon-ecs-capacity-providers-support-delete-functionality/

pramshar commented 4 years ago

+1

mwarkentin commented 4 years ago

Saw this earlier today, but the resources don’t seem to have been updated yet: https://twitter.com/aws_doc/status/1273943424849383424?s=21

guillaumesmo commented 4 years ago

I have implemented the new CloudFormation resources in one of my stacks and can confirm it works 👍

there's still a missing link though which might be (part of) the reason why it was not announced yet:

AWS::ECS::CapacityProvider AutoScalingGroupProvider requires the parameter AutoScalingGroupArn which accepts only an ARN (which contains a UUID part so you cannot "guess" it).

Unfortunately AWS::AutoScaling::AutoScalingGroup does not expose its ARN so there's no way to reference this in the AutoScalingGroupProvider for now.

Either hardcoding an existing ARN or, once more, hacking around with a Custom Resource to get the ARN works.
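
The Custom Resource workaround could be sketched as a small Lambda handler in Python. This is a hedged illustration rather than the code used in the stack above: the property name `AutoScalingGroupName`, the returned attribute name `Arn`, and the helper names are assumptions, and the handler accepts an injectable client and sender purely so it can be exercised without AWS access.

```python
import json
import urllib.request


def lookup_asg_arn(autoscaling, asg_name):
    # DescribeAutoScalingGroups returns the full ARN, including the UUID part.
    groups = autoscaling.describe_auto_scaling_groups(
        AutoScalingGroupNames=[asg_name]
    )["AutoScalingGroups"]
    if not groups:
        raise ValueError(f"Auto Scaling group not found: {asg_name}")
    return groups[0]["AutoScalingGroupARN"]


def build_response(event, status, data=None, reason=""):
    # Response document CloudFormation expects from a custom resource.
    return {
        "Status": status,
        "Reason": reason,
        "PhysicalResourceId": event.get("PhysicalResourceId") or event["LogicalResourceId"],
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
        "Data": data or {},
    }


def put_response(url, response):
    # Upload the result to the pre-signed URL supplied in the event.
    body = json.dumps(response).encode("utf-8")
    request = urllib.request.Request(
        url, data=body, method="PUT", headers={"Content-Type": ""}
    )
    urllib.request.urlopen(request)


def handler(event, context, autoscaling=None, send=None):
    send = send or put_response
    try:
        data = {}
        # Delete requests have nothing to clean up, so they succeed with no data.
        if event["RequestType"] in ("Create", "Update"):
            if autoscaling is None:
                import boto3  # assumed to be available in the Lambda Python runtime
                autoscaling = boto3.client("autoscaling")
            # "AutoScalingGroupName" is a hypothetical property of this custom resource.
            name = event["ResourceProperties"]["AutoScalingGroupName"]
            data = {"Arn": lookup_asg_arn(autoscaling, name)}
        response = build_response(event, "SUCCESS", data)
    except Exception as exc:
        response = build_response(event, "FAILED", reason=str(exc))
    send(event["ResponseURL"], response)
    return response
```

With a custom resource backed by this function, the template could read the value with `!GetAtt` on the custom resource's `Arn` attribute and pass it to `AutoScalingGroupArn`.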

akdor1154 commented 4 years ago

ah AWS, where just the C is an acceptable MVP for CRUD. oh well glad it's finally getting released.

rhlarora84 commented 4 years ago

> I have implemented the new CloudFormation resources in one of my stacks and can confirm it works
>
> there's still a missing link though which might be (part of) the reason why it was not announced yet:
>
> AWS::ECS::CapacityProvider AutoScalingGroupProvider requires the parameter AutoScalingGroupArn which accepts only an ARN (which contains a UUID part so you cannot "guess" it).
>
> Unfortunately AWS::AutoScaling::AutoScalingGroup does not expose its ARN so there's no way to reference this in the AutoScalingGroupProvider for now.
>
> Either hardcoding an existing ARN or, once more, hacking around with a Custom Resource to get the ARN works.

What about the Termination protection on Autoscaling and managed termination on CapacityProvider? I believe Autoscaling resource needs to be updated to support that.

darrenweiner commented 4 years ago

A typical scenario, having an ASG and a capacity provider defined in the same template (which rhlarora84 alluded to), is not possible because the AWS::AutoScaling::AutoScalingGroup resource only returns the name, but the capacity provider requires an Arn. That's kind of a miss on the ASG resource as well (why does it not have an Arn attribute?).
At the least, it would be nice if the capacity provider could accept either the name or the Arn. A number of other resources support that.

manokaran3529 commented 4 years ago

@coultn Hello, how are we going to cover the need for an ASG rolling update (for an AMI refresh or something of that sort) while using a capacity provider with managed termination protection? At present the combination (ECS cluster, capacity provider, ASG, CloudFormation) does not support the rolling update, since the termination protection of the ASG must be on for the managed termination of the CP to work, so for now we are sacrificing the managed termination of the CP in favor of the rolling update. It would be great if it could accommodate both.

jakebanks commented 4 years ago

@manokaran3529 be careful, on scale down we saw container instances being terminated with managed termination protection off, when there was a better choice available (instance not running any container). You raise a good point regarding the rolling update though, I'm intending on using that and haven't tested yet...

pramshar commented 4 years ago

How do we manage the circular dependency?

ECS Cluster needs Capacity Provider
Capacity Provider needs ASG (because of Arn)

When you delete the stack, the ECS Cluster gets deleted first and fails because its ASG is still alive.

Error occurred during operation 'DeleteClusters SDK Error: The Cluster cannot be deleted while Container Instances are active or draining. (Service: Ecs, Status Code: 400, Request ID: 5751e46b-d3d4-4f0c-ad2f-ca7e072184c7, Extended Request ID: null)'.

anoopkapoor commented 4 years ago

Hello! We are actively working on a few things to provide more comprehensive capacity provider support in CloudFormation.

  1. Ability to reference the ASG name in the AWS::ECS::CapacityProvider resource
  2. Ability to specify a custom capacity provider strategy in the AWS::ECS::Service resource
  3. Ability to enable scale-in protection in the AWS::AutoScaling::AutoScalingGroup resource

shaybbigid commented 4 years ago

> Hello! We are actively working on a few things to provide more comprehensive capacity provider support in CloudFormation.
>
>   1. Ability to reference the ASG name in the AWS::ECS::CapacityProvider resource
>   2. Ability to specify a custom capacity provider strategy in the AWS::ECS::Service resource
>   3. Ability to enable scale-in protection in the AWS::AutoScaling::AutoScalingGroup resource

ETA?

manokaran3529 commented 4 years ago

> @manokaran3529 be careful, on scale down we saw container instances being terminated with managed termination protection off, when there was a better choice available (instance not running any container). You raise a good point regarding the rolling update though, I'm intending on using that and haven't tested yet...

Yes, it terminated the instance that had most of the tasks. As a hack, we changed the termination policy of the ASG to 'NewestInstance', so that on termination it picked the newest instance, which held only the scaled-up tasks.
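
As a rough Python sketch of that hack, assuming `autoscaling` is a boto3 Auto Scaling client supplied by the caller and with hypothetical helper names, the termination policy can be switched like this. In CloudFormation the equivalent would be the `TerminationPolicies` property on the `AWS::AutoScaling::AutoScalingGroup` resource.

```python
def termination_policy_request(asg_name, policies=("NewestInstance",)):
    # Arguments for autoscaling.update_auto_scaling_group.
    return {
        "AutoScalingGroupName": asg_name,
        "TerminationPolicies": list(policies),
    }


def prefer_newest_instances(autoscaling, asg_name):
    # `autoscaling` is assumed to be boto3.client("autoscaling") or a stand-in.
    request = termination_policy_request(asg_name)
    autoscaling.update_auto_scaling_group(**request)
    return request
```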

mb-dev commented 4 years ago

Capacity Provider for Cloudformation is now available: https://d201a2mn26r7lk.cloudfront.net/latest/gzip/CloudFormationResourceSpecification.json

(or more friendly changelog: https://github.com/aws/aws-cdk/commit/4ce27f4195c70bd9e365ec0e0df5c0ede863bc8a)

pramshar commented 4 years ago

> Capacity Provider for Cloudformation is now available: https://d201a2mn26r7lk.cloudfront.net/latest/gzip/CloudFormationResourceSpecification.json
>
> (or more friendly changelog: aws/aws-cdk@4ce27f4)

What does this mean? This is old news here; it looks like the same thing to me. I still check here every day to see whether the fixes are done or not.

mb-dev commented 4 years ago

Sorry, I missed that this was released 12 days ago. Will wait for the fixes above.

taylorb-syd commented 4 years ago

I was doing some testing today, and I noticed that I could pass the AutoScalingGroup name as the autoScalingGroupArn in the CreateCapacityProvider API call, when previously it would error out.

Armed with this knowledge I tried this:

  AutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    Properties:
      DesiredCapacity: 0
      LaunchTemplate:
        LaunchTemplateId: !Ref LaunchTemplate
        Version: !GetAtt LaunchTemplate.LatestVersionNumber
      MaxSize: 2
      MinSize: 0
      VPCZoneIdentifier:
        - !Ref SubnetId

  CapacityProvider:
    Type: AWS::ECS::CapacityProvider
    Properties:
      AutoScalingGroupProvider:
        AutoScalingGroupArn: !Ref AutoScalingGroup
        ManagedScaling:
          Status: DISABLED
        ManagedTerminationProtection: DISABLED

And it worked! I only tested this in the ap-southeast-2 region. So I assume the reason this change wasn't announced is because it isn't live everywhere yet?

Good news for everyone tracking this issue through. I'll wait for this to be confirmed here before I use this in production, but it saves me from using a rather ugly custom resource to extract the ARN like I was planning to do.

guillaumesmo commented 4 years ago

Indeed, documentation has been updated to "The Amazon Resource Name (ARN) or short name that identifies the Auto Scaling group."

anoopkapoor commented 4 years ago

Hi All, confirming that

  1. Ability to reference the ASG name in the AWS::ECS::CapacityProvider resource is now available in all regions.

rhlarora84 commented 4 years ago

@anoopkapoor any eta on 3? Scale in protection on Autoscaling.

belangovan commented 4 years ago

@anoopkapoor any eta on 2) Ability to specify a custom capacity provider strategy in the AWS::ECS::Service resource?

pramshar commented 4 years ago

> I was doing some testing today, and I noticed that I could pass the AutoScalingGroup name as the autoScalingGroupArn in the CreateCapacityProvider API call, when previously it would error out.
>
> Armed with this knowledge I tried this:
>
>   AutoScalingGroup:
>     Type: AWS::AutoScaling::AutoScalingGroup
>     Properties:
>       DesiredCapacity: 0
>       LaunchTemplate:
>         LaunchTemplateId: !Ref LaunchTemplate
>         Version: !GetAtt LaunchTemplate.LatestVersionNumber
>       MaxSize: 2
>       MinSize: 0
>       VPCZoneIdentifier:
>         - !Ref SubnetId
>
>   CapacityProvider:
>     Type: AWS::ECS::CapacityProvider
>     Properties:
>       AutoScalingGroupProvider:
>         AutoScalingGroupArn: !Ref AutoScalingGroup
>         ManagedScaling:
>           Status: DISABLED
>         ManagedTerminationProtection: DISABLED
>
> And it worked! I only tested this in the ap-southeast-2 region. So I assume the reason this change wasn't announced is because it isn't live everywhere yet?
>
> Good news for everyone tracking this issue through. I'll wait for this to be confirmed here before I use this in production, but it saves me from using a rather ugly custom resource to extract the ARN like I was planning to do.

How do you still manage the circular dependency?

ECS Cluster needs Capacity Provider
Capacity Provider needs ASG (because of Ref)

When you delete the stack, the ECS Cluster gets deleted first and fails because its ASG is still alive.

Error occurred during operation 'DeleteClusters SDK Error: The Cluster cannot be deleted while Container Instances are active or draining. (Service: Ecs, Status Code: 400, Request ID: 5751e46b-d3d4-4f0c-ad2f-ca7e072184c7, Extended Request ID: null)'.