aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.68k stars 3.93k forks source link

Unable to use ECS Ec2Service construct when ECS cluster uses manually added EC2 capacity #5214

Closed cbeattie-tl closed 1 year ago

cbeattie-tl commented 4 years ago

The Ec2Service class in the @aws-cdk/aws-ecs module has a check in its validate() method to verify that the cluster has capacity. While this seems like a reasonable check in many situations it assume that the user is using AWS CDK constructs to create ECS EC2 instances. However, if one where to manually create an ECS instance in a cluster that doesn't use EC2 capacity this check would prevent the user from using the Ec2Service construct and there is no way around it (except for the ugly workaround mentioned below).

An example use-case is an ECS cluster that is 100% Fargate (i.e. no EC2 capacity required). Then a need arises when you need a very specific use-case that Fargate does not support and thus require an EC2 ECS instance. Further, the EC2 instance is heavily customized in a way that is not supported from cluster.addCapacity() and requires the ECS instance to be created outside of addCapacty(). In this situation, the ECS cluster construct does not know the capacity you've manually added.

My recommendation is to remove this hasCapacity check in Ec2Service as I don't think it really services a useful purpose. At best it can catch a situation where someone accidentally forgot to add capacity but it cannot actually usefully tell you if you have enough capacity to actually run your service. What if your service requires 4xCPU but you only have enough capacity for 2xCPU. There is a myriad of cases where you can have enough capacity but ECS still won't be able to run your service (another example would be placement constraints).

Reproduction Steps

  1. Create an ECS Cluster with a single Fargate service.
  2. Create an ECS instance manually (not using addCapacity())
  3. Create an ECS service using the Ec2Service construct

Error Log

Error: Validation failed with the following errors: [path/to/myservice] Cluster for this service needs Ec2 capacity. Call addXxxCapacity() on the cluster.

Environment

Other


This is :bug: Bug Report

cbeattie-tl commented 4 years ago

Code in question can be found here: https://github.com/aws/aws-cdk/blob/4f6948c1ca5199592fc962c8305ad1c9fc806349/packages/%40aws-cdk/aws-ecs/lib/ec2/ec2-service.ts#L205

piradeepk commented 4 years ago

@cbeattie-tl just wanted to clarify, are you trying to create an Ec2Service with a cluster that already exists in your account?

If the above is the case, you can actually import your cluster from your account using fromClusterAttributes:

https://github.com/aws/aws-cdk/blob/master/packages/%40aws-cdk/aws-ecs/lib/cluster.ts#L52-L54

cbeattie-tl commented 4 years ago

@cbeattie-tl just wanted to clarify, are you trying to create an Ec2Service with a cluster that already exists in your account?

If the above is the case, you can actually import your cluster from your account using fromClusterAttributes:

https://github.com/aws/aws-cdk/blob/master/packages/%40aws-cdk/aws-ecs/lib/cluster.ts#L52-L54

I did notice if you use the Cluster.import() functionality you can artificially set the hasCapacity property. Another solution is to extend Ec2Service and override the validate() method.

However in my case, the ECS cluster is actually created in the same Stack so the natural thing to do would be to just use the reference to the cluster as I created it (as opposed to working around this capacity issue with an artificial/unnecessary cluster import statement).

I hope this makes sense.

hencrice commented 4 years ago

Hey @cbeattie-tl

Thanks for the response. After some discussions, we deem the use case (i.e. support adding manually-provisioned instance) is not one of the intended use cases for CDK, which meant to create reproducible infrastructure. Please let us know if you have other questions.

cbeattie-tl commented 4 years ago

Hey @cbeattie-tl

Thanks for the response. After some discussions, we deem the use case (i.e. support adding manually-provisioned instance) is not one of the intended use cases for CDK, which meant to create reproducible infrastructure. Please let us know if you have other questions.

@hencrice What you are really saying is that AWS CDK will not support the use of the L2 Ec2Service construct if the ECS cluster is created in CDK but the "capacity" is provided outside of CDK (outside of the same stack actually). I believe this unnecessarily limiting.

not one of the intended use cases for CDK, which meant to create reproducible infrastructure

You'll get no disagreement from me. The whole notion of this use case not being about reproducible infrastructure is moot. This use-case I am presenting here has nothing to do with infrastructure being "not reproducible". In fact, in my scenario the EC2 instance I created was actually created within CDK itself!!! Its all CDK - it's all CloudFormation. The situation at hand arises when you need to create EC2 VMs that cannot be created as part of an auto-scaling group. In my use-case, I needed to get the IP addresses of 3 nodes that form a quorum for bootstrapping membership. I needed to get the actual IP addresses of these nodes but I could no pre-assign them because I am running more than one "environment" in the same VPC so the IP's may conflict. I need "random" IPs but I need the private IP address to bootstrap into other CDK assets. I'm thrilled if you can indicate a way of doing this an autoscaling group but without writing a bunch of custom resources an easier approach was to just create 3 EC2 instances "manually" (in my CDK stack). Without addressing this defect (as I see it), we are unable to use the Ec2Service construct and I had to write a very hacky work-around to overcome this short-coming.

By way of recommendation, perhaps CDK could support a way of addition "capacity" to an ECS cluster that is provided by creating EC2 instances instead of auto-scaling groups. For example, what if ecs.Cluster supported a method like cluster.addEc2Instance() (in contrast to cluster.addAutoScalingGroup()).

Not attempting to belabor the point, but the whole rationale behind this hasCapacity validation is misguided. You might "have capacity" in that you created an autoscaling group but you may be over subscribed to it and your services won't fully start. This check doesn't really prevent you from anything other than maybe accidentally forgetting to add some capacity to your cluster. Another situation is that you "have capacity" but you have a placement constraint that prevents your services from running on those nodes. At best it should be a warning.

hencrice commented 4 years ago

@cbeattie-tl Thanks for the detailed use case and providing your recommendation. This is awesome! Let me bring it back to the team to reevaluate.

akuma12 commented 4 years ago

I have another use case for bypassing the capacity check of the Ec2Service construct (and by extension the ECS EC2 patterns). We're using Spot Fleet and Application AutoScaling Policies to handle instance capacity, so we're not using AutoScaling groups. It doesn't appear that addCapacity() supports this method of capacity management, so I'm unable to use our cluster with the Ec2Service construct. I get the same error that @cbeattie-tl mentioned in his issue: Cluster for this service needs Ec2 capacity. Call addXxxCapacity() on the cluster.

yuanchieh-cheng commented 3 years ago

Same issue here.
It would super useful if we could add ec2 instances by cluster.addCapacity().
For me, I would like to assign ec2 role but it is not support in addCapacity function now.
There are bunch of ec2 instance properties not supported now.
I think it would be more flexible to create ec2 instances and bind them to the cluster.

And in the console we can create empty cluster, it would be better to allow in cdk.

rachedanis commented 3 years ago

As @akuma12 mentionned, we also have the same use case ( EC2 instances handled by spotFleet request ).

We have developped a component to autoscale the spotFleet's capacity ( intially at 0 ). This component is draining ECS instances in the scaleOut operation. In addition to that, we had to add an ECS daemon on each instance to auto-terminate it, and that's why we need to setup this daemon EC2 service in an empty cluster.

In addition to that, we're using Java, so currently we had to disable all validation to just bypass the hasCapacity check.

RodrigoAS28 commented 2 years ago

I had a similar issue. In my case the EC2 instances were actually created via CDK, but since they are not in an ASG that can be added as capacity, I would get the same error. Found a workaround though: Just created the capacity 🙂

  const cluster = new ecs.Cluster(this, 'cluster', {
    clusterName: 'cluster',
    vpc: vpc,
    containerInsights: true,
    capacity: {
      autoScalingGroupName: "dummy",
      instanceType: new ec2.InstanceType("t2.nano"),
      maxCapacity: 0,
      minCapacity: 0,
    }
  })

Not beautiful, but it worked for me. CDK passed and the service runs on the registered EC2 instances.

madeline-k commented 1 year ago

There seems to be two separate requests in this issue:

  1. Remove the hasCapacity check so that you can create a service using a Cluster created in your CDK app, but with capacity manually added to that cluster using another method (i.e. console).
  2. Add an addEc2Instance() method to support adding capacity to a cluster with just one EC2 instance.

Regarding 1, we will not be able to support use cases that mix together CDK management and external management of resources. If you need to modify resources outside of CDK, you should both create and manage them outside of CDK, and then reference these resources with the fromClusterArn method. Maybe it would make sense to open a new feature request requesting a different way of managing capacity in CDK?

Regarding 2, ECS Clusters only support adding capacity via autoscaling groups. So the recommended approach for that use -case would be to create and autoscaling group containing just the EC2 instance in question, then use the group as the capacity.

I'm closing this issue, since we don't plan to implement either of these requests. To continue to discuss this topic, please open a new issue, or a discussion.

github-actions[bot] commented 1 year ago

⚠️COMMENT VISIBILITY WARNING⚠️

Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.

cbeattie-tl commented 1 year ago

To all who may follow, I think @RodrigoAS28's solution here is the best possible work-around since the gate keepers don't seem to agree that there are valid use-cases to attach EC2 instances an ECS cluster where said EC2 instances cannot be created by ASGs (for any number of reasons).

andrestone commented 1 year ago

I thought I'd just drop this here in case anyone is researching on how to provide capacity for ECS Clusters.

This statement is inaccurate:

Regarding 2, ECS Clusters only support adding capacity via autoscaling groups. So the recommended approach for that use -case would be to create and autoscaling group containing just the EC2 instance in question, then use the group as the capacity.

As pointed out in other comments above, it's true that CDK doesn't allow you to inform / provision other capacity sources for ECS Clusters, but ECS does support adding capacity to clusters without using autoscaling groups, as officially documented here. And there are several use cases for it (as seen in the comments).

Hopefully this helps convincing the team to move towards this direction.

@TheRealAmazonKendra

aecollver commented 1 year ago

Alternative workaround:

const cluster = new (class extends Cluster { get hasEc2Capacity(): boolean { return true; } })(this, "Cluster", { vpc, });

altonotch commented 6 months ago

const cluster = new (class extends Cluster { get hasEc2Capacity(): boolean { return true; } })(this, "Cluster", { vpc, });

Same idea but for existing cluster:

class FakeCluster extends ecs.Cluster { get hasEc2Capacity(): boolean { return true; } }

// somewhere in code
  const fakeCluster = FakeCluster.fromClusterAttributes(this, "Cluster", {clusterName: clusterName, vpc: vpc});