aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] [request]: Nodegroup should support tagging ASGs #608

Open bhops opened 4 years ago

bhops commented 4 years ago

Tell us about your request It would be great if we could pass tags to the underlying ASGs created by EKS managed node groups (and tell the ASGs to propagate those tags) so that the underlying instances/volumes are tagged appropriately.

Which service(s) is this request for? EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard? Currently, managed node groups for EKS do not support propagating tags to the ASGs (and therefore the instances) created by the node group. This leads to EC2 instances that are not tagged according to our requirements for tracking cost and resource ownership.

morsik commented 3 years ago

@daenney will the EKS Terraform module also be updated to fix this? https://registry.terraform.io/modules/terraform-aws-modules/eks/aws/latest

daenney commented 3 years ago

@morsik No idea, I don't use that module and am not one of the maintainers. I'd suggest raising that on their issue tracker: https://github.com/terraform-aws-modules/terraform-aws-eks/issues.

chandranathmondal commented 2 years ago

Passing the managed node group tags to the launch template's "Instance tags" will automatically apply them to both the EC2 instances and their volumes. If there are challenges in doing that, adding a separate "Custom Tags" section to the EKS managed node group configuration page would also be helpful.

Can you please help identify where the "Custom Tags" section exists?

TBBle commented 2 years ago

You haven't mentioned what tool you're using, if any, but at the EC2 API level, that was probably referring to TagSpecifications in the RequestLaunchTemplateData object used with the CreateLaunchTemplate and CreateLaunchTemplateVersion APIs.

That's what's used for terraform-provider-aws's launch template implementation, and eksctl's managed and unmanaged node group implementations.
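
For illustration, a minimal Terraform sketch of launch-template-level tagging via tag_specifications (the resource name and tag values here are hypothetical):

resource "aws_launch_template" "eks_nodes" {
  name_prefix = "eks-node-"

  # Tags applied to the instances launched from this template
  tag_specifications {
    resource_type = "instance"
    tags = {
      CostCenter = "platform"
    }
  }

  # Tags applied to the EBS volumes attached at launch
  tag_specifications {
    resource_type = "volume"
    tags = {
      CostCenter = "platform"
    }
  }
}

A launch template along these lines can then be referenced from a managed node group so that the instances and volumes it launches carry the tags; it does not, however, tag the ASG itself.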

pst commented 2 years ago

The issue with aws_autoscaling_group_tag is that, in order to loop over the ASGs, they have to exist already. So one can't write a module that creates node groups and tags the auto scaling groups in one go; it will always require a --target apply due to a Terraform limitation.
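
For illustration, a minimal sketch of the pattern that hits this limitation (the node group resource and the tag are hypothetical):

# Assumes an aws_eks_node_group.this resource defined elsewhere in the same module.
# The for_each below depends on ASG names that only exist after the node group is
# created, so a fresh plan fails with "Invalid for_each argument" unless the node
# group is applied first, e.g. with: terraform apply -target=aws_eks_node_group.this
resource "aws_autoscaling_group_tag" "this" {
  for_each = toset([
    for asg in flatten(aws_eks_node_group.this.resources[*].autoscaling_groups) : asg.name
  ])

  autoscaling_group_name = each.value

  tag {
    key                 = "team"
    value               = "platform"
    propagate_at_launch = true
  }
}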

The only viable approach is to:

  1. Have the node group resource propagate the tags
  2. Have the CA read the tags from the node group instead of the auto scaling group

I understand from the thread that option 2 is preferred. But I can't find any PR or issue regarding this in the CA. Is AWS working on this?

stevehipwell commented 2 years ago

@pst I've asked a similar question in https://github.com/aws/containers-roadmap/issues/724.

richardchen-db commented 2 years ago

Why is this issue controversial? Many companies need tags on ASGs for cost and compliance reasons.

Same here. ASG tagging is mandatory in my company for cost and compliance.

newb1e commented 2 years ago

I find it useful to use the launch template configuration of the Terraform EKS module. This way I'm able to tag the underlying instances. Tagging the instances is, in my case, more valuable than tagging the ASG itself.

See TF example here.

nanasi880 commented 2 years ago

@newb1e

I'll say it again: I don't want to monitor EC2, I want to monitor the AutoscalingGroup itself. https://github.com/aws/containers-roadmap/issues/608#issuecomment-862017125

eroseland commented 2 years ago

Is this even still being looked at and worked on? For us, ASG tagging is a compliance issue. I know there are workarounds using Lambda and such; however, our enterprise has a ton of accounts, and having this be the default behavior of node groups would make our lives so much simpler. Doesn't seem like a big ask.

jaimehrubiks commented 2 years ago

It's almost two years for just setting a simple tag... which turns out to be critical for compliance, security, scaling node groups...

TBBle commented 2 years ago

I don't think it's even been "worked on"; it's still in the 'Researching' stage, and it probably lost a lot of momentum when it was thought that the value was only for cluster-autoscaler, or for tags on ASGs only to propagate to nodes, both of which are being addressed elsewhere. Or at least that's my impression from https://github.com/aws/containers-roadmap/issues/608#issuecomment-675101463.

Disappointingly, since @mikestef9 asked in that comment a year ago for more ASG-tagging use-cases, we haven't heard any follow-up (or even acknowledgement) on the various use-cases that have been shared here.

stevehipwell commented 2 years ago

@TBBle there are a lot of ASG tagging use cases in this thread, before the @mikestef9 comment, that launch template tagging won't address; there are also a lot of ASG tagging use cases called out in #724 when discussing the tags for CA.

TBBle commented 2 years ago

Looking back, the only comment I can see that requested something other than node-tagging via tag propagation, or Cluster Autoscaler tags for scale-from-zero, was https://github.com/aws/containers-roadmap/issues/608#issuecomment-670641308 10 days before @mikestef9's comment, and well after the development work that delivered Managed Node Group Launch Template support would have happened to address the needs called out in this ticket.

A quick skim doesn't show anything in #724 up to that date that isn't specifically about CA scale-from-zero, and if there is such a thing, overlooking it in the context of the rest of the discussion would be pretty easy.

Even the original feature request of this ticket was tagging ASGs for propagation to nodes and volumes.

stevehipwell commented 2 years ago

@TBBle I was sure I'd read a number of requests about compliance policies requiring ASGs to be tagged, but both of these issue threads are so long that it's hard to parse the data out of them; that'd be the use case I'd be putting forward.

I'm still of the opinion that something as trivial as cascading tags from MNGs to ASGs should have been implemented as part of the MVP. If there is a better way to handle the same use cases, it needs to actually exist first; then it can be adopted on merit and not as the only solution (or no solution).

TBBle commented 2 years ago

The threads are long, but the vast majority of activity was after the comment from @mikestef9 that we were talking about, in August 2020. It's about 20 comments in, compared to the fortyish that have come after that. And #724 spent most of its time before August 2020 talking about actually setting minSize to 0 (and directly modifying the ASG to work around this); the tagging discussion only came up there once, in July 2020, until people were referred to there from here by that comment.

Anyway, looking at "should have been" isn't very valuable here. This is containers-roadmap, not containers-if-we-could-turn-back-time. What's interesting is what is going to be done; currently, that's an empty space, and the rest of the discussion just makes it more likely that the actual use-cases will be lost in the noise and nothing will continue to happen. -_-

(If I could turn back time on this ticket, I'd have opened a different ticket for tagging ASGs for their own sake back when I subscribed to this ticket; at the time I didn't realise that the node-group-tag-propagation part of the request was going to be the need that was fulfilled. Similarly I regret not leaving a comment for my at-the-time use-case way back then too)

richardchen-db commented 2 years ago

@TBBle @stevehipwell I created a separate issue to track the request for ASG tagging (for compliance and cost reasons). This is quite important for many organizations looking to adopt EKS.

nkk1 commented 2 years ago

The EBS CSI controller needs the tag topology.ebs.csi.aws.com/zone. The PVs it creates use nodeAffinity and expect the nodes to have topology.ebs.csi.aws.com/zone, so when the ASGs have 0 instances, cluster autoscaler needs the ASGs to carry those tags. Node groups manage ASGs, so node groups should be able to tag ASGs.
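
For illustration, a hedged Terraform sketch of the kind of ASG tag the cluster autoscaler looks for in that scale-from-zero case (the ASG name and zone value are hypothetical):

resource "aws_autoscaling_group_tag" "ebs_csi_zone_hint" {
  autoscaling_group_name = "eks-my-nodegroup-asg" # hypothetical ASG name

  tag {
    # Node-template label hint read by cluster-autoscaler when the ASG is at 0 instances
    key                 = "k8s.io/cluster-autoscaler/node-template/label/topology.ebs.csi.aws.com/zone"
    value               = "us-east-1a"
    propagate_at_launch = false
  }
}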

askulkarni2 commented 2 years ago

aws-node-termination-handler Queue mode needs ASGs to be tagged with Key=aws-node-termination-handler/managed. The ability to propagate tags from an MNG to its ASG would make this easier for users.

TBBle commented 2 years ago

@nkk1 There's a proposal/discussion for Cluster Autoscaler to assume/populate topology.ebs.csi.aws.com/zone on the scale-from-zero template using the AWS backend. Since the label's auto-added by the CSI Controller, I think that'll be less surprising for the user, and frankly will probably land sooner than ASG user-defined tagging support.

If CA doesn't accept that proposal, then user ASG tagging will be the only way to make scale-from-zero work correctly with Managed Node Groups and the EBS CSI Controller, since the effort to migrate away from that label seems to have failed.

stevehipwell commented 2 years ago

@askulkarni2 I think the guidance is to not use NTH with MNGs, @bwagner5 can probably shed a bit more light here.

bwagner5 commented 2 years ago

@stevehipwell @askulkarni2 That is correct. NTH is not needed for handling terminations when using managed node groups. MNGs already gracefully handle terminations due to ASG scale-down, spot interruptions, and capacity rebalance.

askulkarni2 commented 2 years ago

@bwagner5 fantastic! Thanks for the insight. And thanks @stevehipwell for bringing it to attention.

mazay commented 2 years ago

This is really crucial for several reasons; the following are some of the issues we're experiencing due to this limitation.

  1. Cost analysis issues due to missing tags on ASG and its EC2
  2. Running dedicated node groups for specific workloads requires a couple of cluster-autoscaler tags for properly scaling node groups based on labels and taints.

So I really hope there will be a proper solution instead of some workarounds.

andre-lx commented 2 years ago

This is a really important feature!

Let's say we need to tag the ASG to make the cluster autoscaler work like a charm, or to track cost, resource ownership, and so on.

Using AWS managed node groups:

resource "aws_autoscaling_group_tag" "tag_cpu_ng" {
  autoscaling_group_name = aws_eks_node_group.cpu_ng.resources[0].autoscaling_groups[0].name

  tag {
    key                 = "k8s.io/cluster-autoscaler/node-template/taint/X"
    value               = "NoSchedule"
    propagate_at_launch = true
  }
}

And using CloudFormation?

Using self-managed node groups:

SlevinWasAlreadyTaken commented 2 years ago

👋 ℹ️ Managed ASG tagging is now implemented in eksctl with https://github.com/weaveworks/eksctl/pull/5002 and should land in the next release.

nanasi880 commented 2 years ago

I am waiting for this feature to be natively supported by AWS. While eksctl is an excellent solution, it is not suitable for use in an Infrastructure as Code context, and continued support is needed so the feature is available in solutions such as CloudFormation and Terraform.

byustephen commented 2 years ago

@andre-lx's answer is spot on. Using the "aws_autoscaling_group_tag" resource worked for me, but only for new nodes, so I cycled out my existing nodes one by one and the new nodes were all tagged as they should be. For instance, this is my setup for creating a node group plus an aws_autoscaling_group_tag that sets the "Name" tag, which shows up in EC2.

resource "aws_eks_node_group" "nodes_group" {
  cluster_name    = aws_eks_cluster.eks_cluster.name
  node_role_arn   = aws_iam_role.eks_assume_role.arn
  subnet_ids      = var.subnet_ids
  ###########
  # Optional
  ami_type        = "AL2_x86_64"
  disk_size       = 60
  instance_types  = ["m6i.xlarge"]
  node_group_name = "worker"
  version         = var.kubenetes_version

  scaling_config {
    desired_size = 2
    max_size     = 4
    min_size     = 1
  }

  update_config {
    max_unavailable = 2
  }

  # Ensure that IAM Role permissions are created before and deleted after EKS Node Group handling.
  # Otherwise, EKS will not be able to properly delete EC2 Instances and Elastic Network Interfaces.
  depends_on = [
    aws_iam_role_policy_attachment.EKS-AmazonEKSWorkerNodePolicy,
    aws_iam_role_policy_attachment.EKS-AmazonEKS_CNI_Policy,
    aws_iam_role_policy_attachment.EKS-AmazonEC2ContainerRegistryReadOnly,
  ]
}

#EKS can't directly set the "Name" tag, so we use the autoscaling_group_tag resource. 
resource "aws_autoscaling_group_tag" "nodes_group" {
  for_each = toset(
    [for asg in flatten(
      [for resources in aws_eks_node_group.nodes_group.resources : resources.autoscaling_groups]
    ) : asg.name]
  )

  autoscaling_group_name = each.value

  tag {
    key   = "Name"
    value = "eks_node_group"
    propagate_at_launch = true
  }
}

In all honesty, Terraform should at least explicitly state in the documentation that their tags don't work for setting the "Name" tag, as that is a key tag that lots of companies use to organize instances and manage billing. Personally I think not having the tags parameter override the "Name" tag is a bug, but I'd at least settle for better documentation that describes this workaround.

Can you at least update the documentation so other people don't have to waste as much time on this?

Pretty please?

FelipeMaeda commented 2 years ago

Guys, the resource responsible for EC2 tags is the "Resource Tags" section of the Launch Template. I have the same problem, and I have verified that it's not possible to add tags to the automatically generated Launch Template. To set tags on this resource, we need to provision our own Launch Template.

However, I managed to at least tag the EC2 instances with the "Default tags" from my repository. Here is the code used:


data "aws_default_tags" "default_tags" {}

# Add 1 Tag for the 1 or more node groups
resource "aws_autoscaling_group_tag" "tag_aws_node_termination_handler" {
  for_each = toset([
    aws_eks_node_group.node_group_name1.resources[0].autoscaling_groups[0].name,
    aws_eks_node_group.node_group_name2.resources[0].autoscaling_groups[0].name
  ])

  autoscaling_group_name = each.value

  tag {
    key   = "aws-node-termination-handler/managed"
    value = "PropagateAtLaunch=true"

    propagate_at_launch = true
  }
}

# Add tags for 1 node group
resource "aws_autoscaling_group_tag" "default_tags_asg_high" {
  count = length(keys(data.aws_default_tags.default_tags.tags))
  autoscaling_group_name = aws_eks_node_group.node_group_name.resources[0].autoscaling_groups[0].name

  tag {
    key                 = keys(data.aws_default_tags.default_tags.tags)[count.index]
    value               = values(data.aws_default_tags.default_tags.tags)[count.index]
    propagate_at_launch = true
  }
}

zboni-gpsw commented 2 years ago

I just came here to say this issue has been open for 900 days and is the 3rd most 👍'd in the project.

afirth commented 2 years ago

Just found this again in conversation with our AWS AM. Kind of a dup of #724.

As @SlevinWasAlreadyTaken mentions, this is now available in eksctl thanks to his hard work in https://github.com/weaveworks/eksctl/pull/5002

Some bash to work around it: https://github.com/aws/containers-roadmap/issues/724#issuecomment-1077733733. Some Terraform in that comment chain too.

Edit: I don't work for AWS, complain to your account manager. I wholeheartedly agree this is ridiculous.

andre-lx commented 2 years ago

Hey guys.

Here at YData we built a solution to work around this issue until the official release. We created an AWS Lambda that can tag the EKS node groups' ASGs with common tags, or with specific tags per node group, using CloudFormation.

This can be used to add tags for labels and taints (cluster-autoscaler) or any other tags that help with tracking cost, resource ownership, and so on.

Feel free to test it and give us your feedback.

https://github.com/ydataai/aws-asg-tags-lambda

You can use it directly in the template as:

ASGTagLambdaFunction:
  Type: AWS::Lambda::Function
  Properties:
    Role: !GetAtt ASGTagLambdaExecutionRole.Arn
    PackageType: Image
    Code:
      ImageUri: !Ref EcrImageUri
    Architectures:
    - x86_64
    MemorySize: 1024
    Timeout: 300

ASGTagLambdaInvoke:
  Type: AWS::CloudFormation::CustomResource
  DependsOn: ASGTagLambdaFunction
  Version: "1.0"
  Properties:
    ServiceToken: !GetAtt ASGTagLambdaFunction.Arn
    StackID: !Ref AWS::StackId
    AccountID: !Ref AWS::AccountId
    Region: !Ref AWS::Region
    ClusterName: "the EKS cluster name" #!Ref EKSCluster
    CommonTags:
    - Name: "ENVIRONMENT"
      Value: "dev"
      PropagateAtLaunch: true
    NodePools:
    - Name: "system-nodepool" #!GetAtt YourNodeGroup.NodegroupName
      Tags:
      - Name: 'k8s.io/cluster-autoscaler/node-template/taint/TAINT'
        Value: 'NoSchedule'
        PropagateAtLaunch: true
      - Name: 'k8s.io/cluster-autoscaler/node-template/label/LABEL'
        Value: 'LABEL_VALUE'
        PropagateAtLaunch: true
    - Name: "another-pool"

nanasi880 commented 2 years ago

Here at YData we built a solution to solve this issue till the official release.

Each of us already has such a solution. We do not want to keep doing that going forward, which is why we are requesting official support.

https://github.com/aws/containers-roadmap/issues/608#issuecomment-862017125

AWS has always had Auto Scaling groups, and has always had resource tags. We just want to take advantage of them; AWS doesn't need to develop anything additional. Why can't the managed service take advantage of these features? Am I making a strange request?

akestner commented 2 years ago

I wanted to share a recent launch from the EKS team that might be of interest to folks following this issue. Earlier this week we released Cluster-level Cost Allocation Tagging:

With this launch, all EC2 instances which join an EKS cluster are automatically tagged with an AWS-generated cost allocation tag [containing the EKS cluster name]. Any EC2 instance used in an EKS cluster will be tagged automatically without any additional action required, regardless of whether they are provisioned using EKS managed node groups, Karpenter, or directly via EC2. This tag can be used to allocate EC2 costs to individual EKS clusters through AWS Billing and Cost Management tools...

While this feature won't help to propagate customer-defined tags down to the EC2 instances in an EKS cluster, for those of you who are looking for better cost allocation across multiple EKS clusters, this feature will reduce the work required.

gunzy83 commented 2 years ago

While this feature won't help to propagate customer-defined tags down to the EC2 instances in an EKS cluster, for those of you who are looking for better cost allocation across multiple EKS clusters, this feature will reduce the work required.

Thanks for sharing this update, I am sure some will find it useful. I don't want to shoot the messenger, and I know you are just trying to help, but this really is attacking the problem from the wrong end, is it not? :man_shrugging:

EC2 instances that are not part of a managed node group are already easily taggable with customer-defined tags, and managed node groups already had this tag passed through for cost allocation. From my perspective this change does little to reduce the work required if you want to use managed node groups.

I only know this because I had to write code for a previous employer to process this tag and retrieve the customer-defined tag values so costs could be allocated in the same way as everything else. Luckily we had well-structured cluster names that included the values required. However, it was brittle, and the processing broke a few times when the cluster name structure was changed for operational reasons (e.g. to add blue/green support to our automation for EKS upgrades).

Please consider following through on this issue (getting close to 3 years now). We always get told by our account managers, TAMs and SAs to use tags, it would be nice if tagging actually worked for cost allocation for all resources, EKS or otherwise. Thanks.

flowinh2o commented 1 year ago

Can someone please help us understand why this is not getting any traction with this much attention? It appears to still be in the "researching" phase. I just ran into this when trying to scale from 0 on managed node groups using the Terraform EKS module.

sylr commented 1 year ago

Honestly you should all move to Karpenter.

FernandoMiguel commented 1 year ago

Another vote for karpenter

gunzy83 commented 1 year ago

Honestly you should all move to Karpenter.

Honestly, that has to be one of the least helpful suggestions I have seen in a while. We are talking about managed nodes, which are not going to magically get tagged if you install Karpenter in the cluster and start launching nodes (you also need running nodes for Karpenter's controller to run on, a bit of a chicken-and-egg problem). Karpenter is a nice tool for sure, but it is not a solution to this issue.

otterley commented 1 year ago

you also need running nodes for Karpenter's controller to run on

Customers can run Karpenter on Fargate (managed compute). This helps eliminate the bootstrapping problem. However, resource tagging is not yet available for Fargate on EKS. If Karpenter is the only thing running on Fargate, this might be acceptable for cost-allocation purposes.
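
For reference, a minimal Terraform sketch of running the Karpenter controller on Fargate, assuming the pod execution role and subnets already exist (all names here are hypothetical):

resource "aws_eks_fargate_profile" "karpenter" {
  cluster_name           = var.cluster_name
  fargate_profile_name   = "karpenter"
  pod_execution_role_arn = var.fargate_pod_execution_role_arn
  subnet_ids             = var.private_subnet_ids

  # Run only the pods in the Karpenter namespace on Fargate
  selector {
    namespace = "karpenter"
  }
}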

flowinh2o commented 1 year ago

Karpenter has some nice features but is a bit more complex than most people probably need, since it is geared for clusters with larger workloads that can benefit from more advanced scheduling and continuous resource optimization. One thing I don't like is Karpenter's requirement to know specific node group info versus just using static tags with the cluster name. Thanks for the suggestion, but for those who want to continue to use CA it would be nice to see some tags on the auto scaling groups to solve this.

FernandoMiguel commented 1 year ago

Going off topic here, but curious about this

One thing I don't like is Karpenter's requirement to know specific node group info vs just using static tags with the cluster name.

What do you mean? Karpenter doesn't care about anything other than pending pods to be allocated. Or you can go overboard and create very scoped provisioners per team or deployment label

flowinh2o commented 1 year ago

Going off topic here, but curious about this

One thing I don't like is Karpenter's requirement to know specific node group info vs just using static tags with the cluster name.

What do you mean? Karpenter doesn't care about anything other than pending pods to be allocated. Or you can go overboard and create very scoped provisioners per team or deployment label

I might be wrong, but I was just looking at their install documentation and noticed they want "--set aws.defaultInstanceProfile=KarpenterNodeInstanceProfile-${CLUSTER_NAME} \" in the Helm chart; I'm guessing there may be a way to just use IRSA instead. I also see there is a manifest applied after you run the Helm chart, which is a bit odd, but these are just installation-related issues, and off topic as you said.

stevehipwell commented 1 year ago

@flowinh2o I think aws.defaultInstanceProfile is the instance profile for the nodes being created so they can join the cluster.

max-rocket-internet commented 1 year ago

Running dedicated node groups for specific workloads requires a couple of cluster-autoscaler tags for properly scaling node groups based on labels and taints.

Exactly our problem also. Details here.

hajdukda commented 1 year ago

This looks like such a must-have feature that's super simple to implement if you just allow node group tags to propagate to the ASG...

AlecZebrick commented 1 year ago

Any updates on this? Also needed here

RogerWatkins-Anaplan commented 1 year ago

Also having a problem with this. I wouldn't expect, in a service called Managed Node Groups, that I would have to work around tag propagation issues. It seems vendors such as Weaveworks have implemented their own workarounds; sadly there is no such workaround in Terraform.

This issue is pretty fundamental - I'd like it fixed, please :)

stevehipwell commented 1 year ago

It seems vendors such as Weaveworks have implemented their own workarounds; sadly there is no such workaround in Terraform.

@RogerWatkins-Anaplan it's easy enough to do this with Terraform, and I think there are links in some of the comments above on how to do it. That said, you wouldn't expect to need to do this for a first-party vendor solution.

vikramaditya234 commented 1 year ago

Is anyone actively working on this? This is a must-have for us to implement the project.