aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[EKS] [request]: Nodegroup should support tagging ASGs #608

Open bhops opened 4 years ago

bhops commented 4 years ago

Tell us about your request
It would be great if we could pass tags to the underlying ASGs (and tell the ASGs to propagate tags) that are created from the managed node groups for EKS so that the underlying instances/volumes are tagged appropriately.

Which service(s) is this request for?
EKS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
Currently, managed node groups for EKS do not support propagating tags to ASGs (and therefore the instances) created by the node group. This leads to EC2 instances that are not tagged according to our requirements for tracking cost, and resource ownership.

sriramgk commented 4 years ago

Passing the managed node group tags to the launch template's "Instance tags" would automatically apply them to both the EC2 instances and their volumes. If that is difficult to do, a separate "Custom Tags" section in the EKS managed node group configuration page would also be helpful.

mailjunze commented 4 years ago

Workaround to add custom tags to WorkerNodes using EKS managed NodeGroup :

yardensachs commented 4 years ago

This is a crucial feature that is missing, and it is the only reason our department is not moving from manual ASGs to node groups.

evgmoskalenko commented 4 years ago

Yes, this is a very important change. We also cannot use managed node groups because we need these tags. Semi-automated infrastructure as code is bad practice.

ozhankaraman commented 4 years ago

Any update ?

amitsehgal commented 4 years ago

Any updates? Can you open source node groups so the community can contribute?

jerry153fish commented 4 years ago

any updates ?

atrepca commented 4 years ago

Is this a duplicate of #374?

TBBle commented 4 years ago

I don't think it's a duplicate. This one is for an API feature to add tags to the ASG created by the API, and also be able to set the flag on the ASG that propagates tags outwards: so it's only an API change to implement the same thing done manually in the workaround above.

#374 is for the EKS Cluster object itself to support propagating tags down, in the way ASGs already do. I imagine #374 would partially work by propagating tags to ASGs, and then turning on ASG tag propagation, rather than duplicating the behaviour.

otterley commented 4 years ago

Team: Having this functionality available will enable customers to use Cluster Autoscaler's capacity autodiscovery feature instead of forcing them to maintain manual capacity mappings on the command line.

The documentation there isn't super clear (see https://github.com/kubernetes/autoscaler/pull/3198 for documentation updates), but advertising capacity resources to Cluster Autoscaler via ASG tags will make the use of multiple heterogeneous Auto Scaling Groups much easier for customers.

rtripat commented 4 years ago

@otterley While Managed Nodegroup doesn't support customer provided tags for ASGs today, we do add the necessary tags for CAS auto discovery to the ASG i.e. k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<CLUSTER NAME>.

otterley commented 4 years ago

@rtripat Understood. Perhaps I wasn't clear, but I was specifically referring to the ability to autodiscover specific capacity dimensions of an ASG such as cpu, memory, ephemeral storage, GPU, etc.
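
For anyone following along, the capacity dimensions in question are advertised through ASG tags under the `k8s.io/cluster-autoscaler/node-template/resources/` prefix, as described in the Cluster Autoscaler AWS provider docs. Here is a minimal sketch of building such tags in the shape that boto3's `create_or_update_tags` accepts. The ASG name and the resource quantities are made up for illustration, and the helper `capacity_tags` is hypothetical.

```python
RESOURCE_PREFIX = "k8s.io/cluster-autoscaler/node-template/resources/"


def capacity_tags(asg_name, resources):
    """Build ASG tag dicts that advertise per-node capacity to Cluster Autoscaler."""
    return [
        {
            "ResourceId": asg_name,
            "ResourceType": "auto-scaling-group",
            "Key": RESOURCE_PREFIX + name,
            "Value": str(quantity),
            # Only the autoscaler reads these, so they need not reach instances.
            "PropagateAtLaunch": False,
        }
        for name, quantity in sorted(resources.items())
    ]


# Hypothetical ASG name and capacities, for illustration only.
tags = capacity_tags(
    "eks-example-asg",
    {"nvidia.com/gpu": 1, "ephemeral-storage": "100G"},
)
for tag in tags:
    print(tag["Key"], "=", tag["Value"])

# With AWS credentials configured, the list could be applied with:
#   boto3.client("autoscaling").create_or_update_tags(Tags=tags)
```

Because managed node groups do not let you set these tags today, a step like this has to run outside the EKS API.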

privomark commented 4 years ago

Until this feature is ready, I've had success with creating a cloudwatch rule based upon EC2 "pending" status, invoking a lambda that checks the instance_id passed in through the event, checks the instance_id to see if it's part of a managed node cluster, then adds the appropriate tags. I'm doing this all through Terraform with the spin up of the eks cluster.

Obviously would be much easier with a tags option! 😛
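
A rough sketch of what such a Lambda could look like, written with boto3 rather than taken from the setup described above. It assumes the EC2 state-change event carries the instance ID under `detail.instance-id`, and that managed node group instances already carry an `eks:nodegroup-name` tag when the event fires; both are assumptions worth verifying. `EXTRA_TAGS` and the helper names are hypothetical.

```python
NODEGROUP_TAG = "eks:nodegroup-name"  # assumption: tag present on managed nodes
EXTRA_TAGS = {"CostCenter": "example-team"}  # hypothetical tags to add


def nodegroup_of(ec2, instance_id):
    """Return the managed node group name for an instance, or None."""
    resp = ec2.describe_tags(
        Filters=[
            {"Name": "resource-id", "Values": [instance_id]},
            {"Name": "key", "Values": [NODEGROUP_TAG]},
        ]
    )
    tags = resp.get("Tags", [])
    return tags[0]["Value"] if tags else None


def tag_if_managed(ec2, instance_id, extra_tags=EXTRA_TAGS):
    """Add extra_tags to the instance only if it belongs to a managed node group."""
    if nodegroup_of(ec2, instance_id) is None:
        return False
    ec2.create_tags(
        Resources=[instance_id],
        Tags=[{"Key": k, "Value": v} for k, v in extra_tags.items()],
    )
    return True


def lambda_handler(event, context):
    # Imported here so the helpers above can be exercised without AWS access.
    import boto3

    instance_id = event["detail"]["instance-id"]
    return tag_if_managed(boto3.client("ec2"), instance_id)
```

Passing the client in as a parameter keeps the tagging logic easy to test with a stub before wiring it to a real event rule.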

yann-soubeyrand commented 4 years ago

It would be great to be able to tag the launch templates too, with the option to propagate these tags to instances and volumes or not.

Is there some kind of best practice on tagging ASG vs tagging LT? It seems to me that tagging LT offers more flexibility (like the ability to tag the volumes).

TBBle commented 4 years ago

https://docs.aws.amazon.com/autoscaling/ec2/userguide/autoscaling-tagging.html touches upon the overlap in tag propagation between ASGs and Launch Templates.

yann-soubeyrand commented 4 years ago

That's precisely the documentation page I had in mind when asking about best practices ;-) This page explains the overlap but there are no clear pros and cons of the two tagging approaches. But it seems to me that LT offers more flexibility and that ASG tags should be used only when necessary (like for the cluster autoscaler discovery tags).

TBBle commented 4 years ago

There's a related discussion about tagging ASGs and LTs for non-managed Nodegroups at https://github.com/weaveworks/eksctl/issues/1603. My understanding from there is that tagging LTs and enabling propagation would be sufficient, but there might be use-cases where the ASG needs to have the tag too, although it wouldn't then need to also support propagation.

The difference observed in that ticket is that the ASG propagation applies the tags after launch, while LT propagation applies the tags as part of the launch.

yann-soubeyrand commented 4 years ago

Yes, I create my non-managed node groups using Terraform and put the tags on the LT with propagation to instances and volumes. The only tags I needed to put on ASG are the cluster autoscaler related tags. But propagation is not needed for these tags.
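
For readers not using Terraform, the same idea can be sketched in Python: the launch template carries `TagSpecifications` for both the `instance` and `volume` resource types, so tags are applied as part of the launch. This is only an illustration of the request shape, not the configuration described above. The tag values, the template name in the comment, and the helper `tag_specifications` are hypothetical.

```python
def tag_specifications(tags, resource_types=("instance", "volume")):
    """Build LaunchTemplateData['TagSpecifications'] with the same tags per resource type."""
    tag_list = [{"Key": k, "Value": v} for k, v in tags.items()]
    return [{"ResourceType": rt, "Tags": list(tag_list)} for rt in resource_types]


# Hypothetical cost-allocation tags, for illustration only.
launch_template_data = {
    "TagSpecifications": tag_specifications(
        {"CostCenter": "example-team", "Owner": "platform"}
    )
}
print(launch_template_data)

# With AWS credentials configured, this could be sent to EC2 as:
#   boto3.client("ec2").create_launch_template(
#       LaunchTemplateName="eks-nodes-example",
#       LaunchTemplateData=launch_template_data,
#   )
```

The Cluster Autoscaler discovery tags would still go on the ASG separately, without propagation.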

Missshao commented 4 years ago

I need this feature too. Adding the tags manually to the ASG afterwards will affect our cost calculations.

gunzy83 commented 4 years ago

We have EKS deployed as a new part of our stacks in prod through preprod, stage and dev (alongside a very large ECS deployment in each environment). It is very annoying that the instances are not tagged for cost allocation.

rosscdh commented 4 years ago

+1 cost calcs are reaaaly important

StanBorbatTR commented 4 years ago

I would also like to see custom names or name prefixes for the autoscaling groups. The auto-generated uuid naming really slows down management of larger clusters.

mikestef9 commented 4 years ago

With managed node groups support for launch templates, you can now add tags to the EC2 instances created as part of your node groups. See EKS docs for details.

I will leave this issue open for a little while, as I want to get some more feedback. The issue as originally opened asks for tags on ASGs, but I suspect most of you ultimately care about tags on EC2 instances, not the ASGs. Please leave any comments if you still have a need for tags on the ASGs themselves. Our vision is we handle any of these ASG tags for you, for example when we implement scale to 0 #724, we'll automatically add the required tags to the ASG.

bhops commented 4 years ago

As the original issue creator, I can confirm that being able to tag the underlying EC2 instances was indeed the intent of the original ask. Though others may have had other reasons for wanting ASGs to be tagged.

Thank you to the EKS team for implementing this!

dindurthy commented 4 years ago

Tags on the ASG are crucial if the ASG scales to zero. The cluster autoscaler for example will use the ASG tags if they exist. Without a way to propagate tags to the ASGs, we either have to run with unnecessary hosts or we have to bootstrap the ASGs directly.

mikestef9 commented 4 years ago

@dindurthy As I mentioned above, "Our vision is we handle any of these ASG tags for you, for example when we implement scale to 0 #724, we'll automatically add the required tags to the ASG."

Nuru commented 3 years ago

@mikestef9 While it is an admirable goal to handle the ASG tags automatically, it seems unlikely you will be able to do it quickly or easily. There are tags for node labels, node taints, and node resources, and it is unlikely EKS will be aware of all of them because of the various ways they can be created. At the moment it appears I cannot even get tags to propagate from the launch template to elastic GPUs (it seems EKS makes a copy of the launch template rather than use it directly, and the copy disables the "tag elastic graphics" setting), which makes me wary of trusting automatic behavior. I would rather you implement direct ASG tagging (or at least copying Launch Template tags to the ASG) first and see about automation later.

cdenneen commented 3 years ago

Did the tags used to work? I thought they did but all my nodeGroups now no longer have the tags specified in the eksctl configuration. Need to get these tags back in as they are used for Cost reporting.

spicysomtam commented 3 years ago

The docs say the feature is there now, but it does not work for me. Is it a Terraform 0.13.x feature? I am still on 0.12.x (and don't want to move to 0.13 yet). It would be nice if the EC2 worker instances had a meaningful name, rather than just a hyphen.

  tags = {
    "Name" = "eks-${var.cluster-name}-1"
  }

lirlia commented 3 years ago

I really want tags on ASGs.

This is because I attach a target group to my ASG (we don't use type: LoadBalancer; we use NodePort). Since we have multiple ASGs for different purposes, we need to be able to identify them and attach the Target Group.
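
To illustrate why the tag on the ASG itself matters here, this is a small sketch of picking ASGs by tag from data shaped like the `AutoScalingGroups` list returned by boto3's `describe_auto_scaling_groups`. The `purpose` tag, the group names, and the helper `asgs_with_tag` are hypothetical.

```python
def asgs_with_tag(groups, key, value):
    """Names of ASGs whose own tags include key=value."""
    return [
        g["AutoScalingGroupName"]
        for g in groups
        if any(
            t.get("Key") == key and t.get("Value") == value
            for t in g.get("Tags", [])
        )
    ]


# Sample data in the shape of describe_auto_scaling_groups()["AutoScalingGroups"].
groups = [
    {"AutoScalingGroupName": "eks-aaaa", "Tags": [{"Key": "purpose", "Value": "ingress"}]},
    {"AutoScalingGroupName": "eks-bbbb", "Tags": [{"Key": "purpose", "Value": "batch"}]},
    {"AutoScalingGroupName": "eks-cccc", "Tags": []},
]
ingress_asgs = asgs_with_tag(groups, "purpose", "ingress")
print(ingress_asgs)

# With AWS credentials configured, each match could then be attached:
#   autoscaling = boto3.client("autoscaling")
#   autoscaling.attach_load_balancer_target_groups(
#       AutoScalingGroupName=name, TargetGroupARNs=[target_group_arn])
```

Without a custom tag on the ASG, the only handle is the auto-generated group name.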

qli-conviva commented 3 years ago

We really need the node group's tags passed through to EC2 instances and volumes; otherwise we have to query the instances of the EKS node group and tag them outside the automated process.

ravvereddy commented 3 years ago

With managed node groups support for launch templates, you can now add tags to the EC2 instances created as part of your node groups. See EKS docs for details.

I will leave this issue open for a little while, as I want to get some more feedback. The issue as originally opened asks for tags on ASGs, but I suspect most of you ultimately care about tags on EC2 instances, not the ASGs. Please leave any comments if you still have a need for tags on the ASGs themselves. Our vision is we handle any of these ASG tags for you, for example when we implement scale to 0 #724, we'll automatically add the required tags to the ASG.

Tags can be propagated to EC2 instances, but suppose I need my EC2 instances tagged as node-01, node-02, node-03, and so on. That does not happen, because the ASG is what launches the nodes, not the launch template. This is very important.

nanasi880 commented 3 years ago

I need to tag the Autoscaling Group itself, not EC2. I want to monitor the desired capacity of the AutoscalingGroup in Datadog, and I need to be able to set arbitrary tags on the AutoscalingGroup itself in order to be able to use it comfortably.

Autoscaling Groups created by Managed NodeGroups do not output metrics to CloudWatch, which is another issue, but tagging is still important

luizmiguelsl commented 3 years ago

It's kind of frustrating not to be able to tag our node group instances programmatically. In my case, I'm using Terraform and have already tried tags and additional_tags, and neither propagated the tags to the ASG or to the instances themselves. Our main goal with these tags is cost allocation, so this would be extremely helpful.

calvinbui commented 3 years ago

Please leave any comments if you still have a need for tags on the ASGs themselves.

+1

TBBle commented 3 years ago

Is @ravvereddy's use-case (of having per-node tags generated by ASG) actually supported?

I don't see anything in the docs hinting that there is some kind of templating for tags propagated from ASGs, so it seems feature-wise that ASG tagging doesn't bring anything more for node tagging than Launch Templates do.

I think it'd be particularly helpful to know if there are any use-cases for ASG tags propagating to nodes that aren't covered by Launch Template instance tags. I'd assume the latter can cover cost-allocation tracking or metric identification for EC2 instances, for example.

If not, then this question becomes simpler as then we have a clear "best practice" for tagging EC2 instances (Launch Templates, which already works), and this ticket can focus on the remaining needs for ASG-specific tags.

Tagging of ASGs themselves is still needed for Cluster Autoscaler scale-to-zero (#724 should cover the specifics of that use-case, I hope, as they do not require instance propagation) and resource ownership identification on accounts shared between teams, which is the use-case I've had in the past. My studio has graduated to multiple accounts under AWS Organizations, so that use-case has fallen off my radar now.

andrewjeffree commented 3 years ago

Tagging of the ASGs themselves is handy for some other stuff we want to run, e.g. https://github.com/AutoSpotting/AutoSpotting requires a tag on the ASG for it to do its thing.

lgbraus commented 3 years ago

I have created a custom resource which tags the ASG and propagates to EC2 instances. Our cluster was created as below:

 ### EKS control plane ###
  Cluster:
    Type: AWS::EKS::Cluster
    Properties:
      Name: !Sub ${EKSClusterName}-${Environment}
      Version: !Sub ${KubernetesVersion}
      RoleArn: !GetAtt  ClusterRole.Arn
      ResourcesVpcConfig:
        SecurityGroupIds:
          - !Ref ClusterControlPlaneSecurityGroup
        SubnetIds:
          - Fn::ImportValue: !Sub ${VpcStackName}-${Environment}-private-a
          - Fn::ImportValue: !Sub ${VpcStackName}-${Environment}-private-b
          - Fn::ImportValue: !Sub ${VpcStackName}-${Environment}-private-c

The node group was created like this:

### Create EKS managed node group ###
  Nodegroup:
    DependsOn: Cluster
    Type: 'AWS::EKS::Nodegroup'
    Properties:
      NodegroupName: !Sub ${EKSClusterName}-node-${Environment}
      ClusterName: !Ref Cluster
      InstanceTypes: 
        - !Ref NodeInstanceType
      DiskSize: !Ref NodeVolumeSize
      RemoteAccess:
        Ec2SshKey: !Sub ${EKSClusterName}-${Environment}
        SourceSecurityGroups: 
          - !Ref NodeSecurityGroup
      NodeRole: !GetAtt NodeInstanceRole.Arn
      ScalingConfig:
        MinSize: !Ref NodeGroupMinSize
        MaxSize: !Ref NodeGroupMaxSize
        DesiredSize: !If [IsNotProd, 1, !Ref NodeGroupDesiredCapacity]
      Labels:
        type: !Ref Environment
      Subnets:
        - Fn::ImportValue: !Sub ${VpcStackName}-${Environment}-private-a
        - Fn::ImportValue: !Sub ${VpcStackName}-${Environment}-private-b
        - Fn::ImportValue: !Sub ${VpcStackName}-${Environment}-private-c

Then we tag the ASG with the custom resource (the tag name is "Name" and our tag value is the cluster name):

  ## Tag resources ###
  AsgTaggingRole:
    Type: AWS::IAM::Role
    Properties:
      AssumeRolePolicyDocument:
        Version: '2012-10-17'
        Statement:
        - Effect: Allow
          Principal:
            Service:
            - lambda.amazonaws.com
          Action:
          - sts:AssumeRole
      Path: "/"
      Policies:
      - PolicyName: lambda-logging
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - logs:CreateLogGroup
            - logs:CreateLogStream
            - logs:PutLogEvents
            Resource: arn:aws:logs:*:*:*
      - PolicyName: lambda-tagging
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
          - Effect: Allow
            Action:
            - autoscaling:CreateOrUpdateTags
            Resource:
              - '*'
          - Effect: Allow
            Action:
            - eks:DescribeNodegroup
            Resource: '*'

  AsgTagging:
    Type: Custom::AsgTagging
    Properties:
      ServiceToken: !GetAtt AsgTaggingFunction.Arn
      AsgId: !GetAtt Nodegroup.NodegroupName # Get the node group name

  AsgTaggingFunction:
    Type: AWS::Lambda::Function
    Properties:
      Runtime: python3.7
      Handler: index.lambda_handler
      MemorySize: 128
      Role: !GetAtt AsgTaggingRole.Arn
      Timeout: 120
      Environment:
        Variables:
          TAG_KEY: Name
          TAG_VALUE: !Ref Cluster # Get the EKS cluster name
          EKS_CLUSTER: !Ref Cluster # Get the EKS cluster name
          NODE_GROUP: !GetAtt Nodegroup.NodegroupName # Get the node group name
      Code:
        ZipFile: |
          import boto3
          from botocore.exceptions import ClientError
          import os
          import cfnresponse 

          def lambda_handler(event, context):
            print("Event :", event)
            data = {}
            tag_key = os.getenv('TAG_KEY')
            tag_value = os.getenv('TAG_VALUE')
            eks_cluster = os.getenv('EKS_CLUSTER')
            node_group = os.getenv('NODE_GROUP')

            try:
              eks = boto3.client('eks')
              # Retrieve autoscaling group name
              asg = eks.describe_nodegroup(clusterName=eks_cluster, nodegroupName=node_group)['nodegroup']['resources']['autoScalingGroups'][0]['name']
            except Exception as e:
              print(e)

            try:
              client = boto3.client('autoscaling')
              # Create and Update apply the same tag, so they share one branch
              if event['RequestType'] in ('Create', 'Update'):
                client.create_or_update_tags(
                    Tags=[
                        {
                            'Key': tag_key,
                            'PropagateAtLaunch': True,
                            'ResourceId': asg,
                            'ResourceType': 'auto-scaling-group',
                            'Value': tag_value,
                        }
                    ],
                )
                data["Reason"] = "The ASG " + asg + " has been tagged."
                cfnresponse.send(event, context, cfnresponse.SUCCESS, data)
              elif event['RequestType'] == 'Delete':
                # Nothing to undo on delete; the tag stays on the ASG
                data["Reason"] = "Resource deleted"
                cfnresponse.send(event, context, cfnresponse.SUCCESS, data)
              else:
                data["Reason"] = "Unknown operation: " + event['RequestType']
                cfnresponse.send(event, context, cfnresponse.FAILED, data, "")

            except Exception as e:
              data["Reason"] = "Cannot " + event['RequestType'] + " Resource: " + str(e)
              cfnresponse.send(event, context, cfnresponse.FAILED, data, "")

I hope this can help.

HenryYanTR commented 3 years ago

Anything missing mandatory tags is considered non-compliant in my organization. EKS ASGs get deleted when the compliance scan kicks in. We really need to have the tags propagated from the managed node group to these ASGs.

yangxintian commented 3 years ago

Same here. Using the launch template to propagate tags to EC2 instances is not enough; we also need to tag the ASG itself to comply with our organization's policy, otherwise it will be scaled down to 0.

manlinl commented 3 years ago

Why is this issue controversial? Many companies need tags on ASGs for cost and compliance reasons.

TBBle commented 3 years ago

I'm not sure what you're seeing as controversial in this ticket?

I don't see anyone saying that this should not happen, or otherwise introducing controversy?

qli-conviva commented 3 years ago

EKS version 1.20. As a workaround, we have to launch a node group, create a custom launch template with resource tags based on the node group's generated template, delete the existing node group, and then create a new node group from the customized template so the resource tags are applied.

nanasi880 commented 3 years ago

Why this issue is controversial I do not know. AWS has always had AutoscalingGroups, and has always had resource tags. We just want to take advantage of them. AWS doesn't need to develop anything additional. Why can't managed services take advantage of these features? Am I making a strange request?

The AutoscalingGroup name automatically generated by Managed Node Groups is indistinguishable to humans. Without resource tags, you won't be able to comfortably tell them apart.

Additional contexts: https://github.com/aws/containers-roadmap/issues/608#issuecomment-754517652

newb1e commented 3 years ago

I'm trying to manage SPOT/OD node groups as managed node groups. To be able to scale from zero in such a scenario I need to tag the ASGs according to this doc. Without the option to add custom tags, I'm unable to make this work with managed node groups.

TBBle commented 3 years ago

Scale-from-zero with Cluster Autoscaler is #724.

stevehipwell commented 3 years ago

@TBBle this is related to scale-from-zero with Cluster Autoscaler (#724), but according to the docs it is needed more urgently so that Cluster Autoscaler works correctly with labelled and tainted nodes.

In my mind there are three ways to solve this.

  1. Support copying all managed node group tags to the ASG
  2. Support copying all tags with a specific prefix or prefixes to the ASG
  3. Automate creating the k8s.io/cluster-autoscaler/node-template/label/ & k8s.io/cluster-autoscaler/node-template/taint/ tags on the ASG

Option 1 would match the status quo for un-managed node groups, option 2 would limit the scope of the tags, and option 3 would actually make managed node groups a better solution than their un-managed counterparts.
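
Option 3 could be sketched roughly as follows, assuming the input is the `nodegroup` object returned by the EKS `DescribeNodegroup` API (with `labels` as a map and `taints` as a list of key/value/effect entries). The `value:effect` form for taint tag values is my reading of the Cluster Autoscaler AWS docs, so treat it as an assumption. The helper `node_template_tags` and the sample node group are hypothetical.

```python
LABEL_PREFIX = "k8s.io/cluster-autoscaler/node-template/label/"
TAINT_PREFIX = "k8s.io/cluster-autoscaler/node-template/taint/"

# EKS API taint effects mapped to the Kubernetes spelling.
EFFECTS = {
    "NO_SCHEDULE": "NoSchedule",
    "NO_EXECUTE": "NoExecute",
    "PREFER_NO_SCHEDULE": "PreferNoSchedule",
}


def node_template_tags(nodegroup):
    """Derive Cluster Autoscaler node-template ASG tags from node group metadata."""
    tags = {}
    for key, value in nodegroup.get("labels", {}).items():
        tags[LABEL_PREFIX + key] = value
    for taint in nodegroup.get("taints", []):
        effect = EFFECTS[taint["effect"]]
        # Assumed tag value format: "<taint value>:<taint effect>"
        tags[TAINT_PREFIX + taint["key"]] = f'{taint.get("value", "")}:{effect}'
    return tags


# Hypothetical node group metadata, for illustration only.
nodegroup = {
    "labels": {"kubernetes.io/arch": "amd64"},
    "taints": [{"key": "dedicated", "value": "ci", "effect": "NO_SCHEDULE"}],
}
for key, value in node_template_tags(nodegroup).items():
    print(key, "=", value)
```

The resulting map would still need to be written to the ASG, for example with the Auto Scaling `CreateOrUpdateTags` API.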

TBBle commented 3 years ago

The long-term solution chosen by AWS is none of those (instead, Cluster Autoscaler reads Managed Nodegroups metadata directly to learn the labels and taints), but in #724 you'll see examples and workarounds implementing your approaches, and that would be the place to make your case that scale-from-zero can't wait for the implementation of the CA feature, but should be handled by some kind of ASG tag automation as you have described.

morsik commented 3 years ago

@otterley While Managed Nodegroup doesn't support customer provided tags for ASGs today, we do add the necessary tags for CAS auto discovery to the ASG i.e. k8s.io/cluster-autoscaler/enabled and k8s.io/cluster-autoscaler/<CLUSTER NAME>.

@rtripat I wouldn't say that all necessary tags are added…

I'm trying to use the autoscaler in an architecture-mixed EKS cluster (ARM + x86) and it just doesn't work, because I have GitLab Runner running on an ARM node which spins up an x86 Pod. The autoscaler is totally unaware of the nodeSelector kubernetes.io/arch=amd64 I set for GitLab Runner and can't scale up from 0-node x86 node groups.

I followed the docs and added tags in Terraform (k8s.io/cluster-autoscaler/node-template/label/kubernetes.io/arch=amd64 to be specific). They were added to the node groups and… well… it doesn't work, because the ASG didn't get those tags. Adding them to the ASG manually makes the autoscaler work properly in this scenario.

But handling OS and Arch should be fully automatic. C'mon, EKS management costs A LOT and it's unable to export basic Kubernetes labels… :/

daenney commented 3 years ago

As noted in https://github.com/aws/containers-roadmap/issues/724, https://github.com/hashicorp/terraform-provider-aws/pull/20674 is pending release which will allow you to add the tags to the ASG that's implicitly created by the managed node group (assuming you're using terraform to create those node pools or are able to otherwise find out the ASG name).

It's a lot more work than if it would happen automatically, but it's at least possible now.