cloudposse / terraform-aws-eks-node-group

Terraform module to provision a fully managed AWS EKS Node Group
https://cloudposse.com/accelerate
Apache License 2.0

Tags missing in default launch template of ASG #131

Closed iamdevnull closed 1 year ago

iamdevnull commented 1 year ago

Describe the Bug

Hey guys, I created a node group based on:

module "eks_node_group" {
  source = "cloudposse/eks-node-group/aws"
  version = "2.6.0"

  subnet_ids        = module.subnets.private_subnet_ids
  cluster_name      = module.eks_cluster.eks_cluster_id
  instance_types    = ["t3.medium"]
  desired_size      = "2"
  min_size          = "2"
  max_size          = "6"

  # Prevent the node groups from being created before the Kubernetes aws-auth ConfigMap
  module_depends_on = module.eks_cluster.kubernetes_config_map_id

  # Enable the Kubernetes cluster auto-scaler to find the auto-scaling group
  cluster_autoscaler_enabled = true

  context = module.label.context
}

It creates the node group successfully, and you can see it inside EKS under the "Data processing" tab. If you click on the node group, you will see the "Autoscaling group name", and if you check the launch template of that ASG, it uses a generated launch template. However, this active launch template does not include all the tags based on the null-label module.

There is also another generated launch template (with a matching name prefix) which has all the tags I expect, but it is inactive.

Expected Behavior

I expect the ASG to use the inactive launch template with all the tags based on the Terraform null-label module.

Do you have any advice on what could be wrong?

Screenshots

Overview (screenshot)

Inactive, good launch template (screenshot)

ASG with bad launch template (screenshot)

Bad launch template (screenshot)

nitrocode commented 1 year ago

Not sure how this is possible

This is the template defined in the group

https://github.com/cloudposse/terraform-aws-eks-node-group/blob/c611f2df9c3dcfc81790ff670a462d78bbe92364/main.tf#L116

https://github.com/cloudposse/terraform-aws-eks-node-group/blob/c611f2df9c3dcfc81790ff670a462d78bbe92364/main.tf#L145-L148

Launch template

https://github.com/cloudposse/terraform-aws-eks-node-group/blob/c611f2df9c3dcfc81790ff670a462d78bbe92364/variables.tf#L267-L269

https://github.com/cloudposse/terraform-aws-eks-node-group/blob/c611f2df9c3dcfc81790ff670a462d78bbe92364/launch-template.tf#L25-L32

https://github.com/cloudposse/terraform-aws-eks-node-group/blob/c611f2df9c3dcfc81790ff670a462d78bbe92364/launch-template.tf#L43

https://github.com/cloudposse/terraform-aws-eks-node-group/blob/c611f2df9c3dcfc81790ff670a462d78bbe92364/launch-template.tf#L51

https://github.com/cloudposse/terraform-aws-eks-node-group/blob/c611f2df9c3dcfc81790ff670a462d78bbe92364/launch-template.tf#L138-L142

It would be interesting to see the launch template output of your node group and compare it with the Terraform launch template generated by the module, using

terraform state show <resource address>

Also, are you passing a launch_template_id? It would help to see the full inputs of the module.

If you can identify the problem and submit a fix, we'd really appreciate it!
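For reference, passing an explicit launch template to the module would look roughly like the sketch below. This is a hedged example: aws_launch_template.custom is a hypothetical resource, and the launch_template_id / launch_template_version inputs are the module variables referenced above.

```hcl
# Sketch: supplying your own launch template instead of letting the
# module generate one. "aws_launch_template.custom" is hypothetical.
module "eks_node_group" {
  source  = "cloudposse/eks-node-group/aws"
  version = "2.6.0"

  # When set, the module uses this template rather than creating its
  # own, so any instance tags must be managed on your template.
  launch_template_id      = aws_launch_template.custom.id
  launch_template_version = aws_launch_template.custom.latest_version

  # ... remaining inputs as in the original example ...
}
```

If these inputs are left unset, the module generates its own launch template, which is the case in the original report.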

iamdevnull commented 1 year ago

@nitrocode thanks for your reply.

As you can see in my initial post, I don't use a custom launch template, and I also didn't specify a launch template ID variable in the module definition.

Here is the debug output:

$ terraform state show "module.eks_node_group.aws_eks_node_group.default[0]"
# module.eks_node_group.aws_eks_node_group.default[0]:
resource "aws_eks_node_group" "default" {
    ami_type        = "AL2_x86_64"
    arn             = "arn:aws:eks:eu-central-1:380892642425:nodegroup/k8s-staging-cluster/k8s-staging-workers/fcc06b26-df1a-155c-6b8f-a2033ec19c20"
    capacity_type   = "ON_DEMAND"
    cluster_name    = "k8s-staging-cluster"
    disk_size       = 0
    id              = "k8s-staging-cluster:k8s-staging-workers"
    instance_types  = [
        "t3.medium",
    ]
    labels          = {}
    node_group_name = "k8s-staging-workers"
    node_role_arn   = "arn:aws:iam::380892642425:role/k8s-staging-workers"
    release_version = "1.21.5-20220429"
    resources       = [
        {
            autoscaling_groups              = [
                {
                    name = "eks-k8s-staging-workers-fcc06b26-df1a-155c-6b8f-a2033ec19c20"
                },
            ]
            remote_access_security_group_id = ""
        },
    ]
    status          = "ACTIVE"
    subnet_ids      = [
        "subnet-050d0a3515d1bdd32",
        "subnet-0dac80fc4022efc50",
    ]
    tags            = {
        "Attributes"                                    = "k8s-staging-workers"
        "Billing"                                       = "staging"
        "BusinessUnit"                                  = "DevOps"
        "ManagedBy"                                     = "Terraform"
        "Name"                                          = "k8s-staging-workers"
        "k8s.io/cluster-autoscaler/enabled"             = "true"
        "k8s.io/cluster-autoscaler/k8s-staging-cluster" = "owned"
        "kubernetes.io/cluster/k8s-staging-cluster"     = "owned"
    }
    tags_all        = {
        "Attributes"                                    = "k8s-staging-workers"
        "Billing"                                       = "staging"
        "BusinessUnit"                                  = "DevOps"
        "ManagedBy"                                     = "Terraform"
        "Name"                                          = "k8s-staging-workers"
        "k8s.io/cluster-autoscaler/enabled"             = "true"
        "k8s.io/cluster-autoscaler/k8s-staging-cluster" = "owned"
        "kubernetes.io/cluster/k8s-staging-cluster"     = "owned"
    }
    version         = "1.21"

    launch_template {
        id      = "lt-061545a37f63d7d8f"
        name    = "k8s-staging-workers2022051808224082680000000a"
        version = "3"
    }

    scaling_config {
        desired_size = 4
        max_size     = 6
        min_size     = 2
    }

    update_config {
        max_unavailable            = 1
        max_unavailable_percentage = 0
    }
}

I did the same analysis as you, using the TF files. However, it is interesting that the following launch_template block is correctly included in the node group definition (see screenshots above), yet it is not used by the ASG (see the "ASG with bad launch template" screenshot):

    launch_template {
        id      = "lt-061545a37f63d7d8f"
        name    = "k8s-staging-workers2022051808224082680000000a"
        version = "3"
    }
nitrocode commented 1 year ago

That is quite bizarre. Do you see drift when you try to replan the terraform?

If not, can you confirm the discrepancy via the AWS API and raise a support ticket? I'd be curious what AWS support says.

iamdevnull commented 1 year ago

That is quite bizarre. Do you see drift when you try to replan the terraform?

Yes, I did; there is no drift.

If not, can you confirm the discrepancy via the AWS API and raise a support ticket? I'd be curious what AWS support says.

Thanks for your advice.

nitrocode commented 1 year ago

If you have time, please follow up when you figure out what the cause of this is

iamdevnull commented 1 year ago

If you have time, please follow up when you figure out what the cause of this is

I found this related issue:

https://github.com/terraform-aws-modules/terraform-aws-eks/issues/1558
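The gist of that issue: tags set on an aws_launch_template resource only tag the launch template object itself; they are not copied to the instances, volumes, or network interfaces launched from it. For that, the template needs tag_specifications blocks. A minimal sketch of the distinction (module.label.tags stands in for the null-label tags):

```hcl
resource "aws_launch_template" "example" {
  name_prefix = "example-"

  # These tags apply to the launch template object only.
  tags = module.label.tags

  # Only tags declared here propagate to resources created from the
  # template, one block per resource type.
  tag_specifications {
    resource_type = "instance"
    tags          = module.label.tags
  }

  tag_specifications {
    resource_type = "volume"
    tags          = module.label.tags
  }
}
```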

iamdevnull commented 1 year ago

Based on the answer from AWS support (screenshot):

Here is the solution:

module "eks_node_group" {
  source = "cloudposse/eks-node-group/aws"

  # ...

  resources_to_tag = ["instance", "volume", "network-interface"]

  context = module.label.context
}

Hope this helps someone!
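For context, the resources_to_tag input drives the module's tag_specifications generation; conceptually it behaves like a dynamic block along these lines (a simplified sketch, not the module's exact code):

```hcl
resource "aws_launch_template" "default" {
  # ...

  # One tag_specifications block is emitted per entry in
  # var.resources_to_tag, e.g. ["instance", "volume", "network-interface"].
  dynamic "tag_specifications" {
    for_each = var.resources_to_tag
    content {
      resource_type = tag_specifications.value
      tags          = module.label.tags
    }
  }
}
```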

nitrocode commented 1 year ago

Thank you for posting that. I wonder why that isn't already the default.

nitrocode commented 1 year ago

Thank you @iamdevnull, it is now the new default in the latest version v2.6.1.