cloudposse / terraform-aws-eks-node-group

Terraform module to provision a fully managed AWS EKS Node Group
https://cloudposse.com/accelerate
Apache License 2.0
91 stars 128 forks source link

Windows node support #139

Closed ChrisMcKee closed 1 year ago

ChrisMcKee commented 1 year ago

what

why

references

Tested

image image image image

module "eks_windows_node_group" {
  # source  = "cloudposse/eks-node-group/aws"
  # version = "2.6.1"
  source = "github.com/ChrisMcKee/terraform-aws-eks-node-group"

  instance_types     = ["t3.large", "t3a.large", "c5.large", "c6i.large", "m6i.large", "r6i.large"]
  subnet_ids         = [data.terraform_remote_state.network.outputs.private_subnets[1]]
  min_size           = 1
  max_size           = 1
  desired_size       = 1
  cluster_name       = module.eks_cluster.eks_cluster_id
  kubernetes_version = var.kubernetes_version == null || var.kubernetes_version == "" ? [] : [var.kubernetes_version]
  kubernetes_labels  = var.labels

  ami_type = "WINDOWS_CORE_2019_x86_64"

  update_config = [{ max_unavailable = 1 }]

  capacity_type = "SPOT"

  kubernetes_taints = [{
    key    = "OS"
    value  = "Windows"
    effect = "NO_SCHEDULE"
  }]

  node_role_arn                = [aws_iam_role.worker_role_nt.arn]
  node_role_cni_policy_enabled = false #We use the Service Account as per best practice

  associated_security_group_ids = [data.terraform_remote_state.network.outputs.ops_ssh, aws_security_group.workers.id]

  # Enable the Kubernetes cluster auto-scaler to find the auto-scaling group
  cluster_autoscaler_enabled = true

  context = module.windowslabel.context

  # Ensure the cluster is fully created before trying to add the node group
  module_depends_on = [module.eks_cluster.kubernetes_config_map_id]

  # Ensure ordering of resource creation to eliminate the race conditions when applying the Kubernetes Auth ConfigMap.
  # Do not create Node Group before the EKS cluster is created and the `aws-auth` Kubernetes ConfigMap is applied.
  depends_on = [module.eks_cluster, module.eks_cluster.kubernetes_config_map_id]

  create_before_destroy = true

  node_role_policy_arns = ["arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"]

  block_device_mappings = [
    {
      "delete_on_termination" : true,
      "device_name" : "/dev/xvda",
      "encrypted" : true,
      "volume_size" : 80,
      "volume_type" : "gp3"
    }
  ]

  node_group_terraform_timeouts = [{
    create = "40m"
    update = null
    delete = "20m"
  }]

  #Valid types are "instance", "volume", "elastic-gpu", "spot-instances-request", "network-interface".
  resources_to_tag = ["instance", "volume", "spot-instances-request", "network-interface"]
}

related

ChrisMcKee commented 1 year ago

@osterman How often are PRs reviewed ooi?

aknysh commented 1 year ago

/test all

ChrisMcKee commented 1 year ago

Just trying to work out why it's stopped working. It all spins up but the nodes don't link to eks.. I'm using the completeexample though modified to test this too. I'll update when it plays ball 🤔

nitrocode commented 1 year ago

@ChrisMcKee resolved conflicts and merged changes in

ChrisMcKee commented 1 year ago

Relies on https://github.com/cloudposse/terraform-aws-eks-cluster/pull/175

ChrisMcKee commented 1 year ago

Probably made git cry but rebased off master and squashed out all the autoformat commits etc. I've pulled the windows example out of examples.tf I'll add one in a separate PR as there's a bit of faff required when you use a userscript and an ami is used as it needs to append the node to the aws_auth config map (like usual) but the windows node requires an extra permission for ipam.

ChrisMcKee commented 1 year ago

@aknysh all good now; I've pushed out a few windows node groups to our internal clusters using this combined with the cluster pr.

ChrisMcKee commented 1 year ago

@aknysh sorry for the repeated changes after the fact; I hit an issue randomly after using my fork since the pr went up and, as you do, I assumed I'd missed a commit or messed up. Turned out to be a flipping bug with kube-proxy 🤦 The module as it stands works.

The only remaining issue I have with it is the validation in the variables file over release; the regex is obviously setup like that for a reason but the Windows AMI's dont follow the same setup being more like 1.24-2023.02.14

ChrisMcKee commented 1 year ago

@aknysh sorry for the repeated changes after the fact; I hit an issue randomly after using my fork since the pr went up and, as you do, I assumed I'd missed a commit or messed up. Turned out to be a flipping bug with kube-proxy 🤦 The module as it stands works.

The only remaining issue I have with it is the validation in the variables file over release; the regex is obviously setup like that for a reason but the Windows AMI's dont follow the same setup being more like 2023.02.14

I've changed the regex to the original plus an or to cover the windows ami format. I've added the unit tests to the commit https://regex101.com/r/xb7q2f/2

nitrocode commented 1 year ago

cc @Nuru @aknysh for one more set of eyes

ChrisMcKee commented 1 year ago

@nitrocode @aknysh This will also resolve pr #137 & issue #133

aknysh commented 1 year ago

/test all