aws-ia / terraform-aws-eks-blueprints

Configure and deploy complete EKS clusters.
https://aws-ia.github.io/terraform-aws-eks-blueprints/
Apache License 2.0

BUG: VPC CNI Addon prefix delegation not working #1493

Closed arunsisodiya closed 1 year ago

arunsisodiya commented 1 year ago

Description

In our organization, we are using AWS EKS for our Kubernetes clusters. To enable the CNI, we are using the addon provided by AWS, i.e. vpc-cni. In addition, we are using a secondary CIDR to provide a larger pool of IPs for the pods running inside the cluster. Currently, we are following the steps defined here - https://tf-eks-workshop.workshop.aws/500_eks-terraform-workshop/570_advanced-networking/secondary_cidr/configure-cni.html

With this approach we need to run some scripts, but I want a native (Terraform) way of doing it. I am following the example defined here - https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/examples/vpc-cni-custom-networking - but prefix delegation is not working as expected.

In the ideal scenario, the VPC CNI addon should come up with the configured options before the node groups are created, so that we don't have to recreate the node groups; in my case that is not happening. Prefix delegation is not being picked up on the EC2 instances.
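For context, this is roughly how the secondary CIDR and the subnets referenced later as aws_subnet.secondary are created in our configuration - a simplified sketch, where the CIDR range and local.azs are illustrative placeholders:

################################################################################
# Secondary CIDR (sketch - names and CIDR ranges are illustrative)
################################################################################
resource "aws_vpc_ipv4_cidr_block_association" "secondary" {
  vpc_id     = module.vpc.vpc_id
  cidr_block = "100.64.0.0/16"
}

resource "aws_subnet" "secondary" {
  count = length(local.azs)

  vpc_id            = module.vpc.vpc_id
  availability_zone = local.azs[count.index]
  # Carve one subnet per AZ out of the secondary CIDR block
  cidr_block = cidrsubnet(aws_vpc_ipv4_cidr_block_association.secondary.cidr_block, 2, count.index)

  tags = var.tags
}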

Versions

Reproduction Code [Required]

Steps to reproduce the behavior:

EKS Configuration:

################################################################################
# EKS Module
################################################################################
module "eks" {

  ## Module configuration
  source  = "registry.terraform.io/terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  ## Cluster configuration
  cluster_name    = format("%s-%s", local.name, var.environment)
  cluster_version = local.cluster_version
  create          = true

  ## Security configuration
  create_iam_role               = true
  create_cluster_security_group = true
  create_kms_key                = true
  cluster_encryption_config = {
    provider_key_arn = aws_kms_key.eks.arn
    resources        = ["secrets"]
  }

  cluster_tags = {
    Name = local.name
  }
  tags = var.tags

  ## Networking configuration
  vpc_id                               = module.vpc.vpc_id
  subnet_ids                           = concat(module.vpc.private_subnets, module.vpc.public_subnets)
  cluster_endpoint_private_access      = var.eks_configuration.private_access
  cluster_endpoint_public_access       = var.eks_configuration.public_access
  cluster_endpoint_public_access_cidrs = local.public_access_cidrs
  cluster_service_ipv4_cidr            = "172.16.0.0/12"

  ## Logging configuration
  cluster_enabled_log_types = ["api", "audit", "authenticator", "controllerManager", "scheduler"]
  #  cloudwatch_log_group_kms_key_id        = aws_kms_key.eks.key_id
  cloudwatch_log_group_retention_in_days = 7

  ## Timeout configuration
  cluster_timeouts = {
    create = "60m"
    delete = "30m"
    update = "60m"
  }

  # AWS auth configmap
  manage_aws_auth_configmap = true
  aws_auth_roles            = local.aws_auth_roles
  aws_auth_users            = local.aws_auth_users
  aws_auth_accounts         = local.aws_auth_accounts

  ## Cluster add-on configuration
  cluster_addons = {
    coredns = {
      most_recent   = true
      addon_version = data.aws_eks_addon_version.coredns.version
    }
    kube-proxy = {
      most_recent   = true
      addon_version = data.aws_eks_addon_version.kube_proxy.version
    }
    vpc-cni = {
      most_recent              = true
      before_compute           = true
      addon_version            = data.aws_eks_addon_version.vpc_cni.version
      service_account_role_arn = aws_iam_role.vpc_cni.arn
      configuration_values = jsonencode({
        env = {
          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true"
          WARM_ENI_TARGET                    = "1"
          ENABLE_PREFIX_DELEGATION           = "true"
          ENI_CONFIG_LABEL_DEF               = "topology.kubernetes.io/zone"
        }
      })
    }
    aws-ebs-csi-driver = {
      most_recent              = true
      addon_version            = data.aws_eks_addon_version.aws_esb_csi_driver.version
      service_account_role_arn = aws_iam_role.ebs_driver.arn
    }
  }
  ## Node Group configuration - Defaults
  eks_managed_node_group_defaults = {
    ami_type       = "AL2_x86_64"
    disk_size      = 50
    instance_types = ["m5.large", "m5.xlarge"]
    metadata_options = {
      http_endpoint               = "enabled"
      http_tokens                 = "optional"
      http_put_response_hop_limit = 2
    }
  }

  ## Node Group configuration
  eks_managed_node_groups = local.node_groups_map

  ## Disabling IRSA via module - Managing manually for better control
  enable_irsa = false
}

Node Group Configuration:

{
      name                              = format("%s-%s-%s", local.name, var.environment, node_group["name"])
      use_name_prefix                   = true
      create_launch_template            = true
      subnet_ids                        = concat(module.vpc.private_subnets)
      cluster_primary_security_group_id = module.eks.cluster_primary_security_group_id

      capacity_type = "ON_DEMAND"
      ami_id        = data.aws_ami.eks_default.image_id
      desired_size  = node_group["desired_nodes"]
      max_size      = node_group["max_nodes"]
      min_size      = node_group["min_nodes"]

      enable_bootstrap_user_data = true
      bootstrap_extra_args       = "--container-runtime containerd"
      pre_bootstrap_user_data    = <<-EOT
      #!/bin/bash
      set -ex

      # https://docs.aws.amazon.com/eks/latest/userguide/choosing-instance-type.html#determine-max-pods
      MAX_PODS=$(/etc/eks/max-pods-calculator.sh \
      --instance-type-from-imds \
      --cni-version ${trimprefix(data.aws_eks_addon_version.vpc_cni.version, "v")} \
      --cni-prefix-delegation-enabled \
      --cni-custom-networking-enabled \
      )
      cat <<-EOF > /etc/profile.d/bootstrap.sh
        export CONTAINER_RUNTIME="containerd"
        export USE_MAX_PODS=false
        export KUBELET_EXTRA_ARGS="--max-pods=$${MAX_PODS}"
      EOF
      # Source extra environment variables in bootstrap script
      sed -i '/^set -o errexit/a\\nsource /etc/profile.d/bootstrap.sh' /etc/eks/bootstrap.sh
      sed -i 's/KUBELET_EXTRA_ARGS=$2/KUBELET_EXTRA_ARGS="$2 $KUBELET_EXTRA_ARGS"/' /etc/eks/bootstrap.sh
      EOT

      block_device_mappings = {
        xvda = {
          device_name = "/dev/xvda"
          ebs = {
            volume_size           = coalesce(node_group["disk_size"], 25)
            volume_type           = coalesce(node_group["disk_type"], "gp3")
            iops                  = 3000
            throughput            = 150
            encrypted             = true
            kms_key_id            = aws_kms_key.ebs.arn
            delete_on_termination = true
          }
        }
      }
      force_update_version = true

      instance_types = node_group["instance_type"]

      ebs_optimized           = true
      disable_api_termination = false
      enable_monitoring       = true
      metadata_options = {
        http_endpoint               = "enabled"
        http_tokens                 = "required"
        http_put_response_hop_limit = 2
        instance_metadata_tags      = "disabled"
      }
      create_iam_role = true
      iam_role_additional_policies = {
        ssm_managed_instance = "arn:${data.aws_partition.current.partition}:iam::aws:policy/AmazonSSMManagedInstanceCore"
      }
      iam_role_name            = format("%s-%s", node_group["name"], "iam-role")
      iam_role_use_name_prefix = true
      iam_role_description     = format("%s%s-%s", "IAM role for ", node_group["name"], "nodegroup")
      iam_role_tags = merge(
        {
          Name = format("%s-%s", local.name, "nodegroup-role")
        },
        var.tags
      )

      labels = merge({
        role      = "workers"
        nodegroup = "eks-managed"
        },
      var.tags)
      tags = merge({
        role      = "workers"
        nodegroup = "eks-managed"
        },
      var.tags)
      update_config = {
        max_unavailable_percentage = 20
      }
    }

ENI Configuration:

################################################################################
# VPC-CNI Custom Networking ENIConfig
################################################################################
resource "kubectl_manifest" "eni_config" {
  count = length(aws_subnet.secondary[*])

  yaml_body = yamlencode({
    apiVersion = "crd.k8s.amazonaws.com/v1alpha1"
    kind       = "ENIConfig"
    metadata   = {
      name = aws_subnet.secondary[count.index].availability_zone
    }
    spec = {
      securityGroups = [
        module.eks.node_security_group_id,
      ]
      subnet = aws_subnet.secondary[count.index].id
    }
  })
  depends_on = [
    module.eks
  ]
}

Expected behaviour

The expected behavior is that the VPC CNI picks up the configured values before the node group creates the EC2 instances, so that the prefix delegation configuration is applied to the network configuration of those instances.

Actual behaviour

The actual behavior is that I need to recreate the EC2 instances for the prefix delegation change to take effect, which is not acceptable for production clusters.

Terminal Output Screenshot(s)

Additional context

If required, I can provide screenshots of the EC2 instances' network configuration.

NOTE: Since we are using EKS for our production clusters, it would be really helpful if someone could look into this with priority and provide the right way of implementing custom CNI networking together with the max_pods setting.

cc - @bryantbiggs

arunsisodiya commented 1 year ago

Guys, can anyone look into this?

bryantbiggs commented 1 year ago

Are you saying that this example does not produce the desired results? https://github.com/aws-ia/terraform-aws-eks-blueprints/tree/main/examples/ipv4-prefix-delegation

arunsisodiya commented 1 year ago

Yes, right @bryantbiggs - I am able to use the VPC CNI addon with the secondary CIDR configuration, but prefix delegation was not enabled on the EC2 instances even though the configuration is there.

I don't know the root cause, but you can see that the VPC CNI addon and the managed node groups are being created in parallel, which should not happen, right?

[Screenshot: 2023-03-14 at 12:47:53]

bryantbiggs commented 1 year ago

I have tried the example I linked 3 times and the results are as intended - I suspect the issue is with the configuration you are using; I would start with the example linked and modify it to suit your needs.

arunsisodiya commented 1 year ago

I agree with you; that is why I put my whole configuration here. I also think it should work. I even validated all the settings and created a new cluster multiple times, but the result was not as desired.

I can give it another try now and let you know.

bryantbiggs commented 1 year ago

based on your configuration, I think you're trying to do too much. Again, start with the example linked and only modify the settings that you need to set explicitly

arunsisodiya commented 1 year ago

Do you think the order of environment variables could be an issue? I am now creating the cluster and it is showing the desired results; the only change I made is this:

configuration_values = jsonencode({
        env = {
          # Reference https://aws.github.io/aws-eks-best-practices/reliability/docs/networkmanagement/#cni-custom-networking
          AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true"
          ENI_CONFIG_LABEL_DEF               = "topology.kubernetes.io/zone"

          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })

Previously it was:

configuration_values = jsonencode({
        env = {
          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true"
          WARM_ENI_TARGET                    = "1"
          ENABLE_PREFIX_DELEGATION           = "true"
          ENI_CONFIG_LABEL_DEF               = "topology.kubernetes.io/zone"
        }
      })

bryantbiggs commented 1 year ago

closed with above guidance - thank you!

flaviomoringa commented 1 year ago

Hello,

I'm having the same issue: sometimes cluster creation is OK, sometimes it's not. I do remember that some time ago, when creating the cluster, the VPC CNI config was output all together as a block, and only then would the managed node group output start, but now it seems both are done at the same time, and that causes this issue.

Was something changed that might have caused this? I now always have to check the created nodes to see if they have the secondary interface, and rotate them when they don't, which is really frustrating.

@bryantbiggs, so I really think this issue should not be closed.

bryantbiggs commented 1 year ago

yes, this was changed recently https://github.com/terraform-aws-modules/terraform-aws-eks/pull/2478

arunsisodiya commented 1 year ago

@bryantbiggs - This change was rolled out quite a while ago; we are already using that version. Still, sometimes everything goes as planned with the right configuration, but sometimes it does not work and we need to rotate nodes manually.

bryantbiggs commented 1 year ago

have you tried extending the dataplane_wait_duration to something like "60s" or more?
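In the module configuration that is a single extra argument - a sketch, with the value being illustrative and environment-dependent; it works together with the before_compute = true you already set on the vpc-cni addon:

module "eks" {
  source  = "registry.terraform.io/terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  # ... cluster, addon and node group configuration as above ...

  # Wait after the cluster and addons are active before creating the node groups, so the
  # vpc-cni settings (applied with before_compute = true) are in place when nodes come up.
  dataplane_wait_duration = "120s" # illustrative value
}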

arunsisodiya commented 1 year ago

No, I have not tried it; I can give that a try. Still, do we have any way to put an explicit dependency between the node groups and vpc-cni, i.e. so that the node groups are created only after the CNI addon is created?

lavishkotharinda commented 11 months ago

@ArunSisodiya, the problem in this code was not due to its ordering but due to a change in parameters. As per the updated AWS doc (https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html), two parameters have been deprecated, i.e. AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG = "true" and ENI_CONFIG_LABEL_DEF = "topology.kubernetes.io/zone".

Try the code below instead, as it worked for me with the same expected results:

configuration_values = jsonencode({
        env = {
          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          ENABLE_PREFIX_DELEGATION           = "true"
          WARM_ENI_TARGET                    = "1"
        }
      })

After this, you can easily find the maximum number of pods that can be placed on your EKS nodes with this command:

kubectl describe node ip-192-168-30-193.region-code.compute.internal | grep 'pods\|PrivateIPv4Address'

NOTE: Replace 192.168.30.193 with the IPv4 address in the name of one of your nodes, as returned in the previous output.