aws-ia / terraform-aws-eks-blueprints-addons

Terraform module which provisions addons on Amazon EKS clusters
https://aws-ia.github.io/terraform-aws-eks-blueprints-addons/main/
Apache License 2.0

[EBS CSI Driver] It is not compatible with Windows Managed Node Group #390

Closed carlosrodlop closed 5 months ago

carlosrodlop commented 7 months ago

Description

Please provide a clear and concise description of the issue you are encountering, and a reproduction of your configuration (see the examples/* directory for references that you can copy+paste and tailor to match your configs if you are unable to copy your exact configuration). The reproduction MUST be executable by running terraform init && terraform apply without any further changes.

If your request is for a new feature, please use the Feature request template.

⚠️ Note

Before you submit an issue, please perform the following first:

  1. Remove the local .terraform directory (! ONLY if state is stored remotely, which hopefully you are following that best practice!): rm -rf .terraform/
  2. Re-initialize the project root to pull down modules: terraform init
  3. Re-attempt your terraform plan or apply and check if the issue still persists

Versions

Reproduction Code [Required]

Considerations:

Steps to reproduce the behavior:

main.tf


data "aws_availability_zones" "available" {}

locals {
  name   = "ebs-winmng"
  region = "us-east-1"

  vpc_name     = "${local.name}-vpc"
  cluster_name = "${local.name}-eks"

  vpc_cidr = "10.0.0.0/16"

  cluster_version = "1.28"

  azs = slice(data.aws_availability_zones.available.names, 0, 2)

  #https://docs.aws.amazon.com/eks/latest/userguide/choosing-instance-type.html
  k8s_instance_types = {
    "graviton3" = ["m7g.xlarge"]
  }

  tags = {
    "tf-blueprint" = local.name
  }

}

################################################################################
# EKS: Add-ons
################################################################################

# EKS Blueprints Add-ons

module "ebs_csi_driver_irsa" {
  source  = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"
  version = "5.29.0"

  role_name_prefix = "${module.eks.cluster_name}-ebs-csi-driv"

  attach_ebs_csi_policy = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }

  tags = local.tags
}

module "eks_blueprints_addons" {
  source  = "aws-ia/eks-blueprints-addons/aws"
  version = "1.15.1"

  cluster_name      = module.eks.cluster_name
  cluster_endpoint  = module.eks.cluster_endpoint
  oidc_provider_arn = module.eks.oidc_provider_arn
  cluster_version   = module.eks.cluster_version

  eks_addons = {
    aws-ebs-csi-driver = {
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
    }
    coredns    = {}
    vpc-cni    = {}
    kube-proxy = {}
  }

  tags = local.tags
}

################################################################################
# EKS: Infra
################################################################################

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "19.17.1"

  cluster_name                   = local.cluster_name
  cluster_endpoint_public_access = true
  cluster_version                = local.cluster_version

  vpc_id     = module.vpc.vpc_id
  subnet_ids = module.vpc.private_subnets

  node_security_group_additional_rules = {

    egress_self_all = {
      description = "Node to node all ports/protocols"
      protocol    = "-1"
      from_port   = 0
      to_port     = 0
      type        = "egress"
      self        = true
    }

    ingress_self_all = {
      description = "Node to node all ports/protocols"
      protocol    = "-1"
      from_port   = 0
      to_port     = 0
      type        = "ingress"
      self        = true
    }

    egress_ssh_all = {
      description      = "Egress all ssh to internet for github"
      protocol         = "tcp"
      from_port        = 22
      to_port          = 22
      type             = "egress"
      cidr_blocks      = ["0.0.0.0/0"]
      ipv6_cidr_blocks = ["::/0"]
    }

    ingress_cluster_to_node_all_traffic = {
      description                   = "Cluster API to Nodegroup all traffic"
      protocol                      = "-1"
      from_port                     = 0
      to_port                       = 0
      type                          = "ingress"
      source_cluster_security_group = true
    }
  }

  eks_managed_node_groups = {
    mg_linux = {
      node_group_name = "managed-linux"
      instance_types  = local.k8s_instance_types["graviton3"]
      ami_type        = "AL2_ARM_64"
      capacity_type   = "ON_DEMAND"
      disk_size       = 25
      desired_size    = 2
    }
    mg_windows = {
      min_size          = 1
      desired_size      = 1
      max_size          = 5
      platform          = "windows"
      ami_type          = "WINDOWS_CORE_2019_x86_64"
      capacity_type     = "SPOT"
      enable_monitoring = true
      disk_size         = "100"
      use_name_prefix   = true
      cluster_version   = local.cluster_version
      instance_types    = ["m5d.xlarge", "m5ad.xlarge"]
      taints = [
        {
          key    = "os"
          value  = "windows"
          effect = "NO_SCHEDULE"
        }
      ]
    }
  }

  create_cloudwatch_log_group = false

  create_kms_key  = true
  kms_key_aliases = ["eks/${local.name}"]

  tags = local.tags
}

################################################################################
# Supported Resources
################################################################################

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "5.5.2"

  name = local.vpc_name
  cidr = local.vpc_cidr

  azs             = local.azs
  public_subnets  = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 4, k)]
  private_subnets = [for k, v in local.azs : cidrsubnet(local.vpc_cidr, 8, k + 48)]

  enable_nat_gateway = true
  single_nat_gateway = true

  #https://docs.aws.amazon.com/eks/latest/userguide/network_reqs.html
  #https://docs.aws.amazon.com/eks/latest/userguide/network-load-balancing.html
  public_subnet_tags = {
    "kubernetes.io/role/elb" = 1
  }

  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = 1
  }

  tags = local.tags

}

provider.tf

terraform {
  required_version = ">= 1.0.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 3.72"
    }
    kubernetes = {
      source  = "hashicorp/kubernetes"
      version = ">= 2.10"
    }
    helm = {
      source  = "hashicorp/helm"
      version = ">= 2.5.1"
    }
  }

}

provider "aws" {
  region = local.region
}

provider "kubernetes" {
  host                   = module.eks.cluster_endpoint
  cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

  exec {
    api_version = "client.authentication.k8s.io/v1beta1"
    command     = "aws"
    # This requires the awscli to be installed locally where Terraform is executed
    args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
  }
}

provider "helm" {
  kubernetes {
    host                   = module.eks.cluster_endpoint
    cluster_ca_certificate = base64decode(module.eks.cluster_certificate_authority_data)

    exec {
      api_version = "client.authentication.k8s.io/v1beta1"
      command     = "aws"
      # This requires the awscli to be installed locally where Terraform is executed
      args = ["eks", "get-token", "--cluster-name", module.eks.cluster_name]
    }
  }
}

Expected behaviour

EBS CSI Driver is deployed correctly

Actual behaviour

EBS CSI Driver is NOT deployed

Terminal Output Screenshot(s)

module.eks.module.eks_managed_node_group["mg_windows"].aws_eks_node_group.this[0]: Still creating... [9m30s elapsed]
module.eks.module.eks_managed_node_group["mg_windows"].aws_eks_node_group.this[0]: Still creating... [9m40s elapsed]
module.eks.module.eks_managed_node_group["mg_windows"].aws_eks_node_group.this[0]: Creation complete after 9m45s [id=ebs-winmng-eks:mg_windows-20240419143200940700000016]

Apply complete! Resources: 41 added, 0 changed, 0 destroyed.

...

Terraform detected the following changes made outside of Terraform since the last "terraform apply" which may have affected this plan:

  # module.eks.module.eks_managed_node_group["mg_linux"].aws_eks_node_group.this[0] has changed
  ~ resource "aws_eks_node_group" "this" {
        id                     = "ebs-winmng-eks:mg_linux-20240419143200938700000014"
      + labels                 = {}
        tags                   = {
            "Name"         = "mg_linux"
            "tf-blueprint" = "ebs-winmng"
        }
        # (15 unchanged attributes hidden)

        # (4 unchanged blocks hidden)
    }

  # module.eks.module.eks_managed_node_group["mg_windows"].aws_eks_node_group.this[0] has changed
  ~ resource "aws_eks_node_group" "this" {
        id                     = "ebs-winmng-eks:mg_windows-20240419143200940700000016"
      + labels                 = {}
        tags                   = {
            "Name"         = "mg_windows"
            "tf-blueprint" = "ebs-winmng"
        }
        # (15 unchanged attributes hidden)

        # (5 unchanged blocks hidden)
    }

Unless you have made equivalent changes to your configuration, or ignored the relevant attributes using ignore_changes, the following plan may include actions to undo or respond to these
changes.

──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

Terraform used the selected providers to generate the following execution plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.ebs_csi_driver_irsa.aws_iam_policy.ebs_csi[0] will be created
  + resource "aws_iam_policy" "ebs_csi" {
      + arn              = (known after apply)
      + attachment_count = (known after apply)
      + description      = "Provides permissions to manage EBS volumes via the container storage interface driver"
      + id               = (known after apply)
      + name             = (known after apply)
      + name_prefix      = "AmazonEKS_EBS_CSI_Policy-"
      + path             = "/"
      + policy           = jsonencode(
            {
              + Statement = [
                  + {
                      + Action   = [
                          + "ec2:ModifyVolume",
                          + "ec2:DetachVolume",
                          + "ec2:DescribeVolumesModifications",
                          + "ec2:DescribeVolumes",
                          + "ec2:DescribeTags",
                          + "ec2:DescribeSnapshots",
                          + "ec2:DescribeInstances",
                          + "ec2:DescribeAvailabilityZones",
                          + "ec2:CreateSnapshot",
                          + "ec2:AttachVolume",
                        ]
                      + Effect   = "Allow"
                      + Resource = "*"
                    },
                  + {
                      + Action    = "ec2:CreateTags"
                      + Condition = {
                          + StringEquals = {
                              + "ec2:CreateAction" = [
                                  + "CreateVolume",
                                  + "CreateSnapshot",
                                ]
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = [
                          + "arn:aws:ec2:*:*:volume/*",
                          + "arn:aws:ec2:*:*:snapshot/*",
                        ]
                    },
                  + {
                      + Action   = "ec2:DeleteTags"
                      + Effect   = "Allow"
                      + Resource = [
                          + "arn:aws:ec2:*:*:volume/*",
                          + "arn:aws:ec2:*:*:snapshot/*",
                        ]
                    },
                  + {
                      + Action    = "ec2:CreateVolume"
                      + Condition = {
                          + StringLike = {
                              + "aws:RequestTag/ebs.csi.aws.com/cluster" = "true"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:CreateVolume"
                      + Condition = {
                          + StringLike = {
                              + "aws:RequestTag/CSIVolumeName" = "*"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:CreateVolume"
                      + Condition = {
                          + StringLike = {
                              + "aws:RequestTag/kubernetes.io/cluster/*" = "owned"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:DeleteVolume"
                      + Condition = {
                          + StringLike = {
                              + "ec2:ResourceTag/ebs.csi.aws.com/cluster" = "true"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:DeleteVolume"
                      + Condition = {
                          + StringLike = {
                              + "ec2:ResourceTag/CSIVolumeName" = "*"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:DeleteVolume"
                      + Condition = {
                          + StringLike = {
                              + "ec2:ResourceTag/kubernetes.io/cluster/*" = "owned"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:DeleteVolume"
                      + Condition = {
                          + StringLike = {
                              + "ec2:ResourceTag/kubernetes.io/created-for/pvc/name" = "*"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:DeleteSnapshot"
                      + Condition = {
                          + StringLike = {
                              + "ec2:ResourceTag/CSIVolumeSnapshotName" = "*"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                  + {
                      + Action    = "ec2:DeleteSnapshot"
                      + Condition = {
                          + StringLike = {
                              + "ec2:ResourceTag/ebs.csi.aws.com/cluster" = "true"
                            }
                        }
                      + Effect    = "Allow"
                      + Resource  = "*"
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + policy_id        = (known after apply)
      + tags             = {
          + "tf-blueprint" = "ebs-winmng"
        }
      + tags_all         = {
          + "tf-blueprint" = "ebs-winmng"
        }
    }

  # module.ebs_csi_driver_irsa.aws_iam_role.this[0] will be created
  + resource "aws_iam_role" "this" {
      + arn                   = (known after apply)
      + assume_role_policy    = jsonencode(
            {
              + Statement = [
                  + {
                      + Action    = "sts:AssumeRoleWithWebIdentity"
                      + Condition = {
                          + StringEquals = {
                              + "oidc.eks.us-east-1.amazonaws.com/id/54034D8D87A2E92EFA859752FD5BEC67:aud" = "sts.amazonaws.com"
                              + "oidc.eks.us-east-1.amazonaws.com/id/54034D8D87A2E92EFA859752FD5BEC67:sub" = "system:serviceaccount:kube-system:ebs-csi-controller-sa"
                            }
                        }
                      + Effect    = "Allow"
                      + Principal = {
                          + Federated = "arn:aws:iam::324005994172:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/54034D8D87A2E92EFA859752FD5BEC67"
                        }
                    },
                ]
              + Version   = "2012-10-17"
            }
        )
      + create_date           = (known after apply)
      + force_detach_policies = true
      + id                    = (known after apply)
      + managed_policy_arns   = (known after apply)
      + max_session_duration  = 3600
      + name                  = (known after apply)
      + name_prefix           = "ebs-winmng-eks-ebs-csi-driv"
      + path                  = "/"
      + tags                  = {
          + "tf-blueprint" = "ebs-winmng"
        }
      + tags_all              = {
          + "tf-blueprint" = "ebs-winmng"
        }
      + unique_id             = (known after apply)
    }

  # module.ebs_csi_driver_irsa.aws_iam_role_policy_attachment.ebs_csi[0] will be created
  + resource "aws_iam_role_policy_attachment" "ebs_csi" {
      + id         = (known after apply)
      + policy_arn = (known after apply)
      + role       = (known after apply)
    }

  # module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"] will be created
  + resource "aws_eks_addon" "this" {
      + addon_name                  = "aws-ebs-csi-driver"
      + addon_version               = "v1.29.1-eksbuild.1"
      + arn                         = (known after apply)
      + cluster_name                = "ebs-winmng-eks"
      + configuration_values        = (known after apply)
      + created_at                  = (known after apply)
      + id                          = (known after apply)
      + modified_at                 = (known after apply)
      + preserve                    = true
      + resolve_conflicts_on_create = "OVERWRITE"
      + resolve_conflicts_on_update = "OVERWRITE"
      + service_account_role_arn    = (known after apply)
      + tags                        = {
          + "tf-blueprint" = "ebs-winmng"
        }
      + tags_all                    = {
          + "tf-blueprint" = "ebs-winmng"
        }

      + timeouts {}
    }

  # module.eks_blueprints_addons.aws_eks_addon.this["coredns"] will be created
  + resource "aws_eks_addon" "this" {
      + addon_name                  = "coredns"
      + addon_version               = "v1.10.1-eksbuild.7"
      + arn                         = (known after apply)
      + cluster_name                = "ebs-winmng-eks"
      + configuration_values        = (known after apply)
      + created_at                  = (known after apply)
      + id                          = (known after apply)
      + modified_at                 = (known after apply)
      + preserve                    = true
      + resolve_conflicts_on_create = "OVERWRITE"
      + resolve_conflicts_on_update = "OVERWRITE"
      + tags                        = {
          + "tf-blueprint" = "ebs-winmng"
        }
      + tags_all                    = {
          + "tf-blueprint" = "ebs-winmng"
        }

      + timeouts {}
    }

  # module.eks_blueprints_addons.aws_eks_addon.this["kube-proxy"] will be created
  + resource "aws_eks_addon" "this" {
      + addon_name                  = "kube-proxy"
      + addon_version               = "v1.28.8-eksbuild.2"
      + arn                         = (known after apply)
      + cluster_name                = "ebs-winmng-eks"
      + configuration_values        = (known after apply)
      + created_at                  = (known after apply)
      + id                          = (known after apply)
      + modified_at                 = (known after apply)
      + preserve                    = true
      + resolve_conflicts_on_create = "OVERWRITE"
      + resolve_conflicts_on_update = "OVERWRITE"
      + tags                        = {
          + "tf-blueprint" = "ebs-winmng"
        }
      + tags_all                    = {
          + "tf-blueprint" = "ebs-winmng"
        }

      + timeouts {}
    }

  # module.eks_blueprints_addons.aws_eks_addon.this["vpc-cni"] will be created
  + resource "aws_eks_addon" "this" {
      + addon_name                  = "vpc-cni"
      + addon_version               = "v1.18.0-eksbuild.1"
      + arn                         = (known after apply)
      + cluster_name                = "ebs-winmng-eks"
      + configuration_values        = (known after apply)
      + created_at                  = (known after apply)
      + id                          = (known after apply)
      + modified_at                 = (known after apply)
      + preserve                    = true
      + resolve_conflicts_on_create = "OVERWRITE"
      + resolve_conflicts_on_update = "OVERWRITE"
      + tags                        = {
          + "tf-blueprint" = "ebs-winmng"
        }
      + tags_all                    = {
          + "tf-blueprint" = "ebs-winmng"
        }

      + timeouts {}
    }

  # module.eks_blueprints_addons.time_sleep.this will be created
  + resource "time_sleep" "this" {
      + create_duration = "30s"
      + id              = (known after apply)
      + triggers        = {
          + "cluster_endpoint"  = "https://54034D8D87A2E92EFA859752FD5BEC67.yl4.us-east-1.eks.amazonaws.com"
          + "cluster_name"      = "ebs-winmng-eks"
          + "custom"            = ""
          + "oidc_provider_arn" = "arn:aws:iam::324005994172:oidc-provider/oidc.eks.us-east-1.amazonaws.com/id/54034D8D87A2E92EFA859752FD5BEC67"
        }
    }

Plan: 8 to add, 0 to change, 0 to destroy.
module.ebs_csi_driver_irsa.aws_iam_policy.ebs_csi[0]: Creating...
module.eks_blueprints_addons.time_sleep.this: Creating...
module.ebs_csi_driver_irsa.aws_iam_role.this[0]: Creating...
module.ebs_csi_driver_irsa.aws_iam_policy.ebs_csi[0]: Creation complete after 1s [id=arn:aws:iam::324005994172:policy/AmazonEKS_EBS_CSI_Policy-20240419144250170500000001]
module.ebs_csi_driver_irsa.aws_iam_role.this[0]: Creation complete after 1s [id=ebs-winmng-eks-ebs-csi-driv20240419144250220200000002]
module.ebs_csi_driver_irsa.aws_iam_role_policy_attachment.ebs_csi[0]: Creating...
module.ebs_csi_driver_irsa.aws_iam_role_policy_attachment.ebs_csi[0]: Creation complete after 0s [id=ebs-winmng-eks-ebs-csi-driv20240419144250220200000002-20240419144251773200000003]
module.eks_blueprints_addons.time_sleep.this: Still creating... [10s elapsed]
module.eks_blueprints_addons.time_sleep.this: Still creating... [20s elapsed]
module.eks_blueprints_addons.time_sleep.this: Still creating... [30s elapsed]
module.eks_blueprints_addons.time_sleep.this: Creation complete after 30s [id=2024-04-19T14:43:20Z]
module.eks_blueprints_addons.aws_eks_addon.this["kube-proxy"]: Creating...
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Creating...
module.eks_blueprints_addons.aws_eks_addon.this["vpc-cni"]: Creating...
module.eks_blueprints_addons.aws_eks_addon.this["coredns"]: Creating...
module.eks_blueprints_addons.aws_eks_addon.this["coredns"]: Creation complete after 9s [id=ebs-winmng-eks:coredns]
module.eks_blueprints_addons.aws_eks_addon.this["kube-proxy"]: Still creating... [10s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["vpc-cni"]: Still creating... [10s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Still creating... [10s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["vpc-cni"]: Still creating... [20s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["kube-proxy"]: Still creating... [20s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Still creating... [20s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Still creating... [30s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["vpc-cni"]: Still creating... [30s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["kube-proxy"]: Still creating... [30s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["kube-proxy"]: Creation complete after 36s [id=ebs-winmng-eks:kube-proxy]
module.eks_blueprints_addons.aws_eks_addon.this["vpc-cni"]: Creation complete after 36s [id=ebs-winmng-eks:vpc-cni]
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Still creating... [40s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Still creating... [50s elapsed]
...
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Still creating... [19m51s elapsed]
module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"]: Still creating... [20m1s elapsed]
...
│ Error: waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
│
│   with module.eks_blueprints_addons.aws_eks_addon.this["aws-ebs-csi-driver"],
│   on .terraform/modules/eks_blueprints_addons/main.tf line 2178, in resource "aws_eks_addon" "this":
│ 2178: resource "aws_eks_addon" "this" {
│
╵

Additional context

$ cat terraform.log | grep ERROR | grep -i ebs
2024-04-19T13:08:41.191Z [ERROR] provider.terraform-provider-aws_v5.46.0_x5: Response contains error diagnostic: @module=sdk.proto diagnostic_detail="" tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=ac52bf2a-f067-43af-0cc2-6470b8a7aeba @caller=github.com/hashicorp/terraform-plugin-go@v0.22.1/tfprotov5/internal/diag/diagnostics.go:58 diagnostic_severity=ERROR diagnostic_summary="waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: operation error EKS: DescribeAddon, https response error StatusCode: 403, RequestID: 911cb32e-aa9a-4a9a-97b8-84c0ee3f54a2, api error InvalidSignatureException: Signature expired: 20240419T130840Z is now earlier than 20240419T131027Z (20240419T131527Z - 5 min.)" tf_proto_version=5.4 tf_rpc=ApplyResourceChange tf_resource_type=aws_eks_addon timestamp=2024-04-19T13:08:41.190Z
2024-04-19T13:08:41.272Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: operation error EKS: DescribeAddon, https response error StatusCode: 403, RequestID: 911cb32e-aa9a-4a9a-97b8-84c0ee3f54a2, api error InvalidSignatureException: Signature expired: 20240419T130840Z is now earlier than 20240419T131027Z (20240419T131527Z - 5 min.)
2024-04-19T13:29:16.090Z [ERROR] provider.terraform-provider-aws_v5.46.0_x5: Response contains error diagnostic: tf_resource_type=aws_eks_addon tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=30b48a7e-2336-5dcb-bf9a-823b22f8edcc diagnostic_severity=ERROR tf_rpc=ApplyResourceChange @caller=github.com/hashicorp/terraform-plugin-go@v0.22.1/tfprotov5/internal/diag/diagnostics.go:58 diagnostic_detail="" diagnostic_summary="waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: operation error EKS: DescribeAddon, https response error StatusCode: 0, RequestID: , request send failed, Get \"https://eks.us-east-1.amazonaws.com/clusters/ebs-winmng-eks/addons/aws-ebs-csi-driver\": dial tcp: lookup eks.us-east-1.amazonaws.com on 192.168.65.7:53: no such host" tf_proto_version=5.4 @module=sdk.proto timestamp=2024-04-19T13:29:16.090Z
2024-04-19T13:29:16.155Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: operation error EKS: DescribeAddon, https response error StatusCode: 0, RequestID: , request send failed, Get "https://eks.us-east-1.amazonaws.com/clusters/ebs-winmng-eks/addons/aws-ebs-csi-driver": dial tcp: lookup eks.us-east-1.amazonaws.com on 192.168.65.7:53: no such host
2024-04-19T13:35:53.312Z [ERROR] provider.terraform-provider-aws_v5.46.0_x5: Response contains error diagnostic: @module=sdk.proto diagnostic_summary="waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: context canceled" tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=b38029c7-0c06-bae0-2f92-6a5f6a79189c tf_rpc=ApplyResourceChange diagnostic_detail="" diagnostic_severity=ERROR tf_proto_version=5.4 tf_resource_type=aws_eks_addon @caller=github.com/hashicorp/terraform-plugin-go@v0.22.1/tfprotov5/internal/diag/diagnostics.go:58 timestamp=2024-04-19T13:35:53.311Z
2024-04-19T13:35:53.390Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: context canceled
2024-04-19T13:35:53.391Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: execution halted
2024-04-19T13:35:53.391Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: execution halted
2024-04-19T15:03:21.067Z [ERROR] provider.terraform-provider-aws_v5.46.0_x5: Response contains error diagnostic: @module=sdk.proto diagnostic_summary="waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)" diagnostic_detail="" @caller=github.com/hashicorp/terraform-plugin-go@v0.22.1/tfprotov5/internal/diag/diagnostics.go:58 diagnostic_severity=ERROR tf_provider_addr=registry.terraform.io/hashicorp/aws tf_req_id=98dcfc9c-853e-c14c-e8e7-0cfa2626007b tf_resource_type=aws_eks_addon tf_proto_version=5.4 tf_rpc=ApplyResourceChange timestamp=2024-04-19T15:03:21.065Z
2024-04-19T15:03:21.157Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: timeout while waiting for state to become 'ACTIVE' (last state: 'CREATING', timeout: 20m0s)
2024-04-19T15:21:15.005Z [ERROR] provider.terraform-provider-aws_v5.46.0_x5: Response contains error diagnostic: @module=sdk.proto tf_provider_addr=registry.terraform.io/hashicorp/aws tf_rpc=ApplyResourceChange tf_resource_type=aws_eks_addon diagnostic_detail="" diagnostic_summary="waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: context canceled" tf_proto_version=5.4 diagnostic_severity=ERROR tf_req_id=bdd6e006-8e8a-6107-f182-3675cd7bd1f4 @caller=github.com/hashicorp/terraform-plugin-go@v0.22.1/tfprotov5/internal/diag/diagnostics.go:58 timestamp=2024-04-19T15:21:15.004Z
2024-04-19T15:21:15.087Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: waiting for EKS Add-On (ebs-winmng-eks:aws-ebs-csi-driver) create: context canceled
2024-04-19T15:21:15.088Z [ERROR] vertex "module.eks_blueprints_addons.aws_eks_addon.this[\"aws-ebs-csi-driver\"]" error: execution halted

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove stale label or comment or this issue will be closed in 10 days

carlosrodlop commented 6 months ago

Has anyone looked into the provided example? This issue has still been neither answered nor triaged, yet it is about to be closed as stale.

bryantbiggs commented 6 months ago

and what do the logs from the EBS CSI driver pod show you?

carlosrodlop commented 5 months ago

I will check shortly, in the next couple of days. Thanks for looking into this @bryantbiggs

carlosrodlop commented 5 months ago

@bryantbiggs thanks for your patience :)

Regarding the EBS CSI driver logs: there are none for ebs-csi-node-windows because its pod never starts and stays in a pending state (ContainerCreating). The following Kubernetes event is connected to this issue:

(combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a0fa2ff62abf7f2fb4b5b2ab7d9db59e11ba913c7b89ed0458cc650abb74701c": plugin type="vpc-bridge" name="vpc" failed (add): failed to parse Kubernetes args: failed to get pod IP address ebs-csi-node-windows-fzp2l: error executing k8s connector: error executing connector binary: exit status 1 with execution error: pod ebs-csi-node-windows-fzp2l does not have label vpc.amazonaws.com/PrivateIPv4Address

From the event above, the issue appears to be that the Amazon VPC CNI plugin fails to obtain a private IPv4 address for the Windows pod running the EBS CSI driver.

Questions

1.- I spoke to @wellsiau-aws about this issue and he pointed me to this list of prerequisites: https://github.com/kubernetes-sigs/aws-ebs-csi-driver/tree/master/examples/kubernetes/windows. Looking at the 4 points, I have doubts about points 2 and 3. Do I need to add them in the provided main.tf somehow?

2.- Looking at https://docs.aws.amazon.com/eks/latest/userguide/ebs-csi.html, I am wondering whether the IAM role described in https://docs.aws.amazon.com/eks/latest/userguide/csi-iam-role.html is set up correctly by the current configuration, or whether there is something else we need to add for Windows Managed Nodes.

module "eks_blueprints_addons" {
  source = "aws-ia/eks-blueprints-addons/aws"
  version = "1.15.1"

...
  eks_addons = {
    aws-ebs-csi-driver = {
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
    }
...
  }

...
}

3.- I tried to look at the Terraform code here https://github.com/aws-ia/terraform-aws-eks-blueprints-addons/blob/main/main.tf to understand what is happening behind the scenes, but there is no reference to the EBS CSI driver. Where in the code should we look for troubleshooting?

4.- Has anyone tried to run the *.tf files I provided? The issue is easy to reproduce locally I believe.

Resources status

Finally, I'm attaching a snapshot of all the resources created and their status

kubectl get all -A 
NAMESPACE     NAME                                     READY   STATUS              RESTARTS   AGE
kube-system   pod/aws-node-mvxg7                       2/2     Running             0          64m
kube-system   pod/aws-node-t74hb                       2/2     Running             0          64m
kube-system   pod/coredns-6777b4b9b9-jh6cf             1/1     Running             0          65m
kube-system   pod/coredns-6777b4b9b9-kvbmn             1/1     Running             0          65m
kube-system   pod/ebs-csi-controller-66cb49498-n92w2   6/6     Running             0          65m
kube-system   pod/ebs-csi-controller-66cb49498-ph2rd   6/6     Running             0          65m
kube-system   pod/ebs-csi-node-rhmfq                   3/3     Running             0          64m
kube-system   pod/ebs-csi-node-windows-fzp2l           0/3     ContainerCreating   0          57m
kube-system   pod/ebs-csi-node-xhjzv                   3/3     Running             0          64m
kube-system   pod/kube-proxy-69njv                     1/1     Running             0          64m
kube-system   pod/kube-proxy-ghpmf                     1/1     Running             0          64m

NAMESPACE     NAME                 TYPE        CLUSTER-IP    EXTERNAL-IP   PORT(S)                  AGE
default       service/kubernetes   ClusterIP   172.20.0.1    <none>        443/TCP                  70m
kube-system   service/kube-dns     ClusterIP   172.20.0.10   <none>        53/UDP,53/TCP,9153/TCP   68m

NAMESPACE     NAME                                  DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR              AGE
kube-system   daemonset.apps/aws-node               2         2         2       2            2           <none>                     68m
kube-system   daemonset.apps/ebs-csi-node           2         2         2       2            2           kubernetes.io/os=linux     65m
kube-system   daemonset.apps/ebs-csi-node-windows   1         1         0       1            0           kubernetes.io/os=windows   65m
kube-system   daemonset.apps/kube-proxy             2         2         2       2            2           <none>                     68m

NAMESPACE     NAME                                 READY   UP-TO-DATE   AVAILABLE   AGE
kube-system   deployment.apps/coredns              2/2     2            2           68m
kube-system   deployment.apps/ebs-csi-controller   2/2     2            2           65m

NAMESPACE     NAME                                           DESIRED   CURRENT   READY   AGE
kube-system   replicaset.apps/coredns-6777b4b9b9             2         2         2       65m
kube-system   replicaset.apps/coredns-86969bccb4             0         0         0       68m
kube-system   replicaset.apps/ebs-csi-controller-66cb49498   2         2         2       m

bryantbiggs commented 5 months ago

I would revisit your configurations; there are a number of misconfigurations. For example:

carlosrodlop commented 5 months ago

Thanks @bryantbiggs for your reply

You are setting a taint on the windows nodes - is there a toleration that matches on the EBS CSI driver?

Nope! Where can I find the accepted inputs for eks_addons > ebs_driver? Ideally, I'd like to pass the values as YAML, including the tolerations.

There is no reference to them in either https://registry.terraform.io/modules/aws-ia/eks-blueprints-addon/aws/latest?tab=inputs or https://aws-ia.github.io/terraform-aws-eks-blueprints-addons/main/

OK, I guess I can do something like this https://github.com/aws-ia/terraform-aws-eks-blueprints/blob/3e9e5a13e7afee42d4b64874ba5adf73f329ff30/patterns/karpenter/main.tf#L117

Then adding tolerations like https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/values.yaml#L276-L281

Can you confirm my suggestion please?

I don't see where you have set node.enableWindows = true per the docs

Which docs please?

Gotcha, I need to enable this section https://github.com/kubernetes-sigs/aws-ebs-csi-driver/blob/master/charts/aws-ebs-csi-driver/values.yaml#L384 using the same approach I explained above
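
As a rough sketch of that approach (not a confirmed fix), the toleration and the Windows flag could be passed through configuration_values roughly as shown below. The local name ebs_csi_configuration_values and the taint key, value, and effect are placeholders I made up; they must be replaced with whatever taint is actually set on the Windows node group. I am also assuming that the add-on's configuration schema accepts the same node.enableWindows and node.tolerations keys as the Helm chart, which may depend on the add-on version and can be checked with aws eks describe-addon-configuration.

```hcl
locals {
  # Hypothetical local: pass it as `configuration_values` on the
  # aws-ebs-csi-driver entry of `eks_addons`.
  ebs_csi_configuration_values = jsonencode({
    node = {
      # Assumption: mirrors the `node.enableWindows` key of the Helm chart values.
      enableWindows = true

      # Placeholder toleration: it must match the taint set on the Windows nodes.
      tolerations = [
        {
          key      = "os"
          operator = "Equal"
          value    = "windows"
          effect   = "NoSchedule"
        }
      ]
    }
  })
}
```

With this in place, the add-on entry would set configuration_values = local.ebs_csi_configuration_values next to service_account_role_arn.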

carlosrodlop commented 5 months ago

I'm closing this issue. It was solved by using a node selector, so that the EBS CSI driver node pods are scheduled only on the node pools where I want to use the driver

module "eks_blueprints_addons" {
  source = "aws-ia/eks-blueprints-addons/aws"
  #vEKSBpAddonsTFMod#
  version = "1.15.1"
 ...
  eks_addons = {
    aws-ebs-csi-driver = {
      service_account_role_arn = module.ebs_csi_driver_irsa.iam_role_arn
      configuration_values = jsonencode(
        {
          node = {
            nodeSelector = {
              ebs_driver = "enabled"
            }
          }
        }
      )
    }
...
  }
}
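
For the node selector above to match anything, the node groups that should run the driver need the ebs_driver = "enabled" label. A minimal sketch of how that might look, assuming the node groups are defined through the eks_managed_node_groups input of the terraform-aws-modules/eks/aws module; the local name and the group names linux_apps and windows are hypothetical:

```hcl
locals {
  # Hypothetical local: pass it as `eks_managed_node_groups` in the EKS module block.
  eks_managed_node_groups = {
    linux_apps = {
      # Kubernetes node label matched by node.nodeSelector in the add-on configuration.
      labels = {
        ebs_driver = "enabled"
      }
    }

    windows = {
      # No ebs_driver label here, so the selector above does not match these nodes.
      labels = {}
    }
  }
}
```

Node groups without the label are not selected by the node pods, so volumes backed by the EBS CSI driver should only be expected to work on the labeled groups.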