Closed vstthomas closed 8 months ago
Just dredged this out of the terraform logs on another build.
http.response.header.access_control_allow_origin="*" http.response.header.date="Thu, 28 Dec 2023 16:25:01 GMT" http.response.header.x_amzn_requestid=6d86cae6-95a3-4783-a677-40965613616f tf_aws.sdk=aws-sdk-go-v2 tf_mux_provider="*schema.GRPCProviderServer" timestamp=2023-12-28T08:25:01.069-0800
2023-12-28T08:25:01.079-0800 [DEBUG] provider.terraform-provider-aws_v5.25.0_x5: HTTP Response Received:
http.response.body=
| {
| "addon" : {
| "addonName" : "snapshot-controller",
| "clusterName" : "gitops-demo-stage",
| "status" : "DEGRADED",
| "addonVersion" : "v6.3.2-eksbuild.1",
| "health" : {
| "issues" : [ {
| "code" : "InsufficientNumberOfReplicas",
| "message" : "The add-on is unhealthy because it doesn't have the desired number of replicas.",
| "resourceIds" : null
| } ]
| },
| "addonArn" : "arn:aws-us-gov:eks:us-gov-east-1:367652197469:addon/gitops-demo-stage/snapshot-controller/8cc6589c-8e75-42f3-4a39-1f6481dd9616",
| "createdAt" : 1.703780359717E9,
| "modifiedAt" : 1.703780371096E9,
| "serviceAccountRoleArn" : null,
| "tags" : { },
| "publisher" : null,
| "owner" : null,
| "marketplaceInformation" : null,
| "configurationValues" : null
| }
| }
tf_rpc=ApplyResourceChange @caller=github.com/hashicorp/aws-sdk-go-base/v2@v2.0.0-beta.39/logging/tf_logger.go:45 http.response.header.access_control_allow_origin="*" tf_req_id=84fad3e7-cdc2-6efa-8332-c5179fac2de9 http.status_code=200 rpc.service=EKS rpc.system=aws-api http.response.header.access_control_allow_methods="GET,HEAD,PUT,POST,DELETE,OPTIONS" http.response.header.date="Thu, 28 Dec 2023 16:25:01 GMT" http.response.header.x_amzn_requestid=364ff971-d3e3-4fc5-99e5-702ea0ff909c http.response.header.content_type=application/json http.response_content_length=796 http.response.header.access_control_allow_headers="*,Authorization,Date,X-Amz-Date,X-Amz-Security-Token,X-Amz-Target,content-type,x-amz-content-sha256,x-amz-user-agent,x-amzn-platform-id,x-amzn-trace-id" http.response.header.access_control_expose_headers="x-amzn-errortype,x-amzn-errormessage,x-amzn-trace-id,x-amzn-requestid,x-amz-apigw-id,date" tf_resource_type=aws_eks_addon http.duration=206 tf_provider_addr=registry.terraform.io/hashicorp/aws http.response.header.x_amz_apigw_id=QqYmkH7ZulQFmIw= http.response.header.x_amzn_trace_id=Root=1-658da15c-6b1a93965abbb38710b4209e tf_aws.sdk=aws-sdk-go-v2 tf_mux_provider="*schema.GRPCProviderServer" @module=aws aws.region=us-gov-east-1 rpc.method=DescribeAddon timestamp=2023-12-28T08:25:01.079-0800
Let's start with a proper reproduction first.
How would you like to see that?
These are my steps:
tf init
- works as expected

Planning failed. Terraform encountered an error while generating this plan.
│ Error: configuring Terraform AWS Provider: validating provider credentials: retrieving caller identity from STS: operation error STS: GetCallerIdentity, https response error StatusCode: 403, RequestID: f6b1d826-c73f-4f9e-a56e-dd84d8166eee, api error InvalidClientTokenId: The security token included in the request is invalid.
│
│ with provider["registry.terraform.io/hashicorp/aws"],
│ on main.tf line 12, in provider "aws":
│ 12: provider "aws" {
However, if I were to adjust the code
FROM:

provider "aws" {
  region = local.region
}

TO:

provider "aws" {
  region = local.region
  alias  = "virginia"
}
The plan works afterwards; I'm not sure why. But the Provider Configuration page says:
You can use expressions in the values of these configuration arguments, but can only reference values that are known before the configuration is applied. This means you can safely reference input variables, but not attributes exported by resources (with an exception for resource arguments that are specified directly in the configuration).
So it looks like region = local.region might need to be region = hardCodedRegion. Still, it seems like it shouldn't start working just by adding the alias either 🤷
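A sketch of the documented-safe pattern, assuming the quoted guidance: the region comes from an input variable (known before apply), and the aliased block is opt-in only. The variable name and default below are assumptions for illustration, not from the original config:

```hcl
# Sketch only: variable name and default are assumptions.
variable "region" {
  type    = string
  default = "us-gov-east-1"
}

# Input variables are known before apply, so this is safe per the
# Provider Configuration docs quoted above.
provider "aws" {
  region = var.region
}

# An aliased provider is NOT the default; a resource only uses it when it
# opts in explicitly with `provider = aws.virginia`.
provider "aws" {
  region = var.region
  alias  = "virginia"
}
```

One possible explanation for the observed behavior: once the only provider block is aliased, resources that don't reference aws.virginia fall back to an implicit, empty default provider configuration, so the original credential/region path may no longer be exercised the same way during plan. That is a hypothesis, not a confirmed diagnosis.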
% tf apply -auto-approve
...
module.eks_managed_node_group.aws_eks_node_group.this[0]: Still creating... [5m0s elapsed]
module.eks_managed_node_group.aws_eks_node_group.this[0]: Creation complete after 5m5s [id=reproduction:separate-2024010823044795020000000f]
╷
│ Error: reading IAM Role Managed Policy Attachment (reproduction-20240108225525688100000001:arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly): couldn't find resource
│
│ with aws_iam_role_policy_attachment.this["arn:aws:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly"],
│ on main.tf line 60, in resource "aws_iam_role_policy_attachment" "this":
│ 60: resource "aws_iam_role_policy_attachment" "this" {
│
╵
╷
│ Error: reading IAM Role Managed Policy Attachment (reproduction-20240108225525688100000001:arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy): couldn't find resource
│
│ with aws_iam_role_policy_attachment.this["arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"],
│ on main.tf line 60, in resource "aws_iam_role_policy_attachment" "this":
│ 60: resource "aws_iam_role_policy_attachment" "this" {
│
╵
╷
│ Error: reading IAM Role Managed Policy Attachment (reproduction-20240108225525688100000001:arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy): couldn't find resource
│
│ with aws_iam_role_policy_attachment.this["arn:aws:iam::aws:policy/AmazonEKSWorkerNodePolicy"],
│ on main.tf line 60, in resource "aws_iam_role_policy_attachment" "this":
│ 60: resource "aws_iam_role_policy_attachment" "this" {
In my case it's likely because of the government partition. Fixed that with:
data "aws_partition" "current" {}

locals {
  name   = "reproduction"
  region = "us-east-1"
  part   = data.aws_partition.current.partition
  ...
}

# Then updated the attachments
resource "aws_iam_role_policy_attachment" "this" {
  for_each = { for k, v in toset([
    "arn:${local.part}:iam::aws:policy/AmazonEKSWorkerNodePolicy",
    "arn:${local.part}:iam::aws:policy/AmazonEC2ContainerRegistryReadOnly",
    "arn:${local.part}:iam::aws:policy/AmazonEKS_CNI_Policy"
  ]) : k => v }

  policy_arn = each.value
  role       = aws_iam_role.this.name
}
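The partition mismatch behind those "couldn't find resource" errors can be sketched in a few lines. This is a simplified illustration only, not the AWS SDK's actual endpoint-resolution logic:

```python
def partition_for_region(region: str) -> str:
    """Simplified region-to-ARN-partition mapping (illustration only;
    the authoritative mapping lives in the AWS SDK endpoint data)."""
    if region.startswith("us-gov-"):
        return "aws-us-gov"
    if region.startswith("cn-"):
        return "aws-cn"
    return "aws"

# A hard-coded "arn:aws:..." policy ARN can never match an attachment
# that actually lives in the GovCloud partition:
print(partition_for_region("us-gov-east-1"))  # aws-us-gov
print(partition_for_region("us-east-1"))      # aws
```

This is why interpolating data.aws_partition.current.partition into the ARNs, as above, makes the config portable across partitions.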
Not really sure what we're looking for so here's everything 😀
Anyway, it's up and running. What info did you need out of this?
This issue has been automatically marked as stale because it has been open 30 days with no activity. Remove the stale label or comment, or this issue will be closed in 10 days.
Issue closed due to inactivity.
Not exactly sure what I'm running into here (a timeout, or what?!) but this behavior just started today.
Someone suggested passing the -parallelism=1 parameter to Terraform (a few months ago), but I removed it last week; it caused very slow builds. Now I'm seeing this behavior:

Not sure if the parallelism parameter is even part of this issue 🤷 but it could be a factor.

Partially Installed?
It seems like some of these were started but couldn't complete for some reason:
The Config file
Given the above config, running a subsequent plan outputs:
If I were then to apply these changes, they would build without error in ~60s.

Additional context
If it is the Terraform parallelism parameter, perhaps we could look beyond it to a real solution? Setting this to 1 causes very long builds. If it's something else, I'd like to hear what the maintainers think. TIA
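For context, the flag in question controls how many resource operations Terraform runs concurrently (the default is 10). A sketch of the trade-off, assuming nothing beyond the flag itself:

```
# Default is -parallelism=10. Setting it to 1 serializes every resource
# operation (hence the very slow builds); higher values increase the
# number of concurrent provider API calls.
terraform apply -parallelism=1
```

Lowering parallelism can mask race-like symptoms (such as resources appearing partially installed) without addressing their cause, which may be why it was suggested as a workaround.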