aws-observability / terraform-aws-observability-accelerator

Open source project to help accelerate and ease observability setup on AWS environments
https://aws-observability.github.io/terraform-aws-observability-accelerator/
Apache License 2.0

[Bug]: Some kubectl_manifest resources are re-created sporadically in EKS 1.28 #244

Closed. rlanore closed this issue 11 months ago.

rlanore commented 11 months ago

AWS Observability Accelerator Release version

2.10.0

What is your environment, configuration and the example used?

The EKS cluster is created in a separate Terraform directory.

The eks-monitoring module is added to our EKS cluster with this tf file:

locals {
  name        = "AMG-foo-bar"
  description = "Amazon Managed Grafana workspace for ${local.name}"
}
module "aws_observability_accelerator" {
  source = "github.com/aws-observability/terraform-aws-observability-accelerator?ref=v2.10.0"
  aws_region = "eu-west-1"
  # creates a new Amazon Managed Prometheus workspace, defaults to true
  enable_managed_prometheus = true
  # reuses an existing Amazon Managed Prometheus workspace if specified; leave empty to create a new one
  managed_prometheus_workspace_id = ""
  # sets up the Amazon Managed Prometheus alert manager at the workspace level
  enable_alertmanager = true
  # reusing existing Amazon Managed Grafana workspace
  managed_grafana_workspace_id = module.managed_grafana.workspace_id

}

module "managed_grafana" {
  source  = "terraform-aws-modules/managed-service-grafana/aws"
  version = "2.0.0"

  name                      = local.name
  grafana_version           = "9.4"
  associate_license         = false
  description               = local.description
  account_access_type       = "CURRENT_ACCOUNT"
  authentication_providers  = ["AWS_SSO"]
  permission_type           = "SERVICE_MANAGED"
  data_sources              = ["CLOUDWATCH", "PROMETHEUS"]
  notification_destinations = []
  stack_set_name            = local.name

  configuration = jsonencode({
    unifiedAlerting = {
      enabled = true
    },
    plugins = {
      pluginAdminEnabled = false
    }
  })

  # Workspace API keys
  workspace_api_keys = {
    viewer = {
      key_name        = "viewer"
      key_role        = "VIEWER"
      seconds_to_live = 3600
    }
    editor = {
      key_name        = "editor"
      key_role        = "EDITOR"
      seconds_to_live = 3600
    }
    admin = {
      key_name        = "admin"
      key_role        = "ADMIN"
      seconds_to_live = 3600
    }
  }

  # Workspace IAM role
  create_iam_role                = true
  iam_role_name                  = local.name
  use_iam_role_name_prefix       = true
  iam_role_description           = local.description
  iam_role_path                  = "/grafana/"
  iam_role_force_detach_policies = true
  iam_role_max_session_duration  = 7200
  #iam_role_tags                  = local.tags
  # Role associations
  # Ref: https://github.com/aws/aws-sdk/issues/25
  # Ref: https://github.com/hashicorp/terraform-provider-aws/issues/18812
  # WARNING: https://github.com/hashicorp/terraform-provider-aws/issues/24166
  role_associations = {
    "ADMIN" = {
      "group_ids" = ["uuid-0"]
    }
    "EDITOR" = {
      "group_ids" = ["uuid-1"]
    }
    "VIEWER" = {
      "group_ids" = ["uuid-2"]
    }
  }
}

module "eks_monitoring" {
  # The eks-monitoring module depends on the Prometheus workspace being created so it can retrieve its endpoint
  depends_on = [ module.aws_observability_accelerator  ]
  source = "github.com/aws-observability/terraform-aws-observability-accelerator//modules/eks-monitoring?ref=v2.10.0"

  eks_cluster_id = "eks-foo-bar"
  # deploys AWS Distro for OpenTelemetry operator into the cluster
  enable_amazon_eks_adot = true
  enable_cert_manager = true
  enable_java         = false

  target_secret_name      = "grafana-admin-credentials"
  target_secret_namespace = "grafana-operator"
  grafana_api_key         = module.managed_grafana.workspace_api_keys.admin.key
  grafana_url             = module.aws_observability_accelerator.managed_grafana_workspace_endpoint

  # This configuration section results in actions performed on AMG and AMP, and it needs to be done just once.
  # Hence, it is performed in conjunction with the setup of the eks_cluster_1 EKS cluster.
  enable_dashboards       = true
  enable_external_secrets = true
  enable_fluxcd           = true
  enable_alerting_rules   = true
  enable_recording_rules  = true

  managed_prometheus_workspace_id = module.aws_observability_accelerator.managed_prometheus_workspace_id

  managed_prometheus_workspace_endpoint = module.aws_observability_accelerator.managed_prometheus_workspace_endpoint
  managed_prometheus_workspace_region   = module.aws_observability_accelerator.managed_prometheus_workspace_region

  # optional, defaults to 60s interval and 15s timeout
  prometheus_config = {
    global_scrape_interval = "60s"
    global_scrape_timeout  = "15s"
  }

  enable_logs = true
  logs_config = {
    cw_log_retention_days = 30
  }
}

What did you do and what did you see instead?

Sometimes on terraform apply, one or more kubectl_manifest resources will be re-created. I checked inside my cluster and these resources are correctly in place.

Additional Information

Searching the kubectl provider's GitHub repository shows that a lot of users have issues with it:

* The latest release of this provider is quite old (24 March 2022)
* https://github.com/gavinbunney/terraform-provider-kubectl/issues/270
* https://github.com/gavinbunney/terraform-provider-kubectl/issues/274

I will try the kubectl provider from https://github.com/alekc/terraform-provider-kubectl, as sketched below.
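
A minimal sketch of the switch, assuming the alekc fork keeps the same resource schema. The version constraint is an assumption, and since the accelerator module pins its own kubectl provider source internally, fully switching may require a fork of the module that updates its required_providers:

terraform {
  required_providers {
    # alekc/kubectl is a community fork published under the same local name "kubectl"
    kubectl = {
      source  = "alekc/kubectl"
      version = ">= 2.0.0" # assumed constraint
    }
  }
}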

But in a general context, is it a good idea to depend on an unofficial provider like this?

Would it be a good idea to change kubectl_manifest to a local helm_release, like the otel-config in this repo? See the sketch below.
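
For illustration only, a hypothetical sketch of managing the Flux installation through a helm_release instead of kubectl_manifest; the community chart and its repository are assumptions, not what this repo currently uses:

resource "helm_release" "flux" {
  # fluxcd-community maintains a flux2 chart; assumed here for illustration
  name             = "flux2"
  repository       = "https://fluxcd-community.github.io/helm-charts"
  chart            = "flux2"
  namespace        = "flux-system"
  create_namespace = true
}
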
rlanore commented 11 months ago

Tried with alekc, but on the first apply I get this:

module.eks_monitoring.kubectl_manifest.flux_gitrepository[0]: Creating...
module.eks_monitoring.kubectl_manifest.flux_gitrepository[0]: Creation complete after 3s [id=/apis/source.toolkit.fluxcd.io/v1beta2/namespaces/flux-system/gitrepositorys/aws-observability-accelerator]

and on the next apply:

module.eks_monitoring.kubectl_manifest.flux_gitrepository[0]: Refreshing state... [id=/apis/source.toolkit.fluxcd.io/v1beta2/namespaces/flux-system/gitrepositorys/aws-observability-accelerator]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.eks_monitoring.kubectl_manifest.flux_gitrepository[0] will be created
  + resource "kubectl_manifest" "flux_gitrepository" {
... ....

And this fails with:

module.eks_monitoring.kubectl_manifest.flux_gitrepository[0]: Creating...
╷
│ Error: flux-system/aws-observability-accelerator failed to run apply: error when creating "/tmp/2083107233kubectl_manifest.yaml": the server could not find the requested resource (post gitrepositories.source.toolkit.fluxcd.io)
rlanore commented 11 months ago

Finally, I got it to work multiple times. I think the previous error was caused by a misconfiguration somewhere. After destroying the whole stack and re-creating it, it works with the alekc kubectl provider.