aws-observability / terraform-aws-observability-accelerator

Open source project to help accelerate and ease observability setup on AWS environments
https://aws-observability.github.io/terraform-aws-observability-accelerator/
Apache License 2.0

[Bug]: Some kubectl_manifest resources are re-created sporadically in EKS 1.28 #244

Closed. rlanore closed this issue 11 months ago.

rlanore commented 11 months ago

AWS Observability Accelerator Release version

2.10.0

What is your environment, configuration and the example used?

The EKS cluster is created in a separate Terraform directory.

The eks-monitoring module is added to our EKS cluster with this tf file:

locals {
  name        = "AMG-foo-bar"
  description = "Amazon Managed Grafana workspace for ${local.name}"
}
module "aws_observability_accelerator" {
  source = "github.com/aws-observability/terraform-aws-observability-accelerator?ref=v2.10.0"
  aws_region = "eu-west-1"
  # creates a new Amazon Managed Prometheus workspace, defaults to true
  enable_managed_prometheus = true
  # reuses an existing Amazon Managed Prometheus workspace if specified; leave empty to create a new one
  managed_prometheus_workspace_id = ""
  # sets up the Amazon Managed Prometheus alert manager at the workspace level
  enable_alertmanager = true
  # reusing existing Amazon Managed Grafana workspace
  managed_grafana_workspace_id = module.managed_grafana.workspace_id

}

module "managed_grafana" {
  source  = "terraform-aws-modules/managed-service-grafana/aws"
  version = "2.0.0"

  name                      = local.name
  grafana_version           = "9.4"
  associate_license         = false
  description               = local.description
  account_access_type       = "CURRENT_ACCOUNT"
  authentication_providers  = ["AWS_SSO"]
  permission_type           = "SERVICE_MANAGED"
  data_sources              = ["CLOUDWATCH", "PROMETHEUS"]
  notification_destinations = []
  stack_set_name            = local.name

  configuration = jsonencode({
    unifiedAlerting = {
      enabled = true
    },
    plugins = {
      pluginAdminEnabled = false
    }
  })

  # Workspace API keys
  workspace_api_keys = {
    viewer = {
      key_name        = "viewer"
      key_role        = "VIEWER"
      seconds_to_live = 3600
    }
    editor = {
      key_name        = "editor"
      key_role        = "EDITOR"
      seconds_to_live = 3600
    }
    admin = {
      key_name        = "admin"
      key_role        = "ADMIN"
      seconds_to_live = 3600
    }
  }

  # Workspace IAM role
  create_iam_role                = true
  iam_role_name                  = local.name
  use_iam_role_name_prefix       = true
  iam_role_description           = local.description
  iam_role_path                  = "/grafana/"
  iam_role_force_detach_policies = true
  iam_role_max_session_duration  = 7200
  #iam_role_tags                  = local.tags
  # Role associations
  # Ref: https://github.com/aws/aws-sdk/issues/25
  # Ref: https://github.com/hashicorp/terraform-provider-aws/issues/18812
  # WARNING: https://github.com/hashicorp/terraform-provider-aws/issues/24166
  role_associations = {
    "ADMIN" = {
      "group_ids" = ["uuid-0"]
    }
    "EDITOR" = {
      "group_ids" = ["uuid-1"]
    }
    "VIEWER" = {
      "group_ids" = ["uuid-2"]
    }
  }
}

module "eks_monitoring" {
  # The eks-monitoring module depends on the Prometheus workspace being created so it can retrieve its endpoint
  depends_on = [ module.aws_observability_accelerator  ]
  source = "github.com/aws-observability/terraform-aws-observability-accelerator//modules/eks-monitoring?ref=v2.10.0"

  eks_cluster_id = "eks-foo-bar"
  # deploys AWS Distro for OpenTelemetry operator into the cluster
  enable_amazon_eks_adot = true
  enable_cert_manager = true
  enable_java         = false

  target_secret_name      = "grafana-admin-credentials"
  target_secret_namespace = "grafana-operator"
  grafana_api_key         = module.managed_grafana.workspace_api_keys.admin.key
  grafana_url             = module.aws_observability_accelerator.managed_grafana_workspace_endpoint

  # This configuration section results in actions performed on AMG and AMP, and it needs to be done just once.
  # Hence, it is performed in conjunction with the setup of the eks_cluster_1 EKS cluster.
  enable_dashboards       = true
  enable_external_secrets = true
  enable_fluxcd           = true
  enable_alerting_rules   = true
  enable_recording_rules  = true

  managed_prometheus_workspace_id = module.aws_observability_accelerator.managed_prometheus_workspace_id

  managed_prometheus_workspace_endpoint = module.aws_observability_accelerator.managed_prometheus_workspace_endpoint
  managed_prometheus_workspace_region   = module.aws_observability_accelerator.managed_prometheus_workspace_region

  # optional, defaults to 60s interval and 15s timeout
  prometheus_config = {
    global_scrape_interval = "60s"
    global_scrape_timeout  = "15s"
  }

  enable_logs = true
  logs_config = {
    cw_log_retention_days = 30
  }
}

What did you do and what did you see instead?

Sometimes on terraform apply, one or more kubectl_manifest resources will be re-created. I checked inside my cluster and these resources are correctly in place.

Additional Information

Searching the kubectl provider's GitHub repository shows that a lot of users have issues with it:

* The latest release of this provider is quite old (24 March 2022)
* https://github.com/gavinbunney/terraform-provider-kubectl/issues/270
* https://github.com/gavinbunney/terraform-provider-kubectl/issues/274

I will try the kubectl provider from https://github.com/alekc/terraform-provider-kubectl, as sketched below.
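
A minimal sketch of the switch, assuming the alekc fork keeps the same resource schema. The version constraint is an assumption, and since the accelerator module pins its own kubectl provider source internally, fully switching may require a fork of the module that updates its required_providers:

terraform {
  required_providers {
    # alekc/kubectl is a community fork published under the same local name "kubectl"
    kubectl = {
      source  = "alekc/kubectl"
      version = ">= 2.0.0" # assumed constraint
    }
  }
}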

But in a general context, is it a good idea to depend on an unofficial provider like this?

Would it be a good idea to change kubectl_manifest to a local helm_release, like the otel-config in this repo? See the sketch below.
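
For illustration only, a hypothetical sketch of managing the Flux installation through a helm_release instead of kubectl_manifest; the community chart and its repository are assumptions, not what this repo currently uses:

resource "helm_release" "flux" {
  # fluxcd-community maintains a flux2 chart; assumed here for illustration
  name             = "flux2"
  repository       = "https://fluxcd-community.github.io/helm-charts"
  chart            = "flux2"
  namespace        = "flux-system"
  create_namespace = true
}
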
rlanore commented 11 months ago

Tried with alekc, but on the first apply I get this:

module.eks_monitoring.kubectl_manifest.flux_gitrepository[0]: Creating...
module.eks_monitoring.kubectl_manifest.flux_gitrepository[0]: Creation complete after 3s [id=/apis/source.toolkit.fluxcd.io/v1beta2/namespaces/flux-system/gitrepositorys/aws-observability-accelerator]

and on the next apply:

module.eks_monitoring.kubectl_manifest.flux_gitrepository[0]: Refreshing state... [id=/apis/source.toolkit.fluxcd.io/v1beta2/namespaces/flux-system/gitrepositorys/aws-observability-accelerator]

Terraform used the selected providers to generate the following execution
plan. Resource actions are indicated with the following symbols:
  + create

Terraform will perform the following actions:

  # module.eks_monitoring.kubectl_manifest.flux_gitrepository[0] will be created
  + resource "kubectl_manifest" "flux_gitrepository" {
... ....

And this fails with:

module.eks_monitoring.kubectl_manifest.flux_gitrepository[0]: Creating...
╷
│ Error: flux-system/aws-observability-accelerator failed to run apply: error when creating "/tmp/2083107233kubectl_manifest.yaml": the server could not find the requested resource (post gitrepositories.source.toolkit.fluxcd.io)
rlanore commented 11 months ago

Finally, I got it to work multiple times. I think the previous error was caused by a misconfiguration somewhere. After destroying the whole stack and re-creating it, it works with the alekc kubectl provider.