aws-ia / terraform-aws-eks-blueprints-addons

Terraform module which provisions addons on Amazon EKS clusters
https://aws-ia.github.io/terraform-aws-eks-blueprints-addons/main/
Apache License 2.0
262 stars 124 forks

Using enable_aws_for_fluentbit = true creates a log group with a suffix but is used without it from the container #341

Closed jpambrun closed 3 days ago

jpambrun commented 9 months ago

Description

Adding enable_aws_for_fluentbit = true to the configuration does create a log group, /aws/eks/jf-test-cluster/aws-fluentbit-logs-20240110180918467200000001 in my case, but no logs make it there.

This is not in line with the documentation [1] which states: "Check the list of log groups in the Region. You should see the following: /aws/eks/complete/aws-fluentbit-logs".

Then, looking at the Fluent Bit container log, I see:

Log Group /aws/eks/jf-test-cluster/aws-fluentbit-logs not found and `auto_create_group` disabled.
Failed to send events

It looks like there is a disconnect between what is created (i.e. with the suffix) and what is referenced in the configmap (without the suffix):

[OUTPUT]
    Name                  cloudwatch_logs
    Match                 *
    region                us-east-2
    log_group_name        /aws/eks/jf-test-cluster/aws-fluentbit-logs
    log_stream_prefix     fluentbit-
    log_stream_template   $kubernetes['pod_name'].$kubernetes['container_name']

I don't have any other fluentbit related config, but even specifying a different prefix in aws_for_fluentbit_cw_log_group doesn't help.

[1] https://aws-ia.github.io/terraform-aws-eks-blueprints-addons/main/addons/aws-for-fluentbit/

joelhoisko commented 3 days ago

Based on these lines of code, if you don't set var.aws_for_fluentbit_cw_log_group.use_name_prefix to false yourself, the code defaults to setting the name property of the aws_cloudwatch_log_group resource to null. The name_prefix property also defaults to "${local.aws_for_fluentbit_cw_log_group_name}-".

Then, when setting up the values for the Fluent Bit Helm chart here, the code sets cloudWatchLogs.logGroupName to local.aws_for_fluentbit_cw_log_group_name instead of using the name of the created resource, aws_cloudwatch_log_group.aws_for_fluentbit.name.

I'm now trying to create a new cluster with this and in the terraform plan I can see that the value of the property is set to "/aws/eks/prod-eu-north-1/aws-fluentbit-logs" instead of the real name:

# module.prod_eu_north_1_cluster.module.eks_kubernetes_addons.module.aws_for_fluentbit.helm_release.this[0] will be created
  + resource "helm_release" "this" {
      + atomic                     = false
      + chart                      = "aws-for-fluent-bit"
      + cleanup_on_fail            = false
      + create_namespace           = false
      + dependency_update          = false
      + description                = "A Helm chart to install the Fluent-bit Driver"
      + disable_crd_hooks          = false
      + disable_openapi_validation = false
      + disable_webhooks           = false
      + force_update               = false
      + id                         = (known after apply)
      + lint                       = false
      + manifest                   = (known after apply)
      + max_history                = 0
      + metadata                   = (known after apply)
      + name                       = "aws-for-fluent-bit"
      + namespace                  = "kube-system"
      + pass_credentials           = false
      + recreate_pods              = false
      + render_subchart_notes      = true
      + replace                    = false
      + repository                 = "https://aws.github.io/eks-charts"
      + reset_values               = false
      + reuse_values               = false
      + skip_crds                  = false
      + status                     = "deployed"
      + timeout                    = 300
      + values                     = []
      + verify                     = false
      + version                    = "0.1.32"
      + wait                       = false
      + wait_for_jobs              = false

      + set {
          + name  = "cloudWatch.region"
          + value = "eu-north-1"
            # (1 unchanged attribute hidden)
        }
      + set {
          + name  = "cloudWatchLogs.autoCreateGroup"
          + value = "false"
            # (1 unchanged attribute hidden)
        }
        # it's not using the prefix here
      + set {
          + name  = "cloudWatchLogs.logGroupName"
          + value = "/aws/eks/prod-eu-north-1/aws-fluentbit-logs"
            # (1 unchanged attribute hidden)
        }
      + set {
          + name  = "cloudWatchLogs.logGroupTemplate"
            # (2 unchanged attributes hidden)
        }
      + set {
          + name  = "cloudWatchLogs.region"
          + value = "eu-north-1"
            # (1 unchanged attribute hidden)
        }
      + set {
          + name  = "serviceAccount.annotations.eks\\.amazonaws\\.com/role-arn"
          + value = (known after apply)
            # (1 unchanged attribute hidden)
        }
      + set {
          + name  = "serviceAccount.name"
          + value = "aws-for-fluent-bit-sa"
            # (1 unchanged attribute hidden)
        }
    }

The created log group has a name_prefix:

# module.prod_eu_north_1_cluster.module.eks_kubernetes_addons.aws_cloudwatch_log_group.aws_for_fluentbit[0] will be created
  + resource "aws_cloudwatch_log_group" "aws_for_fluentbit" {
      + arn               = (known after apply)
      + id                = (known after apply)
      + log_group_class   = (known after apply)
      + name              = (known after apply)
      + name_prefix       = "/aws/eks/prod-eu-north-1/aws-fluentbit-logs-"
      + retention_in_days = 90
      + skip_destroy      = false
      + tags_all          = (known after apply)
    }
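
The mismatch can be reproduced in isolation. What follows is a minimal sketch, not the module's actual code: the local values are hypothetical stand-ins, and only the names aws_cloudwatch_log_group.aws_for_fluentbit and local.aws_for_fluentbit_cw_log_group_name are taken from the discussion above. It puts the two candidate values for cloudWatchLogs.logGroupName side by side.

```hcl
locals {
  # Hypothetical stand-ins for the module's inputs.
  aws_for_fluentbit_cw_log_group_name = "/aws/eks/example/aws-fluentbit-logs"
  use_name_prefix                     = true
}

resource "aws_cloudwatch_log_group" "aws_for_fluentbit" {
  # Only one of name / name_prefix is set. With a prefix, the provider
  # appends a unique suffix, so the final name is only known after apply.
  name              = local.use_name_prefix ? null : local.aws_for_fluentbit_cw_log_group_name
  name_prefix       = local.use_name_prefix ? "${local.aws_for_fluentbit_cw_log_group_name}-" : null
  retention_in_days = 90
}

# Name without the suffix: does not match the created group when
# use_name_prefix is true.
output "log_group_name_from_local" {
  value = local.aws_for_fluentbit_cw_log_group_name
}

# Name read back from the resource: matches whatever group was created.
output "log_group_name_from_resource" {
  value = aws_cloudwatch_log_group.aws_for_fluentbit.name
}
```

Passing the second value to the Helm chart instead of the first should keep the configmap and the log group in sync regardless of whether a prefix is used.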

Made a quick and dirty PR for this, haven't looked into it more (or tested it either).
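
Until a fix lands, a possible workaround is to disable the name prefix so that the created log group and the configmap use the same name. This is an untested sketch that relies on the behaviour described above. The module label and the references to a module named "eks" are hypothetical placeholders for an existing cluster definition.

```hcl
module "eks_blueprints_addons" {
  source = "aws-ia/eks-blueprints-addons/aws"

  # Hypothetical references to an existing EKS cluster module named "eks".
  cluster_name      = module.eks.cluster_name
  cluster_endpoint  = module.eks.cluster_endpoint
  cluster_version   = module.eks.cluster_version
  oidc_provider_arn = module.eks.oidc_provider_arn

  enable_aws_for_fluentbit = true

  # Assumption: with use_name_prefix = false the log group is created with
  # the plain name, which is the name the Helm values reference.
  aws_for_fluentbit_cw_log_group = {
    use_name_prefix = false
  }
}
```

On a cluster where the suffixed log group already exists, expect the plan to replace it with a new, unsuffixed group.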