aws / eks-charts

Amazon EKS Helm chart repository
Apache License 2.0

[aws-for-fluent-bit] Log Group retention time setting not working. #436

Open luisamador opened 3 years ago

luisamador commented 3 years ago

Describe the bug

The option "cloudWatch.logRetentionDays" does not set the retention period of the resulting CloudWatch log group.

Steps to reproduce

cloudWatch:
  region: us-east-1
  logGroupName: /aws/eks/blah/fluentbit-cloudwatch
  logRetentionDays: 3
  logKey: log
firehose:
  enabled: false
kinesis:
  enabled: false
elasticsearch:
  enabled: false

Expected outcome

The resulting log group should have a retention policy of 3 days. Instead, it is created with a "Never expire" retention policy.

Environment

seansabour commented 3 years ago

I'm also running into this issue.

PettitWesley commented 3 years ago

The plugin used to only set the log retention on new log groups. This means if you have run the same config before then the log group might already exist, and the plugin will not update the retention.

We updated this recently and released it in AWS for Fluent Bit 2.10.0 for the cloudwatch plugin: https://github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/issues/121
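For log groups that already exist and were created before the retention setting took effect, one workaround is to set the retention directly through the CloudWatch Logs API. The following is a minimal sketch, not part of the plugin or the chart: the helper name `set_retention` is hypothetical, the log group name and region are taken from the report above, and the list of accepted values is copied from the CloudWatch Logs API reference as I understand it at the time of writing, so verify it before relying on it. The caller's credentials also need the logs:PutRetentionPolicy permission.

```python
# Retention values accepted by PutRetentionPolicy (assumption: taken from the
# CloudWatch Logs API reference at the time of writing; verify before use).
ALLOWED_RETENTION_DAYS = {
    1, 3, 5, 7, 14, 30, 60, 90, 120, 150, 180, 365, 400, 545, 731,
    1096, 1827, 2192, 2557, 2922, 3288, 3653,
}


def set_retention(logs_client, log_group_name: str, days: int) -> None:
    """Set the retention period on an existing log group.

    `logs_client` is expected to behave like boto3.client("logs").
    """
    if days not in ALLOWED_RETENTION_DAYS:
        raise ValueError(f"{days} is not an accepted retention period")
    logs_client.put_retention_policy(
        logGroupName=log_group_name,
        retentionInDays=days,
    )


def main() -> None:
    # boto3 is a third-party package, assumed to be installed and configured
    # with credentials that allow logs:PutRetentionPolicy.
    import boto3

    client = boto3.client("logs", region_name="us-east-1")
    set_retention(client, "/aws/eks/blah/fluentbit-cloudwatch", 3)
```

Calling `main()` applies the 3-day retention from the original report. The client is passed in as a parameter so the helper can be exercised with a stub, without touching AWS.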

JonathanLachapelle commented 3 years ago

I also have the same issue for both existing and new log groups.

barantomasz83 commented 2 years ago

In my case, an action was missing from the AWS IAM policy used by the Fluent Bit pods. Adding "logs:PutRetentionPolicy" solved the problem.

illagrenan commented 2 years ago

I have the same problem in version 2.21.4. Retention for new and existing log groups is always set to Never.

illagrenan commented 2 years ago

The problem was indeed the missing logs:PutRetentionPolicy permission. I use eksctl to manage my EKS cluster, and all my node groups have this IAM configuration (ref.: https://eksctl.io/usage/iam-policies/#supported-iam-add-on-policies):

nodeGroups:
  - ...
    iam:
      withAddonPolicies:
        cloudWatch: true

In practice, nodes have this policy: arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy. It contains the following:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData",
                "ec2:DescribeVolumes",
                "ec2:DescribeTags",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
                "logs:DescribeLogGroups",
                "logs:CreateLogStream",
                "logs:CreateLogGroup"
            ],
            "Resource": "*"
        },
        {
            "Effect": "Allow",
            "Action": [
                "ssm:GetParameter"
            ],
            "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*"
        }
    ]
}

And that's the problem: these permissions are insufficient, because logs:PutRetentionPolicy is not among them.
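To confirm this kind of gap before deploying, a policy document can be checked for the action programmatically. The sketch below is a simplified check under stated assumptions: the function name `allows_action` is hypothetical, it only looks at the "Action" lists of "Allow" statements, and it ignores "Resource", "Condition", "NotAction" and explicit "Deny" statements, so it is not a substitute for IAM's own policy evaluation. The sample document repeats the statements quoted above.

```python
from fnmatch import fnmatchcase


def allows_action(policy: dict, action: str) -> bool:
    """Return True if any Allow statement lists an action matching `action`.

    Action names are compared case-insensitively and `*` / `?` wildcards
    in the policy are honoured. Resources and conditions are ignored.
    """
    wanted = action.lower()
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement is also valid JSON policy
        statements = [statements]
    for statement in statements:
        if statement.get("Effect") != "Allow":
            continue
        actions = statement.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        if any(fnmatchcase(wanted, pattern.lower()) for pattern in actions):
            return True
    return False


# The policy document as quoted in this comment.
quoted_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "cloudwatch:PutMetricData",
                "ec2:DescribeVolumes",
                "ec2:DescribeTags",
                "logs:PutLogEvents",
                "logs:DescribeLogStreams",
                "logs:DescribeLogGroups",
                "logs:CreateLogStream",
                "logs:CreateLogGroup",
            ],
            "Resource": "*",
        },
        {
            "Effect": "Allow",
            "Action": ["ssm:GetParameter"],
            "Resource": "arn:aws:ssm:*:*:parameter/AmazonCloudWatch-*",
        },
    ],
}

print(allows_action(quoted_policy, "logs:PutRetentionPolicy"))  # False
print(allows_action(quoted_policy, "logs:CreateLogGroup"))      # True
```

The first result shows why the log group can be created while its retention stays at "Never expire": creation is permitted, but the retention call is not.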

trallnag commented 2 years ago

@illagrenan, exactly. The AWS documentation needs to be updated.

mattduguid commented 2 years ago

I have been seeing a similar issue: "log_retention_days" was not being applied to the log groups defined in our "additionalOutputs", and they stayed at "Never expire".

Versions,

Extract from aws-for-fluentbit-values.yaml

***etc***
additionalOutputs: |
  [OUTPUT]
      Name                cloudWatch
      Enabled             true
      Match               ebs-csi.*
      Region              ap-southeast-2
      Log_Group_Name      /aws/eks/container-workload/xxxxx-ebs-csi
      Log_Stream_Prefix   fluentbit-
      Log_Retention_Days  14
      Auto_Create_Group   true

  [OUTPUT]
      Name                cloudWatch
      Enabled             true
      Match               xxxxx-sm.*
      Region              ap-southeast-2
      Log_Group_Name      /aws/eks/container-workload/xxxxx-xxxxx-sm
      Log_Stream_Prefix   fluentbit-
      Log_Retention_Days  14
      Auto_Create_Group   true
***etc***

After reading the previous posts, I observed that the permission "logs:PutRetentionPolicy" is not there by default. If I add it manually and then rerun the pipeline, the permission is removed again. It should be added to the permanent list.

Error from the logs when trying to set log_retention_days:

time="2022-06-08T05:27:14Z" level=error msg="AccessDeniedException: User: arn:aws:sts::************:assumed-role/container-workload-aws-for-fluent-bit-sa-irsa/1654666034225831554 is not authorized to perform: logs:PutRetentionPolicy on resource: arn:aws:logs:ap-southeast-2:************:log-group:/aws/eks/container-workload/*****-cert-manager:log-stream: because no identity-based policy allows the logs:PutRetentionPolicy action\n\tstatus code: 400, request id: c2209996-****-4ae8-*****-03d4272f16f6" func="github.com/aws/amazon-cloudwatch-logs-for-fluent-bit/cloudwatch.(*OutputPlugin).AddEvent()" file="cloudwatch.go:340"
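When scanning Fluent Bit pod logs for this failure, the denied action and resource can be pulled out of the message with a regular expression. This is a rough sketch: the pattern is written against the wording of the error above, the function name `denied_action` is hypothetical, and AWS may phrase the message differently in other SDK versions. The sample line is an abbreviated copy of the error above.

```python
import re

# Matches the "is not authorized to perform: <action> on resource: <arn>" part
# of an AccessDeniedException message as it appears in the log line above.
DENIED = re.compile(
    r"is not authorized to perform: (?P<action>\S+) on resource: (?P<resource>\S+)"
)


def denied_action(line: str):
    """Return (action, resource) if the line reports an access denial, else None."""
    match = DENIED.search(line)
    if match is None:
        return None
    return match["action"], match["resource"]


# Abbreviated from the log line above; masked fields are kept as posted.
sample = (
    'level=error msg="AccessDeniedException: User: '
    "arn:aws:sts::************:assumed-role/container-workload-aws-for-fluent-bit-sa-irsa/1654666034225831554 "
    "is not authorized to perform: logs:PutRetentionPolicy on resource: "
    "arn:aws:logs:ap-southeast-2:************:log-group:/aws/eks/container-workload/*****-cert-manager:log-stream: "
    'because no identity-based policy allows the logs:PutRetentionPolicy action"'
)

result = denied_action(sample)
if result is not None:
    action, resource = result
    print(action)
    print(resource)
```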

I manually added the missing permission back, deleted the log groups to force them to be recreated, and restarted the Fluent Bit DaemonSet. The log groups were recreated and the log retention is now set correctly.
