aws / aws-for-fluent-bit

The source of the amazon/aws-for-fluent-bit container image
Apache License 2.0

Support for KMS in CloudWatch logs output #605

Open bonclay7 opened 1 year ago

bonclay7 commented 1 year ago
### Describe the question/issue

I believe this is a feature request rather than a bug. I would like to be able to pass a KMS key ARN as one of [the parameters](https://docs.fluentbit.io/manual/pipeline/outputs/cloudwatch#configuration-parameters) of the CloudWatch Logs output. Some customers have an enforced policy that requires a KMS encryption key for their logs.

One workaround is to pre-create the log group and use a fixed log group name for all log streams created by Fluent Bit. However, when a `log_group_template` is used to create log groups dynamically, the best option would be to provide a KMS key in the config.

Happy to hear suggestions as well.

### Configuration
cloudWatchLogs:
  enabled: true
  region: ${aws_region}
  # logGroupName is a fallback to failed parsing
  logGroupName: /aws/eks/observability-accelerator/workloads
  logGroupTemplate: /aws/eks/observability-accelerator/${cluster_name}/$kubernetes['namespace_name']
  logStreamTemplate: $kubernetes['container_name'].$kubernetes['pod_name']
  log_key: log
  log_retention_days: ${log_retention_days}

### Fluent Bit Version Info

Using https://github.com/aws/eks-charts/tree/master/stable/aws-for-fluent-bit at 0.1.24

PettitWesley commented 1 year ago

Good point that when you use templating the log groups are created dynamically and so creating them in CFN/infra as code may be less convenient, so the workaround to set encryption in infra as code won't always work.
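For reference, the fixed-group workaround mentioned in the issue could be sketched roughly as follows in Fluent Bit's classic configuration format. This is only an illustration under stated assumptions: the log group is assumed to have been created ahead of time (for example through infrastructure as code) with a KMS key attached, the `Match` pattern and `log_stream_prefix` value are placeholders, and the group and stream names are copied from the configuration in the issue.

```
[OUTPUT]
    Name                 cloudwatch_logs
    Match                *
    region               ${AWS_REGION}
    # Assumption: this group already exists and has a KMS key associated with it.
    log_group_name       /aws/eks/observability-accelerator/workloads
    # Streams can still be templated per container and pod inside the fixed group.
    log_stream_template  $kubernetes['container_name'].$kubernetes['pod_name']
    # Placeholder fallback prefix, used when the template cannot be resolved.
    log_stream_prefix    fallback-
    # Keep Fluent Bit from creating new, unencrypted groups on its own.
    auto_create_group    false
```

The trade-off is the one described above: every namespace shares a single group, so the per-namespace separation that `log_group_template` provides is lost.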

omidraha commented 10 months ago

I tried using `log_stream_template`, but I got this error:

[2023/09/01 21:25:48] [ warn] [record accessor] translation failed, root key=kubernetes

I want to separate the log streams by pod name.

[OUTPUT]
    Name                cloudwatch_logs
    Match               application.*
    region              ${AWS_REGION}
    log_group_name      /${CLUSTER_NAME}/application
    log_stream_prefix   log-
    log_stream_template $kubernetes['pod_name']
    auto_create_group   true
    extra_user_agent    container-insights

PettitWesley commented 10 months ago

@omidraha Your template looks correct to me.

please also see:

I suspect the issue is that kubernetes metadata failed to be attached to your logs. Do you see k8s metadata in the logs in CW? Do you see any other error messages in Fluent Bit?

omidraha commented 10 months ago

Here is my full configuration file: https://github.com/omidraha/pulumi_example/blob/main/cw/cw.py https://github.com/omidraha/pulumi_example/blob/main/cw/setup.py

Here is a log record captured from the CloudWatch console, under Logs / Log groups, for the /dev/application log group.

{
    "time": "2023-09-01T23:54:27.022468725Z",
    "stream": "stderr",
    "_p": "F",
    "log": "[2023-09-01 23:54:27 +0000] [9] [DEBUG] GET /"
}

Info

kubectl get pods -n amazon-cloudwatch
NAME               READY   STATUS    RESTARTS   AGE
fluent-bit-7677w   1/1     Running   0          21m
fluent-bit-k5x56   1/1     Running   0          21m
fluent-bit-k8lks   1/1     Running   0          21m
fluent-bit-kv62w   1/1     Running   0          21m
fluent-bit-llrbh   1/1     Running   0          21m
fluent-bit-xw5lm   1/1     Running   0          21m

kubectl describe pod fluent-bit-7677w -n amazon-cloudwatch
Name:             fluent-bit-7677w
Namespace:        amazon-cloudwatch
Priority:         0
Service Account:  fluent-bit
Node:             ip-***
Start Time:       Fri, 01 Sep 2023 16:26:14 -0700
Labels:           controller-revision-hash=6cb6db756b
                  k8s-app=fluent-bit
                  kubernetes.io/cluster-service=true
                  pod-template-generation=1
                  version=v1
Annotations:      <none>
Status:           Running
IP:               ***
IPs:
  IP:           ***
Controlled By:  DaemonSet/fluent-bit
Containers:
  fluent-bit:
    Container ID:   containerd://***
    Image:          public.ecr.aws/aws-observability/aws-for-fluent-bit:latest
    Image ID:       public.ecr.aws/aws-observability/aws-for-fluent-bit@sha256:2e52d17a2f34197707dcfe27626cfa45c30ced37999004638768bf533cd7f444
    Port:           <none>
    Host Port:      <none>
    State:          Running
      Started:      Fri, 01 Sep 2023 16:26:16 -0700
    Ready:          True
    Restart Count:  0
    Limits:
      memory:  200Mi
    Requests:
      cpu:     500m
      memory:  100Mi
    Environment:
      AWS_REGION:             <set to the key 'logs.region' of config map 'fluent-bit-cluster-info'>   Optional: false
      CLUSTER_NAME:           <set to the key 'cluster.name' of config map 'fluent-bit-cluster-info'>  Optional: false
      HTTP_SERVER:            <set to the key 'http.server' of config map 'fluent-bit-cluster-info'>   Optional: false
      HTTP_PORT:              <set to the key 'http.port' of config map 'fluent-bit-cluster-info'>     Optional: false
      READ_FROM_HEAD:         <set to the key 'read.head' of config map 'fluent-bit-cluster-info'>     Optional: false
      READ_FROM_TAIL:         <set to the key 'read.tail' of config map 'fluent-bit-cluster-info'>     Optional: false
      HOST_NAME:               (v1:spec.nodeName)
      HOSTNAME:               fluent-bit-7677w (v1:metadata.name)
      CI_VERSION:             k8s/1.3.16
      AWS_ACCESS_KEY_ID:      ***
      AWS_SECRET_ACCESS_KEY:  ***
    Mounts:
      /fluent-bit/etc/ from fluent-bit-config (rw)
      /run/log/journal from runlogjournal (ro)
      /var/fluent-bit/state from fluentbitstate (rw)
      /var/lib/docker/containers from varlibdockercontainers (ro)
      /var/log from varlog (ro)
      /var/log/dmesg from dmesg (ro)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-vv8w6 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  fluentbitstate:
    Type:          HostPath (bare host directory volume)
    Path:          /var/fluent-bit/state
    HostPathType:  
  varlog:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log
    HostPathType:  
  varlibdockercontainers:
    Type:          HostPath (bare host directory volume)
    Path:          /var/lib/docker/containers
    HostPathType:  
  fluent-bit-config:
    Type:      ConfigMap (a volume populated by a ConfigMap)
    Name:      fluent-bit-config
    Optional:  false
  runlogjournal:
    Type:          HostPath (bare host directory volume)
    Path:          /run/log/journal
    HostPathType:  
  dmesg:
    Type:          HostPath (bare host directory volume)
    Path:          /var/log/dmesg
    HostPathType:  
  kube-api-access-vv8w6:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 :NoExecute op=Exists
                             :NoSchedule op=Exists
                             node-role.kubernetes.io/master:NoSchedule op=Exists
                             node.kubernetes.io/disk-pressure:NoSchedule op=Exists
                             node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/network-unavailable:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists
                             node.kubernetes.io/pid-pressure:NoSchedule op=Exists
                             node.kubernetes.io/unreachable:NoExecute op=Exists
                             node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
  Type     Reason       Age   From               Message
  ----     ------       ----  ----               -------
  Normal   Scheduled    21m   default-scheduler  Successfully assigned amazon-cloudwatch/fluent-bit-7677w to ip-11-0-26-42.us-west-2.compute.internal
  Warning  FailedMount  21m   kubelet            MountVolume.SetUp failed for volume "kube-api-access-vv8w6" : failed to sync configmap cache: timed out waiting for the condition
  Normal   Pulling      21m   kubelet            Pulling image "public.ecr.aws/aws-observability/aws-for-fluent-bit:latest"
  Normal   Pulled       21m   kubelet            Successfully pulled image "public.ecr.aws/aws-observability/aws-for-fluent-bit:latest" in 191.828239ms (191.843645ms including waiting)
  Normal   Created      21m   kubelet            Created container fluent-bit
  Normal   Started      21m   kubelet            Started container fluent-bit

kubectl logs fluent-bit-xw5lm -n amazon-cloudwatch
Fluent Bit v1.9.10
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/09/01 23:26:15] [ info] [fluent bit] version=1.9.10, commit=a2eaf59628, pid=1
[2023/09/01 23:26:15] [ info] [storage] version=1.4.0, type=memory+filesystem, sync=normal, checksum=disabled, max_chunks_up=128
[2023/09/01 23:26:15] [ info] [storage] backlog input plugin: storage_backlog.7
[2023/09/01 23:26:15] [ info] [cmetrics] version=0.3.7
[2023/09/01 23:26:15] [ info] [input:tail:tail.0] multiline core started
[2023/09/01 23:26:15] [ info] [input:tail:tail.1] multiline core started
[2023/09/01 23:26:15] [ info] [input:systemd:systemd.2] seek_cursor=s=6cee3af5256743aca6758a1d9204250d;i=26d... OK
[2023/09/01 23:26:15] [ info] [input:tail:tail.3] multiline core started
[2023/09/01 23:26:15] [ info] [input:storage_backlog:storage_backlog.7] queue memory limit: 4.8M
[2023/09/01 23:26:15] [ info] [filter:kubernetes:kubernetes.0] https=1 host=127.0.0.1 port=10250
[2023/09/01 23:26:15] [ info] [filter:kubernetes:kubernetes.0]  token updated
[2023/09/01 23:26:15] [ info] [filter:kubernetes:kubernetes.0] local POD info OK
[2023/09/01 23:26:15] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with Kubelet...
[2023/09/01 23:26:15] [ warn] [filter:kubernetes:kubernetes.0] could not get meta for POD fluent-bit-xw5lm
[2023/09/01 23:26:15] [ info] [output:cloudwatch_logs:cloudwatch_logs.0] worker #0 started
[2023/09/01 23:26:15] [ info] [output:cloudwatch_logs:cloudwatch_logs.1] worker #0 started
[2023/09/01 23:26:15] [ info] [output:cloudwatch_logs:cloudwatch_logs.2] worker #0 started
[2023/09/01 23:26:15] [ info] [http_server] listen iface=0.0.0.0 tcp_port=2020
[2023/09/01 23:26:15] [ info] [sp] stream processor started

PettitWesley commented 10 months ago

[2023/09/01 23:26:15] [ info] [filter:kubernetes:kubernetes.0] testing connectivity with Kubelet...
[2023/09/01 23:26:15] [ warn] [filter:kubernetes:kubernetes.0] could not get meta for POD fluent-bit-xw5lm

Looks like the connection to the kubelet is not working. Please see: https://docs.fluentbit.io/manual/pipeline/filters/kubernetes#optional-feature-using-kubelet-to-get-metadata
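The linked documentation section describes the kubelet-based metadata lookup. As a rough sketch only, a filter section with that feature enabled might look like the following in Fluent Bit's classic configuration format. The `Match` pattern is an assumption taken from the `application.*` tag used in the output section earlier in this thread, and the certificate and token paths are the usual in-cluster service account defaults, so adjust them to the actual deployment.

```
[FILTER]
    Name                kubernetes
    # Assumption: same tag pattern as the cloudwatch_logs output in this thread.
    Match               application.*
    Kube_URL            https://kubernetes.default.svc.cluster.local:443
    Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
    Merge_Log           On
    Buffer_Size         0
    # Ask the node's kubelet for pod metadata instead of the API server.
    Use_Kubelet         true
    Kubelet_Port        10250
```

According to that documentation page, this mode also expects the DaemonSet to run with `hostNetwork: true` and `dnsPolicy: ClusterFirstWithHostNet`, and the service account's ClusterRole to include the `nodes/proxy` resource. Since the startup log shows the filter contacting `host=127.0.0.1 port=10250`, those settings are worth checking first.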