aws / aws-for-fluent-bit

The source of the amazon/aws-for-fluent-bit container image
Apache License 2.0

aws-for-fluent-bit pod ignores k8 security context values like runAsUser, runAsGroup, fsGroup, and runAsNonRoot #729

Open ashenwgt opened 1 year ago

ashenwgt commented 1 year ago

Describe the question/issue

I am trying to run the aws-for-fluent-bit container as a non-root user using the Kubernetes manifest below.

apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: fluent-bit
  .....
spec:
  selector:
    matchLabels:
      k8s-app: fluent-bit
  template:
    metadata:
      labels:
        k8s-app: fluent-bit
        version: v1
        kubernetes.io/cluster-service: "true"
    spec:
      securityContext:
        fsGroup: 1000
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
      containers:
      - name: fluent-bit
        image: public.ecr.aws/aws-observability/aws-for-fluent-bit:stable
        imagePullPolicy: Always
        securityContext:
          runAsUser: 1000
          runAsGroup: 1000
          runAsNonRoot: true
       .....
        volumeMounts:
        - name: fluentbitstate
          mountPath: /var/fluent-bit/state
        - name: varlog
          mountPath: /var/log
          readOnly: true
        - name: varlibdockercontainers
          mountPath: /var/lib/docker/containers
          readOnly: true
        - name: fluent-bit-config
          mountPath: /fluent-bit/etc/
        - name: runlogjournal
          mountPath: /run/log/journal
          readOnly: true
        - name: dmesg
          mountPath: /var/log/dmesg
          readOnly: true
      terminationGracePeriodSeconds: 10
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet
      volumes:
      - name: fluentbitstate
        hostPath:
          path: /var/fluent-bit/state
      - name: varlog
        hostPath:
          path: /var/log
      - name: varlibdockercontainers
        hostPath:
          path: /var/lib/docker/containers
      - name: fluent-bit-config
        configMap:
          name: fluent-bit-config
      - name: runlogjournal
        hostPath:
          path: /run/log/journal
      - name: dmesg
        hostPath:
          path: /var/log/dmesg

Even though I explicitly set fsGroup to 1000 here, I noticed that the /var/fluent-bit/state directory is created with root ownership on the Kubernetes host nodes.

$ ls -al /var/fluent-bit/
total 0
drwxr-xr-x  3 root root  19 Sep  7 06:02 .
drwxr-xr-x 20 root root 286 Sep  7 06:02 ..
drwxr-xr-x  2 root root   6 Sep  7 06:02 state

Also, with the above settings, the fluent-bit pods go into CrashLoopBackOff with the following errors in the logs.

Fluent Bit v1.9.10
* Copyright (C) 2015-2022 The Fluent Bit Authors
* Fluent Bit is a CNCF sub-project under the umbrella of Fluentd
* https://fluentbit.io

[2023/09/07 07:45:28] [ info] [fluent bit] version=1.9.10, commit=1f4d09e087, pid=1
[2023/09/07 07:45:28] [error] [storage] [chunkio] cannot initialize root path /var/fluent-bit/state/flb-storage/

[2023/09/07 07:45:28] [error] [storage] error initializing storage engine
[2023/09/07 07:45:28] [error] [lib] backend failed
AWS for Fluent Bit Container Image Version 2.31.11

From the discussions in the aws/eks-charts repo (https://github.com/aws/eks-charts/issues/928) and the fluent/fluent-bit repo (https://github.com/fluent/fluent-bit/issues/872), I understand that this container has to run as root.

Can you please confirm my understanding?

If that is not the case, then is there a way to run the aws-for-fluent-bit container as a non-root user and with non-root-owned volumes?

PettitWesley commented 1 year ago

I'm not sure about this; I'm testing it out myself in an EKS cluster today.

Existing guidance I can find suggests that since the pod log files are root-owned, FLB must also run as root.

However, this doesn't make sense to me... I think if we give FLB the right capabilities it should be able to read the pod log files and probably even create its storage directory.

https://man7.org/linux/man-pages/man7/capabilities.7.html

I'll post here once I'm done testing.

PettitWesley commented 1 year ago

Alrighty, it seems that adding extra capabilities does not work:

[2023/09/29 22:35:20] [error] [plugins/in_tail/tail_file.c:888 errno=13] Permission denied
[2023/09/29 22:35:20] [error] [input:tail:tail.4] cannot open /var/log/containers/aws-node-74sfs_kube-system_aws-vpc-cni-init-8e3f6a198939804f5a716d92d7b0fe96b984fe4efc98e1b4ec04d1ceab5fc04e.log
[2023/09/29 22:35:20] [error] [plugins/in_tail/tail_file.c:888 errno=13] Permission denied
[2023/09/29 22:35:20] [error] [input:tail:tail.4] cannot open /var/log/containers/kube-proxy-jsgfc_kube-system_kube-proxy-40c90418e671cc466cb20d9f380ae578c0db2819fb097fb2db5320b1ef253ef9.log

I got this even though I set:

    spec:
      securityContext:
        fsGroup: 1000
        runAsUser: 1000
        runAsGroup: 1000
        runAsNonRoot: true
      containers:
      - name: fluent-bit
        image: public.ecr.aws/aws-observability/aws-for-fluent-bit:stable
        imagePullPolicy: Always
        securityContext:
          runAsUser: 1000
          runAsGroup: 1000
          runAsNonRoot: true
          capabilities:
            drop:
              - ALL
            add:
              - CAP_FOWNER
              - CAP_DAC_OVERRIDE
              - CAP_DAC_READ_SEARCH
              - CAP_FSETID
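
One thing to check in the snippet above: the Kubernetes documentation says capability names in a manifest must be written without the CAP_ prefix (for example, SYS_TIME rather than CAP_SYS_TIME). A minimal sketch of the same container securityContext with the prefix dropped is below. It is untested here, and the choice of DAC_READ_SEARCH as the only added capability is an assumption. It may still fail, because a capability added for a non-root UID is not necessarily in the process's effective set after exec unless the binary carries file capabilities.

```yaml
# Container-level securityContext fragment; untested sketch, not a confirmed fix.
securityContext:
  runAsUser: 1000
  runAsGroup: 1000
  runAsNonRoot: true
  capabilities:
    drop:
      - ALL
    add:
      # Kubernetes expects the name without the CAP_ prefix.
      # DAC_READ_SEARCH (CAP_DAC_READ_SEARCH) bypasses file read and
      # directory search permission checks; assumed sufficient for tailing logs.
      - DAC_READ_SEARCH
```

As the follow-up comments note, capabilities like this weaken the isolation that running as non-root is meant to provide, so this is mainly useful for ruling out a naming problem.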

PettitWesley commented 1 year ago

And of course, if you use host volume mounts for the tail DB or the storage.path, then that will fail due to permissions as well:

[2023/09/29 22:33:54] [error] [sqldb] cannot open database /var/fluent-bit/state/flb_container.db
[2023/09/29 22:33:54] [error] [input:tail:tail.0] could not open/create database
[2023/09/29 22:33:54] [error] [lib] backend failed
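
For the state directory specifically, the Kubernetes hostPath documentation notes that directories created on the host are only writable by root, so fsGroup alone is unlikely to help here. One possible workaround, sketched below under assumptions, is an init container that runs as root just long enough to hand the state directory to UID/GID 1000. The container name and the busybox image are hypothetical choices, and this has not been tested with aws-for-fluent-bit. It does not address reading the root-owned files under /var/log.

```yaml
# Fragment for spec.template.spec of the DaemonSet; untested sketch.
initContainers:
- name: fix-state-permissions   # hypothetical name
  image: busybox:1.36           # assumption: any small image providing chown
  command:
    - sh
    - -c
    - mkdir -p /var/fluent-bit/state && chown -R 1000:1000 /var/fluent-bit/state
  securityContext:
    runAsUser: 0          # only this init step runs as root
    runAsNonRoot: false   # overrides the pod-level runAsNonRoot: true
  volumeMounts:
  - name: fluentbitstate  # same hostPath volume the fluent-bit container mounts
    mountPath: /var/fluent-bit/state
```

The trade-off is that the pod still contains a root step, which may not be acceptable under a strict non-root policy.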

PettitWesley commented 1 year ago

Those capabilities can be used in known container breakout attacks, so even if adding them worked, this likely still wouldn't satisfy the true goal of non-root, which is to lock down containers.

I'm very surprised that it does not work, though. I guess I don't fully understand those Linux capabilities.
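
If the underlying goal is to lock down the container rather than to use a non-root UID as such, one alternative is to keep UID 0 but remove everything else that root normally gets. The fragment below is a sketch of that idea, not a tested or recommended configuration for this image. It assumes the log files and the state directory are owned by root with owner read/write permission, so ordinary owner checks are enough and no capabilities are needed.

```yaml
# Container-level securityContext fragment; untested sketch.
securityContext:
  runAsUser: 0                     # still root, so root-owned logs stay readable
  runAsNonRoot: false              # overrides a pod-level runAsNonRoot: true
  privileged: false
  allowPrivilegeEscalation: false  # prevents the process from gaining more privileges
  capabilities:
    drop:
      - ALL                        # no DAC_OVERRIDE, CHOWN, etc.
```

This would not satisfy a policy that requires runAsNonRoot: true, so whether it is acceptable depends on what the cluster's security requirements actually are.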