Open ashenwgt opened 1 year ago
I'm not sure about this; I'm testing it out myself in an EKS cluster today.
Existing guidance I can find suggests that since the pod log files are root-owned, FLB must also run as root:
However, this doesn't make sense to me... I think if we give FLB the right capabilities it should be able to read the pod log files and probably even create its storage directory.
https://man7.org/linux/man-pages/man7/capabilities.7.html
I'll post here once I'm done testing.
Alrighty, it seems that adding extra capabilities does not work:
[2023/09/29 22:35:20] [error] [plugins/in_tail/tail_file.c:888 errno=13] Permission denied
[2023/09/29 22:35:20] [error] [input:tail:tail.4] cannot open /var/log/containers/aws-node-74sfs_kube-system_aws-vpc-cni-init-8e3f6a198939804f5a716d92d7b0fe96b984fe4efc98e1b4ec04d1ceab5fc04e.log
[2023/09/29 22:35:20] [error] [plugins/in_tail/tail_file.c:888 errno=13] Permission denied
[2023/09/29 22:35:20] [error] [input:tail:tail.4] cannot open /var/log/containers/kube-proxy-jsgfc_kube-system_kube-proxy-40c90418e671cc466cb20d9f380ae578c0db2819fb097fb2db5320b1ef253ef9.log
I got this even though I set:
spec:
  securityContext:
    fsGroup: 1000
    runAsUser: 1000
    runAsGroup: 1000
    runAsNonRoot: true
  containers:
  - name: fluent-bit
    image: public.ecr.aws/aws-observability/aws-for-fluent-bit:stable
    imagePullPolicy: Always
    securityContext:
      runAsUser: 1000
      runAsGroup: 1000
      runAsNonRoot: true
      capabilities:
        drop:
        - ALL
        add:
        - CAP_FOWNER
        - CAP_DAC_OVERRIDE
        - CAP_DAC_READ_SEARCH
        - CAP_FSETID
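
For anyone debugging the same Permission denied errors, it can help to find out which path component actually refuses access to UID/GID 1000. Below is a minimal sketch, assuming it is run on the node (or in a privileged debug container) by a user that can stat every component. The helper names `allowed` and `first_denied` are made up for illustration. It only models the classic owner/group/other mode bits: it ignores ACLs, capabilities, and the root override, and it checks only the symlink-resolved path, not the directories that hold the symlinks.

```python
import os


def allowed(st, uid, gids, want):
    """Classic owner/group/other check. `want` is an rwx bit mask (4 = read, 1 = search)."""
    if st.st_uid == uid:
        bits = (st.st_mode >> 6) & 7   # owner bits
    elif st.st_gid in gids:
        bits = (st.st_mode >> 3) & 7   # group bits
    else:
        bits = st.st_mode & 7          # other bits
    return bits & want == want


def first_denied(path, uid, gids):
    """Return the first component that uid/gids cannot search or read, else None."""
    real = os.path.realpath(path)      # resolve symlinks first
    current = os.sep
    for part in real.strip(os.sep).split(os.sep):
        # Every directory on the way needs the search (x) bit.
        if not allowed(os.stat(current), uid, gids, 1):
            return current
        current = os.path.join(current, part)
    # The final file needs the read (r) bit.
    if not allowed(os.stat(current), uid, gids, 4):
        return current
    return None
```

Calling `first_denied(log_path, 1000, {1000})` with one of the log paths from the errors above returns the directory or file that blocks the read under these simplified rules, or `None` if the mode bits alone would allow it.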
And of course, if you use host volume mounts for the tail DB or the storage.path, then that will fail due to permissions as well:
[2023/09/29 22:33:54] [error] [sqldb] cannot open database /var/fluent-bit/state/flb_container.db
[2023/09/29 22:33:54] [error] [input:tail:tail.0] could not open/create database
[2023/09/29 22:33:54] [error] [lib] backend failed
Those capabilities can be used in known container breakout attacks, so even if adding them worked, this likely still wouldn't satisfy the true goal of non-root, which is to lock down containers.
I'm very surprised it does not work, though. I guess I don't understand those Linux capabilities.
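
One way to see what the process actually ended up with is to decode the Cap* masks in /proc/self/status from inside the container. As far as I understand capabilities(7), a process with a non-zero UID keeps capabilities across execve only through file capabilities or the ambient set, so capabilities added in the manifest may never reach the effective set of a non-root process. I have not confirmed that this is the cause here. The Kubernetes documentation also says to omit the CAP_ prefix in manifests (for example DAC_OVERRIDE), though whether that mattered in this test is unknown. The sketch below assumes Python is available in a debug container, which may not be true of the aws-for-fluent-bit image. The helper names `decode_caps` and `read_cap_sets` are made up for illustration, and only the five lowest capability bits are mapped.

```python
# Bit numbers as defined in <linux/capability.h>; only the ones relevant here.
CAP_BITS = {
    0: "CAP_CHOWN",
    1: "CAP_DAC_OVERRIDE",
    2: "CAP_DAC_READ_SEARCH",
    3: "CAP_FOWNER",
    4: "CAP_FSETID",
}


def decode_caps(mask_hex):
    """Return the names of the known capabilities set in a hex mask."""
    mask = int(mask_hex, 16)
    return [name for bit, name in sorted(CAP_BITS.items()) if mask & (1 << bit)]


def read_cap_sets(status_path="/proc/self/status"):
    """Parse the Cap* lines (CapInh, CapPrm, CapEff, CapBnd, CapAmb)."""
    sets = {}
    try:
        with open(status_path) as f:
            for line in f:
                if line.startswith("Cap"):
                    key, value = line.split(":", 1)
                    sets[key] = value.strip()
    except FileNotFoundError:
        pass  # no procfs at this path (for example on a non-Linux system)
    return sets


if __name__ == "__main__":
    for key, value in read_cap_sets().items():
        print(key, value, decode_caps(value))
```

If CapBnd lists the added capabilities but CapEff decodes to an empty list, the process holds none of them when it opens the log files, which would be consistent with the errno=13 errors above.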
Describe the question/issue
I am trying to run the aws-for-fluent-bit container with a non-root user using the below k8s manifest.
Even though I explicitly set fsGroup to 1000 here, I noticed that the /var/fluent-bit/state directory gets created as root inside k8s host nodes.
Also, with the above settings, fluent-bit pods go to a CrashLoopBackOff with the below errors in the logs.
From these discussions on the aws/eks-charts repo (https://github.com/aws/eks-charts/issues/928) and the fluent/fluent-bit repo (https://github.com/fluent/fluent-bit/issues/872), I learned that this container has to run as root.
Can you please confirm my understanding?
If that is not the case, then is there a way to run the aws-for-fluent-bit container as a non-root user and with non-root-owned volumes?