Open jatinmehrotra opened 1 year ago
I wonder whether this is related to the liveness probe introduced in this commit https://github.com/aws/eks-charts/pull/975, and whether this is a port issue where the probe requests are unable to reach port 2020
(I may be wrong, since I don't know the exact internals of how the Fluent Bit server establishes network connectivity for the probes.)
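For context, the probe can only return 200 if Fluent Bit's built-in HTTP monitoring server is enabled and listening on that port. A sketch of the relevant [SERVICE] settings, based on the upstream Fluent Bit documentation (illustrative; the chart's rendered config may differ):

```ini
[SERVICE]
    # Enable Fluent Bit's built-in HTTP monitoring server
    HTTP_Server   On
    HTTP_Listen   0.0.0.0
    HTTP_Port     2020
    # Health_Check must be On for the /api/v1/health endpoint to report health status
    Health_Check  On
```

If the HTTP server is off, or listening on a different port than the probe targets, every probe request will fail regardless of whether Fluent Bit itself is healthy.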
We are seeing the same issue, though it is somewhat sporadic: when scaling up an EKS cluster, about 30-40% of the Fluent Bit pods enter this crash loop and never pass the health checks, while the other pods come up fine.
Describe the bug
Until now the Fluent Bit pods were working fine, but the moment I updated my chart from 0.1.19 to 0.1.29, our Fluent Bit pods enter a CrashLoopBackOff state, due to failures in the liveness probe newly introduced in https://github.com/aws/eks-charts/pull/975. Pod events show the following message:
Liveness probe failed: HTTP Probe failed with statuscode: 500
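For anyone reproducing this, the failing check looks roughly like the probe below (a hedged sketch of what a chart-added liveness probe on port 2020 looks like; the path, timings, and thresholds are illustrative, not necessarily the chart's exact values):

```yaml
livenessProbe:
  httpGet:
    # Assumed health endpoint on Fluent Bit's monitoring port
    path: /api/v1/health
    port: 2020
  initialDelaySeconds: 30
  periodSeconds: 10
  failureThreshold: 3
```

A quick way to check the endpoint by hand is `kubectl port-forward <pod> 2020:2020` followed by `curl -v http://127.0.0.1:2020/api/v1/health`: a 500 there confirms the server is reachable but reporting unhealthy, while a connection refused points at the HTTP server not listening on that port.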
Steps to reproduce
Spin up an IPv4 EKS cluster and install the aws-for-fluent-bit chart in version 0.1.29. The pods will enter CrashLoopBackOff.
Expected outcome
The liveness probe should pass with the updated chart configuration.
Environment
Chart name: aws-for-fluent-bit
Chart version: 0.1.29
Kubernetes version: 1.25
Using EKS (yes/no), if so version? Yes, v1.25.12-eks-2d98532
Additional Context:
Note: the pods are running on EC2 nodes.